Hunch

Tokyo Cabinet Python bindings

March 2, 2009 by Rasmus, tagged code, dbm, python, software and tokyocabinet, filed under software

Today I released tc – Python bindings to the Tokyo Cabinet database library. The code is heavily based on Tasuku Suenaga’s pytc and improves on it in many ways (documentation, code structure, Python 2.6 and 3.0 compatibility, a robust setup.py, etc).

It’s currently available for Python 2.4, 2.5, 2.6 and 3.0 in MacPorts and PyPI. Source resides in the ‘Hub.

Continue reading...

Tokyo Cabinet

February 28, 2009 by Rasmus, tagged database, dbm, performance, python, software, tokyocabinet and tyrant, filed under software

Lately I’ve been doing some research into the holy grail of keyed data storage – the best combination of performance, scalability, efficiency and availability. There are many, many options available, ranging from Berkeley DB to BigTable implementations like Hypertable.

Last weekend I spent some time looking into using BDB in a BigTable fashion for managing schema-free tables. However, my tests revealed many problems with a solution like that. For instance, BDB is really slow when writing random keys into databases of more than 100k rows. At the beginning of this week I had a chat with Jon Åslund about the idea, and he introduced me to Tokyo Cabinet – a modern, battle-tested and extremely high-performance DBM.
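A random-key write test of the kind mentioned above could look roughly like the sketch below. It is only an illustration: it uses the dbm module from the Python 3 standard library, not BDB or Tokyo Cabinet, and the function name, key format and value size are made up for the example.

```python
import dbm
import os
import random
import tempfile
import time


def time_random_writes(rows, seed=0):
    """Write `rows` records under random keys; return (seconds, rows stored)."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench")
        started = time.perf_counter()
        with dbm.open(path, "c") as db:
            for _ in range(rows):
                # Random 64-bit keys, so writes land all over the key space.
                key = b"%016x" % rng.getrandbits(64)
                db[key] = b"x" * 64
        elapsed = time.perf_counter() - started
        # Reopen read-only to confirm how many records actually landed on disk.
        with dbm.open(path, "r") as db:
            stored = len(db.keys())
    return elapsed, stored
```

Calling it with growing row counts (say 1,000, then 10,000, then 100,000) and comparing the timings gives a rough picture of how a given backend copes as the database grows.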

Despite the somewhat uncool name, Tokyo Cabinet is a silent beast developed by Mikio Hirabayashi and used in the high-load environment of Mixi, the Japanese Facebook equivalent. TC (short for Tokyo Cabinet) is written in C99 and sports a clean and modern API.

Mikio states TC improves on other DBMs in the following areas:

  • Improves space efficiency – smaller size of database file.
  • Improves time efficiency – faster processing speed.
  • Improves parallelism – higher performance in multi-thread environment.
  • Improves usability – simplified API.
  • Improves robustness – database file is not corrupted even under catastrophic situation.
  • Supports 64-bit architecture – enormous memory space and database file are available.

Continue reading...

Smisk 1.1 released

December 19, 2008 by Rasmus, tagged python, smisk and software, filed under software

Smisk 1.1 has been released, including an extensive Python library introducing Content Negotiation, MVC design pattern support and much more.

Smisk is a web service framework for Python. Learn more about Smisk at the Smisk website. There is also some brand new documentation available.

Continue reading...

Smisk in Spotify

November 24, 2008 by Rasmus, tagged python, smisk, software and spotify, filed under software

Spotify replaced Twisted with Smisk for one of its backend services, which marks the first serious deployment of Smisk! Version 1.1 is soon to be released and will be announced on smisk-announce as well as in this blog.

Smisk improved performance for some services by a factor of 10 compared to Twisted.

Smisk is a high-performance web service framework, with its core components written in C, but controlled by Python. Smisk also exposes an MVC-inspired package, adding Transparent Content Negotiation, class tree routing, templating, ORM support, etc. Read more at the Smisk website.

Hunch Aggregator

August 14, 2008 by Rasmus, tagged aggregation, essay, hunch, php, python, software and web, filed under software

I’ve never written about this piece of software before, but have gotten a few questions about it recently, so I thought I’d shed some light on things.

The Hunch Aggregator is a pretty simple thing – it keeps a central state of many addressable online services. Once upon a time, I created and managed most services myself: hosting images, blogging, chatting, etc. Then better services than the ones I had written popped up – Flickr, Jaiku/Twitter, Facebook, Wordpress, Google Reader and so on. So, like any pragmatic tech-savvy user would, I distributed the tasks and outsourced the pain in the ass of keeping things up to date and working. Now a new problem arose: I am one person, but to the outside world the myriad of services presented several different users, different “persons” or identities. All I want is for my friends to be able to hear me, not to spend all their time surfing around this myriad of specialized websites.

So I came up with the simple idea of presenting my stuff as a singular stream of events, occurring over time. The first versions of the aggregation software were clumsy and hard to extend. They were unable to synchronize (they could only add new things) and the presentation was not very sexy. A few years later I blew the dust off the idea and began from scratch. The result is what is now running on hunch.se.
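To make the idea concrete, here is a minimal sketch of a central state that can be synchronized per service and read back as one stream. It is not the code running on hunch.se; the Event fields, the sync and stream functions and the service names are all hypothetical, written for Python 3.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    service: str    # hypothetical service name, e.g. "flickr"
    item_id: str    # identifier of the item within that service
    when: datetime  # when the event happened
    title: str


def sync(state, service, fetched):
    """Replace everything known about one service with what it reports now.

    Unlike an add-only import, this also picks up edits and deletions.
    """
    for key in [k for k in state if k[0] == service]:
        del state[key]
    for event in fetched:
        state[(event.service, event.item_id)] = event


def stream(state):
    """Return the events of all services as one list, newest first."""
    return sorted(state.values(), key=lambda e: e.when, reverse=True)
```

Keying the state on the service name and item id is what lets a later sync overwrite or drop an item instead of adding a duplicate.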

Continue reading...

Smisk 1.0

May 12, 2008 by Rasmus, tagged programming, python and smisk, filed under software

So, after two years, the Smisk web service framework has finally been released as a stable version.

Smisk is a small but powerful framework, or more like a base, for building and running high-performance web services, based on FastCGI. Before I get any further, here is the classic example:

from smisk import Application
class MyApp(Application):
  def service(self):
    self.response.write("Hello World!")

MyApp().run()

This example is obviously not much to hang on the Christmas tree, as we say in Sweden, but there’s definitely more. The major feature of Smisk is its speed, which comes from the fact that it’s written in C. Yes, not Python. It’s a machine-native library that manifests itself as a Python extension, so you control it from Python.

Installing Smisk is very easy if you have setuptools:

sudo easy_install smisk

(There are other means of installation available…)

As I mentioned earlier, Smisk is a FastCGI-based entity. As the name suggests, this is a fast interface, or a fast proxy interface, for HTTP services. FastCGI was built to do two things in particular: be as fast as possible and scale as well as possible. Smisk meets both of those criteria.
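Part of what keeps FastCGI cheap is its framing: everything on the wire is a record with a fixed eight-byte header. The sketch below packs and unpacks that header in Python 3. The header layout and the constants come from the FastCGI specification; the helper names are made up for the example and this is not code from Smisk.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_STDOUT = 6  # record type carrying response data

# Header fields: version, type, requestId, contentLength, paddingLength, reserved
_HEADER = struct.Struct("!BBHHBB")


def pack_record(record_type, request_id, content):
    """Frame `content` as a single FastCGI record without padding."""
    if len(content) > 0xFFFF:
        raise ValueError("a single record carries at most 65535 bytes")
    header = _HEADER.pack(FCGI_VERSION_1, record_type, request_id,
                          len(content), 0, 0)
    return header + content


def unpack_header(data):
    """Decode the first eight bytes of a record into a dict."""
    version, record_type, request_id, length, padding, _ = _HEADER.unpack_from(data)
    return {
        "version": version,
        "type": record_type,
        "request_id": request_id,
        "content_length": length,
        "padding_length": padding,
    }
```

For instance, pack_record(FCGI_STDOUT, 1, b"Hello World!") produces 20 bytes: the eight-byte header followed by the twelve bytes of content.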

Continue reading...

First release of py-fcgi

June 11, 2007 by Rasmus, tagged fastcgi, fcgi and python

My first serious Python extension (written in C), py-fcgi, has been released.
It is a FastCGI client/process layer allowing for efficient and simple creation of FastCGI applications written in Python.

A dive into Python, FastCGI and Lighttpd

May 6, 2007 by Rasmus, tagged fastcgi, lighttpd and python

I decided to do some performance analysis on Python and FCGI.
I’ve been reading about, testing, coding for and using FastCGI for quite a while, and I have had my doubts and questions. But one question remains:

Do you gain from FCGI connection multiplexing?

After a few hours, I have a very clear answer:
No. You don’t. Multiplexing will just slow things down. To get you up to speed: multiplexing is as “simple” as one process handling several network connections simultaneously. In the case of Allan Saddi’s fcgi module, this is done using select() and threads.

The overhead of creating and destroying threads, locking, and using asynchronous network I/O is simply too big to be overlooked.
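For readers who have not seen the pattern, here is a minimal Python 3 sketch of the select() half of multiplexing: one process waits on several connections and serves whichever is ready. It is not Allan Saddi’s code and leaves out the threads entirely; the function name and the upper-casing "handler" are placeholders.

```python
import select
import socket


def serve_multiplexed(connections):
    """Serve every connection from one process, switching with select()."""
    open_conns = list(connections)
    while open_conns:
        # Block until at least one connection has data or has been closed.
        readable, _, _ = select.select(open_conns, [], [])
        for conn in readable:
            data = conn.recv(4096)
            if data:
                conn.sendall(data.upper())  # stand-in for real request handling
            else:
                # An empty read means the peer has finished sending.
                open_conns.remove(conn)
                conn.close()
```

Even in this stripped-down form, every request pays for an extra select() call and the bookkeeping around it, which is the kind of overhead a plain one-connection-per-process worker avoids.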

Before we totally discard the multiplexed, threaded model, I want to make it very clear that it MAY have a positive performance impact in certain applications. If, for example, a process (application) requires a considerable amount of startup time (when the process is spawned), it may be quicker to run a few processes, or a single one, and have it spawn threads, thus avoiding the startup overhead each time more connection handlers are needed.

I have run two tests on two different machines. The X-axis specifies the number of processes being run by lighttpd (real operating system processes). In the case of multiplexing, the thread count is not included and may be high. The Y-axis displays requests/second (how much throughput a setup can handle; higher is better).
First up is an iMac 24″ featuring an Intel Core Duo 2.16 GHz processor with 2 cores.

rasmusim_i686-1×2c_darwin.png

We clearly see how both the multiplexed and non-multiplexed variants run optimally on 2 cores. The multiplexed variant experiences much less fluctuation in process count, but requires more time due to threading and locking overhead.
Next up is a Mac Pro sporting 2 x 2.66 GHz Dual-Core Intel Xeon processors. That makes a total of 4 cores.

apple2_i686-2×2c_darwin.png

Not a big surprise – this machine is speedier. Here we also see the optimal 4-core line. However, a very interesting pattern occurs with the (much better) non-multiplexed version of our program. Another strange observation is that performance is slightly better when running 13 processes than when running the theoretically optimal 4 processes. This may be due to BSD/Darwin process handling, lighttpd or a number of other things. I have no idea what causes this phenomenon.

Test setup

I generally try not to get too deep into geeky details, but in this case it’s unfortunately hard to avoid.