Python metaclasses in depth

Metaclasses have a reputation for being among the blackest of Python’s black magic. As it turns out, they’re actually pretty simple, but they require you to understand some concepts. There are two ways to learn metaclasses in Python:

  1. Skip the details and get straight to a useful implementation of a metaclass.
  2. Understand the core concepts behind metaclasses.

Number 1 is useful, but will only get you so far. If you really want to understand metaclasses, you need to take approach number 2.

With that in mind, I’m going to start with the basics of how classes are constructed in Python. Let’s consider the following class:

class SomeClass(object):
    x = 1
    y = 2

If you’re ready to learn about metaclasses, the above statement shouldn’t require much thought to understand. But let’s stop to think about it a bit anyway. In Python, everything is an object. Therefore, classes are also objects. But if SomeClass is an object, it must be an instance of some class. Let’s find out what that class is:

>>> type(SomeClass)
<type 'type'>

So apparently, it’s an instance of type. We just saw the most common usage of type: to query an object’s type. But have you ever read the help on that function?

>>> help(type)

Help on class type in module __builtin__:

class type(object)
 |  type(object) -> the object's type
 |  type(name, bases, dict) -> a new type
 ...

So it turns out that type is not only a builtin function, it’s also a class! How do we create an instance of type? You’ve already seen the most obvious way: using a class statement. But did you know that you can also create a class by calling type directly?

SomeClass = type('SomeClass', (object,), {'x' : 1, 'y' : 2})

For all intents and purposes, the above statement is equivalent to the prior definition of SomeClass. But this way is ugly and rarely used. That said, we’ve demonstrated something: type isn’t just any class. It’s the class of classes. It’s a metaclass.
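As a quick check of that equivalence, here is a minimal sketch in Python 3 syntax that builds the class both ways and compares the results. The names WithStatement and WithType are made up for illustration.

```python
# Build the same class twice: once with a class statement, once by calling type.
class WithStatement(object):
    x = 1
    y = 2

WithType = type('WithType', (object,), {'x': 1, 'y': 2})

# Both are instances of type, share the same bases, and carry the same attributes.
print(type(WithStatement) is type(WithType) is type)   # True
print(WithStatement.__bases__ == WithType.__bases__)   # True
print((WithType.x, WithType.y))                        # (1, 2)
```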

Let’s go a step further though. How does Python build the dictionary that’s the third argument to type? As it turns out, classes have something in common with functions: they have local namespaces. You might have seen the locals function used like this:

def some_func():
    x = 1
    y = 2
    print locals()

If you execute this, the output is this:

>>> some_func()
{'y': 2, 'x': 1}

If classes have their own namespaces, then they must also be able to use the locals function as well:

>>> class SomeClass(object):
...     x = 1
...     y = 2
...     print locals()
...
{'y': 2, 'x': 1, '__module__': '__main__'}

The only difference here is that the namespace is built when the class statement executes (usually at import time) and is then passed to type.
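The two steps can be imitated by hand. The sketch below (Python 3 syntax) is a simplification under stated assumptions: the class body is held in a string, run with exec to collect its local names, and the resulting dictionary is handed to type. The real machinery also adds entries such as __module__, which this version leaves out.

```python
# The class body, as plain source text.
body = """
x = 1
y = 2
"""

# Step 1: run the body; the names it assigns land in `namespace`.
namespace = {}
exec(body, {}, namespace)
print(sorted(namespace))           # ['x', 'y']

# Step 2: pass the collected namespace to type to get the class.
SomeClass = type('SomeClass', (object,), namespace)
print(SomeClass.x + SomeClass.y)   # 3
```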

So this is all interesting, but we haven’t really seen anything terribly useful yet. Let’s go further. We’ve seen that type is the class of classes. But if type is a class, then we must be able to subclass it. There are a number of reasons you might want to do this. For instance, you may sometimes want to attach a property to a class rather than to an instance. Let’s do just that:

class SomeMetaClass(type):
    @property
    def z(self):
        print 'In class property z'
        return self.x + self.y

>>> SomeClass = SomeMetaClass('SomeClass', (object,), {'x' : 1, 'y' : 2})
>>> SomeClass.z
In class property z
3

But we don’t want to use this same ugly notation for creating SomeClass. Python provides syntactic sugar for this. We can instead define SomeClass like this:

class SomeClass(object):
    __metaclass__ = SomeMetaClass
    x = 1
    y = 2
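The __metaclass__ attribute is the Python 2 spelling. Python 3 ignores that attribute and takes the metaclass as a keyword in the class statement instead. A minimal sketch of the same class property in Python 3 syntax:

```python
class SomeMetaClass(type):
    @property
    def z(cls):
        # Computed from class attributes; lives on the class, not its instances.
        return cls.x + cls.y


class SomeClass(metaclass=SomeMetaClass):
    x = 1
    y = 2


print(SomeClass.z)                 # 3
print(hasattr(SomeClass(), 'z'))   # False
```

The second line of output shows that the property is attached to the class only: instances of SomeClass do not see z, because attribute lookup on an instance never consults the metaclass.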

A more common use of metaclasses is to create a class constructor. Let’s attach z to the class directly rather than defining a property:

>>> class SomeMetaClass(type):
...     def __init__(self, name, bases, cls_dict):
...         self.z = self.x + self.y
...
>>> class SomeClass(object):
...     __metaclass__ = SomeMetaClass
...     x = 1
...     y = 2
...
>>> SomeClass.z
3

As we can see, we’ve defined a constructor for SomeClass. Now let’s go a bit further. What if we want to change the base class of SomeClass? That can be done, but we have to use a __new__ method. I’m going to presume that you know a bit about __new__ methods. If you don’t, you might want to read up on them.

>>> class SomeMetaClass(type):
...     def __new__(cls, name, bases, cls_dict):
...         new_cls = type.__new__(cls, name, (dict,), cls_dict)
...         return new_cls
...
>>> class SomeClass(object):
...     __metaclass__ = SomeMetaClass
...     x = 1
...     y = 2
...
>>> x = SomeClass()
>>> x['foo'] = 'bar'
>>> x
{'foo': 'bar'}
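To confirm that the base class really was swapped, the sketch below repeats the experiment in Python 3 syntax and inspects the result. The variable name obj is arbitrary.

```python
class SomeMetaClass(type):
    def __new__(mcls, name, bases, cls_dict):
        # Ignore the bases written in the class statement and use dict instead.
        return super().__new__(mcls, name, (dict,), cls_dict)


class SomeClass(object, metaclass=SomeMetaClass):
    x = 1
    y = 2


print(SomeClass.__bases__)          # (<class 'dict'>,)
obj = SomeClass()
obj['foo'] = 'bar'
print(isinstance(obj, dict), obj)   # True {'foo': 'bar'}
print(SomeClass.x, SomeClass.y)     # 1 2
```

Although the class statement names object as the base, the class that comes out derives from dict, and the attributes from the class body are still there.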

That should hopefully give you an idea of what metaclasses are and how to use them. If you’re even more lost than you were before, don’t worry. This is just one of those things that requires an “aha!” moment. You might also want to check out Michael Foord’s great Meta-classes Made Easy for a different perspective.

How Celery, Carrot, and your messaging stack work

If you’re just starting with Celery right now, you’re probably a bit confused. That’s not because celery is doing anything wrong. In fact, celery does a very good job of abstracting out the lower-level stuff so you can focus just on writing tasks. You don’t need to know very much about how any of the messaging systems you’re using will work. However, to truly understand celery, you need to know a bit about how it uses messaging and where it fits in your technology stack. This is my attempt to teach you the things you need to know about the subject to be able to make everything work.

Messaging

At the very bottom of celery’s technology stack is your messaging system, or Message Oriented Middleware in enterprise-speak. As of this writing, there are a couple of standards out there in this market:

  • AMQP – A binary protocol that focuses on performance and features.
  • STOMP – A text-based format that focuses on simplicity and ease of use.

Of course, there are a lot more players out there than just these. But these are the two protocols that are the most important to celery.

Now, a protocol is totally useless without software that actually implements it. In the case of AMQP, the most popular implementation seems to be RabbitMQ. The popular implementation of STOMP seems to be Apache ActiveMQ.

Carrot

A good analogy that I think most people can wrap their heads around is the SQL database. STOMP and AMQP are like SQL, while RabbitMQ and ActiveMQ are like Oracle and SQL Server. Anyone who has had to write software that works with more than one type of database knows how challenging this can be. Sure, it’s easy to issue SQL commands directly when you just support one type of database, but what happens when you need to support another? One possible solution is to use an ORM. By abstracting out the lower-level stuff, you make your code more portable.

The first thing most ORMs do is provide an abstraction to write SQL queries. For instance, if I want to write a LIMIT query for SQL Server, I would do something like this:

SELECT TOP (10) x FROM some_table

Oracle’s query would look something like this:

SELECT x FROM some_table WHERE ROWNUM <= 10

These are different queries, but they are both doing the same basic thing. That’s why SQLAlchemy allows you to write the query like this:

select([some_table.c.x], limit=10)

This is the kind of functionality that carrot provides for messaging. Although most messaging systems are fundamentally different in a lot of ways, there are certain operations that every platform has some version of. For example, sending a message in STOMP would look like this:

SEND
destination:/queue/a

hello
^@

AMQP’s version is binary, but would look something like this in text format:

basic.publish "hello" some_exchange a

Since we don’t want to worry too much about these protocols at a low level, carrot provides a Publisher class with a “send” method.
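To make the idea concrete, here is a toy sketch of one send call rendered by two interchangeable backends. Everything in it is hypothetical: StompBackend, AmqpBackend, and this Publisher are invented for illustration and are not carrot’s actual API. The AMQP line is only the text stand-in used above, not the real binary frame.

```python
class StompBackend(object):
    """Hypothetical backend: renders a STOMP SEND frame as text."""

    def publish(self, destination, body):
        # A STOMP frame is: command, headers, blank line, body, NUL terminator.
        return 'SEND\ndestination:/queue/%s\n\n%s\x00' % (destination, body)


class AmqpBackend(object):
    """Hypothetical backend: a text stand-in for the binary basic.publish."""

    def __init__(self, exchange):
        self.exchange = exchange

    def publish(self, destination, body):
        return 'basic.publish "%s" %s %s' % (body, self.exchange, destination)


class Publisher(object):
    """Toy publisher: callers only ever see send()."""

    def __init__(self, backend, destination):
        self.backend = backend
        self.destination = destination

    def send(self, body):
        return self.backend.publish(self.destination, body)


# The calling code is identical for both protocols.
for backend in (StompBackend(), AmqpBackend('some_exchange')):
    print(repr(Publisher(backend, 'a').send('hello')))
```

The point of the design is that the code calling send never changes when the backend does, which is the same portability argument made for ORMs above.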

Celery

Carrot makes it so that we can forget about a lot of the lower-level stuff, but it doesn’t save us from the fact that we’re still working with a messaging protocol (albeit a higher-level one). Going back to the ORM analogy, we can see the same thing happening: we need a layer of abstraction to make dealing with different implementations of SQL easier, but we don’t want to write SQL. We want to write Python (or whatever your language of choice is).

Thus, ORMs will add another layer of abstraction. Wouldn’t it be nice if we could just treat a database row as a Python object? Or, in the case of task execution, wouldn’t it be nice if we could just treat a task as a Python function? This is where celery comes in. See, we could run tasks like this:

  1. Process A wants to run task “foo.bar”
  2. Process A puts a message in queue saying “run foo.bar”
  3. Process B sees this message and starts on it
  4. When done, Process B replies to Process A with the status.
  5. Process A acknowledges this message and uses the return result.

Rather than having to code all the details of the messaging process, celery allows us to just create a Python function “foo.bar” that will do the above for us. Thus, we can execute tasks asynchronously without requiring that people reading our code know everything about our messaging backend.
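The five steps above can be sketched with nothing but the standard library. This is a rough illustration, not how celery is implemented: a thread stands in for Process B, two in-memory queues stand in for the message broker, and the TASKS registry and the bar function are made up for the example.

```python
import queue
import threading


def bar(a, b):
    return a + b


TASKS = {'foo.bar': bar}        # hypothetical task registry

requests = queue.Queue()        # stands in for the message broker
replies = queue.Queue()


def worker():
    # Steps 3-4: Process B picks up the message, runs the task, and replies.
    name, args = requests.get()
    try:
        replies.put(('SUCCESS', TASKS[name](*args)))
    except Exception as exc:
        replies.put(('FAILURE', repr(exc)))


process_b = threading.Thread(target=worker)
process_b.start()

# Steps 1-2: Process A asks for foo.bar by putting a message on the queue.
requests.put(('foo.bar', (2, 3)))

# Step 5: Process A receives the reply and uses the result.
status, result = replies.get(timeout=5)
process_b.join()
print(status, result)           # SUCCESS 5
```

All of this bookkeeping is what a task library hides behind an ordinary function call.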

Hopefully, this gives you a high-level overview of how celery works behind the scenes. There are a lot of details that I’ve left out, but this should be enough knowledge for you to figure the rest out.

Embrace quirks when working with others

For me, learning to work with people was a two-step process:

  1. Realize that other people are different from me. And I don’t just mean that other people act differently. I mean they are completely different, with different motivations.
  2. Realize that this is actually a good thing.

It seems to me that Derek Sivers has learned the first lesson, but I’m not sure that he’s learned the second. When we realize the first lesson, most of us try to avoid the issue. If people are different, the goal must be to suppress those differences. Unfortunately, people just don’t work that way. David Keirsey would describe this as a Pygmalion Project.

For those of you who aren’t familiar with Roman mythology, Pygmalion was a sculptor who could not find the perfect woman. Solution? He sculpted one. Venus was touched by this story, so she turned the statue into a real woman.

This is a cute story, but it parallels reality more than we realize. Like Pygmalion, when we encounter someone who’s different, our first instinct is to shape them into something that’s the same. Keirsey talks about this as the downfall of many relationships: we go to great lengths to find somebody who is different from us, but then we try to make them into ourselves. Of course, Derek shows that one can have a Pygmalion project on oneself. If we view ourselves as abnormal, the solution is to be more normal.

This tends to have the effect of lowering one’s self esteem. No matter how hard you try, you can’t make yourself be something you’re not. If you try to suppress your quirks, you will either fail or become a mediocre version of what you perceive as “normal”.

I would argue that not only is it not feasible to change people’s personalities, it’s not preferable. Your quirks are what make you unique, and they are the reason your coworkers need you. Quirks are different ways of looking at things. Rather than suppress them, learn to appreciate them. Most importantly, become proud of your quirks. Other people may not understand them, and that’s the point.

The only time quirks become problematic is when you turn them into Pygmalion projects. If you expect others to share your quirks, you will quickly run out of friends. However, when you learn to appreciate quirks in yourself and others, you become a better person. You begin to appreciate people’s strengths and weaknesses. If you pay attention, you might even discover that the person you thought was incompetent is really just different from you.

The moral of this story: Embrace quirks, but don’t enforce them.

The Rumsfeld Hazard

There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we now know we don’t know. But there are also unknown unknowns. These are things we do not know we don’t know. —Donald Rumsfeld

(I should note something here: although this blog post is named after a political figure, I generally prefer to avoid political references here. So don’t construe this as being a political message.)

I seem to recall Rumsfeld being heavily criticized for the above quote. Now, I’m not saying that the above quote was the appropriate response to the question he was asked. However, there is some truth to what he said. Too often, we think only in terms of knowns and unknowns. We treat planning a software project as if it were an obstacle course: overcome this and this obstacle and you’re done!

If this were the case, software development would be a much easier practice. It’s not that the obstacle course is a bad analogy for software development. It’s that the obstacle course is more insidious than we think it is. Almost always, the obstacle course looks easy. The problem is that it’s built upon a minefield and you weren’t told about it. Unless you get really lucky, you’re in for a huge surprise.

And don’t think that you can beat this by bringing a mine detector through the second time around. By then, they’ll have replaced the mines with pits of spikes or some other twisted but hidden trap. A software developer has the unenviable job of traversing this obstacle course.

A lot of us will carefully map our plans out, only to have a single mine invalidate the entire plan. That mine is the Rumsfeld Hazard: a problem you didn’t know about, couldn’t have predicted, and thus couldn’t plan for. If there’s one thing that agile development methods help with, it’s dealing with the Rumsfeld Hazard. Sure, you won’t be any less surprised when the mine goes off, but you won’t have spent so much time making plans that are shattered by one unseen event. It’s a beautiful thing when a software team can roll with the punches this way and still plan enough in advance to have a general direction.