How Celery, Carrot, and your messaging stack work

If you’re just starting with Celery right now, you’re probably a bit confused. That’s not because celery is doing anything wrong. In fact, celery does a very good job of abstracting out the lower-level stuff so you can focus just on writing tasks. You don’t need to know very much about how any of the messaging systems you’re using will work. However, to truly understand celery, you need to know a bit about how it uses messaging and where it fits in your technology stack. This is my attempt to teach you the things you need to know about the subject to be able to make everything work.

Messaging

At the very bottom of celery’s technology stack is your messaging system, or Message Oriented Middleware in enterprise-speak. As of this writing, there are a couple of standards out there in this market:

  • AMQP – A binary protocol that focuses on performance and features.
  • STOMP – A text-based format that focuses on simplicity and ease of use.

Of course, there are a lot more players out there than just this. But these are the two protocols that are the most important to celery.

Now, a protocol is totally useless without software that actually implements it. In the case of AMQP, the most popular implementation seems to be RabbitMQ. The popular implementation of STOMP seems to be Apache ActiveMQ.

Carrot

A good analogy that I think most people can wrap their heads around is the SQL database. STOMP and AMQP are like SQL, while RabbitMQ and ActiveMQ are like Oracle and SQL Server. Any one who has had to write software that works with more than one type of database knows how challenging this can be. Sure, it’s easy to issue SQL commands directly when you just support one type of database, but what happens when you need to support another? One possible solution is to use an ORM. By abstracting out the lower-level stuff, you make your code more portable.

The first thing most ORMs do is provide an abstraction to write SQL queries. For instance, if I want to write a LIMIT query for SQL Server, I would do something like this:

SELECT TOP (10) x FROM some_table

Oracle’s query would look something like this:

SELECT x FROM some_table WHERE row_num < 10

These are different queries, but they are both doing the same basic thing. That’s why SQLAlchemy allows you to write the query like this:

select([some_table.x], limit=10)

This is the functionality that carrot provides. Although most messaging systems are fundamentally different in a lot of ways, there are certain operations that every platform has some version of. For example, sending a message in STOMP would look like this:

SEND
destination:/queue/a

hello
^@

AMQP’s version is binary, but would look something like this in text format:

basic.publish "hello" some_exchange a

Since we don’t want to worry too much about these protocols at a low level, carrot creates a Publisher class with a “send” message.

Celery

Carrot makes it so that we can forget about a lot of the lower-level stuff, but it doesn’t save us from the fact that we’re still working with a messaging protocol (albeit a higher-level one). Going back to the ORM analogy, we can see the same thing happening: we need a layer of abstraction to make dealing with different implementations of SQL easier, but we don’t want to write SQL. We want to write Python (or whatever your language of choice is).

Thus, ORMs will add another layer of abstraction. Wouldn’t it be nice if we could just treat a database row as a Python object? Or, in the case of task execution, wouldn’t it be nice if we could just treat a task as a Python function? This is where celery comes in. See, we could run tasks like this:

  1. Process A wants to run task “foo.bar”
  2. Process A puts a message in queue saying “run foo.bar”
  3. Process B sees this message and starts on it
  4. When done, Process B replies to Process A with the status.
  5. Process A acknowledges this message and uses the return result.

Rather than having to code all the details of the messaging process, celery allows us to just create a Python function “foo.bar” that will do the above for us. Thus, we can execute tasks asynchronously without requiring that people reading our code know everything about our messaging backend.

Hopefully, this gives you a high-level overview of how celery is working behind the scenes. There are a lot of details that I’ve left out, but hopefully this provides you with enough knowledge that you can figure the rest out.