Persistence is a subset of immutability

Whenever I tell someone I’m working on a persistent data structure
library for Python, I almost always get the same response:
“Persistent means that something is stored across process boundaries.
I think you mean to say immutable data structures.”  In fairness, this
response is understandable.  But it’s not quite right.  I’m going to
address both parts of that response.

Persistent means that something is stored across process boundaries

Did you know that the word letter can mean a note you send through the mail or a character you use to make up words?  Did you know that programming can mean writing computer programs or mathematical optimization?  I’m sure if I sat down long enough, I could think of more examples.  The point is that the language we speak is full of ambiguity.

Before we consider the computer science meaning of persistence, let’s
look at the common meaning.  If we look at the word persistent in
Merriam-Webster, the first two definitions we run into are:

1 : existing for a long or longer than usual time or continuously: as
a : retained beyond the usual period  b : continuing without change in
function or structure 

The meaning Python programmers tend to think of is the definition in
part a.  That is, a persistent data structure is stored beyond the
usual period (the process’s lifespan).  However, when I talk about a
persistent data structure, I tend to mean the definition in part b.

Going with definition b, we can see that lists, for example, aren’t persistent:

>>> a = [1,2,3]
>>> b = a
>>> b.append(4)
>>> a
[1, 2, 3, 4]

Tuples, however, are a different matter:

>>> a = (1,2,3)
>>> b = a + (4,)
>>> b
(1, 2, 3, 4)
>>> a
(1, 2, 3)

Ah, but here’s where the second part of this post comes in.

I think you mean to say immutable data structures

In the case of the tuple, you are correct to say that a tuple is
immutable.  And I’m correct in saying that a tuple is persistent.
Just like I’d be right to say that I own a car while someone else
claims that I own an automobile.  However, in both cases, the two terms
can’t quite be used interchangeably.  After all, an automobile isn’t
necessarily a car.  It can be a truck or a van or an SUV.

In the same sense, a persistent data structure is effectively
immutable.  But an immutable data structure isn’t necessarily
persistent.  Let’s consider the definitions of both of these words from
Wikipedia:

Immutable object

In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created.

Persistent data structure

In computing, a persistent data structure is a data structure which always preserves the previous version of itself when it is modified; such data structures are effectively immutable, as their operations do not (visibly) update the structure in-place, but instead always yield a new updated structure.

These definitions have a lot of common ground, but there is some
difference between them.  Let’s consider a data structure that is
immutable but not persistent.  Prepare yourself.  This will be
complex.  Ready?

>>> 1
1

Has your mind been blown yet?

In this sense, the number 1 is definitely immutable.  You can’t change
it.  It’s always the number 1.  However, it’s not persistent.  How do
you update the number 1?  No matter what you do, it will always be the
number 1.  Sure, you can get 3 by adding it to 2.  But in a
mathematical sense, that’s not really changing the number 1.

In this sense, the number 1 is atomic.  It simply doesn’t make sense
to modify it.  Heck, you can’t even copy it:

>>> from copy import copy
>>> a = 1
>>> a is copy(a)
True

Conclusion

With this semantic difference in mind, remember that it’s just that:
a semantic difference.  Call them Bob if you want to.  However, bear
in mind that when someone uses the word persistence in this way,
they’re not being inaccurate.

Syntax vs semantics: what’s the difference?

This is a subject that many programmers get confused about.  The difference isn’t really a difficult concept; it’s just not explained to programmers very often.  Let’s consider a snippet of a poem by Jeff Harrison:

know-bodies, devoted we to under-do for you
every Sunday-day of dressy morning, black pond,
sky’s germs, chairs’ ponds – prove it, stain!
us, rain-free & orphaned, we’re living laboratories

This is from Dart Mummy & The Squashed Sphinx.  If you haven’t figured it out, this text is generated by a computer.  And it uses the same techniques that a lot of spammers use.  What’s most interesting about this is that it’s grammatically correct.  For instance, “prove it, stain!” is a perfectly valid English sentence.  The problem is just that it’s meaningless.  Thus, we can say that the above poem is syntactically correct but has no meaning semantically.

Programming languages are similar in concept.  Consider the following Python snippet:
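
(The exact snippet doesn’t matter much; here’s a made-up one in which the names total and count are never defined anywhere.)

def report():
    # Parses fine, but total and count don't exist,
    # so calling report() raises a NameError at runtime.
    print total / count

report()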

The above code is obviously incorrect.  It uses variables that haven’t been defined anywhere.  Therefore, it is semantically incorrect.  But it is a syntactically valid Python program.

In the same manner, we can say that these two snippets of code are identical semantically although they definitely have very different syntax:

Python:

def f(x):
    return x + 1

OCaml:

let f x = x + 1

Therefore, we can define these terms like this:
 * syntax – A set of rules for specifying a program
 * semantics – The meaning behind that program

The devil’s in the details

There are two schools of thought in the programming world:

  1. Explicit is better than implicit (configuration over convention).
  2. A developer should only have to program the unconventional aspects of a program (convention over configuration).

We’ll call #1 the Python school and #2 the Ruby school. In fact, I
would argue that this is an issue that’s at the core of whether code
is considered “Pythonic” or “Rubyic” (I doubt the last one is a word).

So which school of thought is right? I personally think they both
are. It doesn’t really take a whole lot to demonstrate that the
Python school of thought isn’t always right. Think about it. Did you
know that the Python runtime has a component that goes around deleting
objects from memory totally implicitly? How unpythonic is that?
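
To make that concrete, here’s a small, contrived illustration (the Node class is made up) of the collector quietly doing its implicit work:

import gc

class Node(object):
    pass

a = Node()
b = Node()
a.other = b
b.other = a         # a reference cycle

del a, b            # we never explicitly free anything...
print gc.collect()  # ...yet the collector reports how many unreachable objects it just reclaimed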

The Ruby school of thought takes a bit more work though. After all,
if it’s unconventional, why should you have to configure it? Of
course, the problem here is in defining “conventional”. What’s
conventional to me is likely unconventional to others. And what’s
conventional to others could be unconventional to me.

I wish I had more advice on how to reconcile these two schools of
thought. The truth is that I struggle with them daily. But I think
having an intuition about this is the dividing line between
“experienced programmer” and “newb”. After all, if programming were
merely about “make everything explicit” or “make everything implicit”,
any idiot could do it.

I think this is also the core skill for writing readable code. You
need to determine what details are relevant to each piece of code.
Whatever the case, you need to make a conscious decision as to what
details shine through and what details you obscure. Because if these
things happen by accident, they’re almost guaranteed to be wrong.

Pimp my Interactive Interpreter

A few things that bother me about Python’s interactive interpreter:

  • Having to import commonly used modules like sys.
  • Not having history stored across interpreter sessions.
  • No tab-completion.

Of course, things like IPython and bpython help, but I generally prefer just a plain old python interactive interpreter session. Plus, the above three problems are easy to solve without installing any extra packages; the way to solve them is just documented in somewhat obscure locations. The solution?

First, create a file somewhere with the following text (I save mine to ~/.pystartup):

import atexit
import os
import readline
import rlcompleter  # importing this makes readline's completion understand Python names
import sys          # pre-imported so it's always available in the session

# Store interpreter history in ~/.pyhist and reload it on startup.
histfile = os.path.join(os.environ["HOME"], ".pyhist")
try:
    readline.read_history_file(histfile)
except IOError:
    pass
readline.parse_and_bind('tab: complete')  # turn on tab-completion

# Write the history back out when the interpreter exits.
atexit.register(readline.write_history_file, histfile)
del os, histfile

Then, you just need to add a line to your .bashrc, .zshrc, or whatever else your shell uses:

export PYTHONSTARTUP=~/.pystartup

…and voilà! Your interactive interpreter has just been pimped.

If you’re on Windows, I’m afraid I have bad news. This probably won’t work for you without Cygwin (as you will need readline).

How Celery, Carrot, and your messaging stack work

If you’re just starting with Celery right now, you’re probably a bit confused. That’s not because celery is doing anything wrong. In fact, celery does a very good job of abstracting out the lower-level stuff so you can focus just on writing tasks. You don’t need to know very much about how any of the messaging systems you’re using will work. However, to truly understand celery, you need to know a bit about how it uses messaging and where it fits in your technology stack. This is my attempt to teach you the things you need to know about the subject to be able to make everything work.

Messaging

At the very bottom of celery’s technology stack is your messaging system, or Message Oriented Middleware in enterprise-speak. As of this writing, there are a couple of standards out there in this market:

  • AMQP – A binary protocol that focuses on performance and features.
  • STOMP – A text-based format that focuses on simplicity and ease of use.

Of course, there are a lot more players out there than just this. But these are the two protocols that are the most important to celery.

Now, a protocol is totally useless without software that actually implements it. In the case of AMQP, the most popular implementation seems to be RabbitMQ. The popular implementation of STOMP seems to be Apache ActiveMQ.

Carrot

A good analogy that I think most people can wrap their heads around is the SQL database. STOMP and AMQP are like SQL, while RabbitMQ and ActiveMQ are like Oracle and SQL Server. Anyone who has had to write software that works with more than one type of database knows how challenging this can be. Sure, it’s easy to issue SQL commands directly when you just support one type of database, but what happens when you need to support another? One possible solution is to use an ORM. By abstracting out the lower-level stuff, you make your code more portable.

The first thing most ORMs do is provide an abstraction to write SQL queries. For instance, if I want to write a LIMIT query for SQL Server, I would do something like this:

SELECT TOP (10) x FROM some_table

Oracle’s query would look something like this:

SELECT x FROM some_table WHERE ROWNUM <= 10

These are different queries, but they are both doing the same basic thing. That’s why SQLAlchemy allows you to write the query like this:

select([some_table.c.x], limit=10)
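
If you want to see that abstraction at work yourself, here’s a rough sketch using old-style SQLAlchemy Core (the table and column are made up, and newer SQLAlchemy spells the select slightly differently):

from sqlalchemy import Table, Column, Integer, MetaData, select
from sqlalchemy.dialects import mssql, oracle

metadata = MetaData()
some_table = Table('some_table', metadata, Column('x', Integer))

query = select([some_table.c.x], limit=10)

# One query object, two very different strings of SQL:
print query.compile(dialect=mssql.dialect())
print query.compile(dialect=oracle.dialect())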

This is the functionality that carrot provides. Although most messaging systems are fundamentally different in a lot of ways, there are certain operations that every platform has some version of. For example, sending a message in STOMP would look like this:

SEND
destination:/queue/a

hello
^@

AMQP’s version is binary, but would look something like this in text format:

basic.publish "hello" some_exchange a

Since we don’t want to worry too much about these protocols at a low level, carrot creates a Publisher class with a “send” method.
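
Roughly following the example in carrot’s own docs, sending a message looks something like this sketch (the broker credentials, exchange, and routing key are placeholders):

from carrot.connection import BrokerConnection
from carrot.messaging import Publisher

# Placeholder connection details for a local RabbitMQ install.
conn = BrokerConnection(hostname="localhost", port=5672,
                        userid="guest", password="guest",
                        virtual_host="/")

publisher = Publisher(connection=conn, exchange="some_exchange",
                      routing_key="a")
publisher.send("hello")  # carrot turns this into the backend's wire-level command
publisher.close()
conn.close()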

Celery

Carrot makes it so that we can forget about a lot of the lower-level stuff, but it doesn’t save us from the fact that we’re still working with a messaging protocol (albeit a higher-level one). Going back to the ORM analogy, we can see the same thing happening: we need a layer of abstraction to make dealing with different implementations of SQL easier, but we don’t want to write SQL. We want to write Python (or whatever your language of choice is).

Thus, ORMs will add another layer of abstraction. Wouldn’t it be nice if we could just treat a database row as a Python object? Or, in the case of task execution, wouldn’t it be nice if we could just treat a task as a Python function? This is where celery comes in. See, we could run tasks like this:

  1. Process A wants to run task “foo.bar”
  2. Process A puts a message in queue saying “run foo.bar”
  3. Process B sees this message and starts on it
  4. When done, Process B replies to Process A with the status.
  5. Process A acknowledges this message and uses the return result.

Rather than having to code all the details of the messaging process, celery allows us to just create a Python function “foo.bar” that will do the above for us. Thus, we can execute tasks asynchronously without requiring that people reading our code know everything about our messaging backend.
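
In code, that boils down to something like this sketch (it uses the old celery.decorators import from the celery 1.x era, and the task and its arguments are made up):

from celery.decorators import task  # newer celery versions use the app.task decorator instead

@task()
def bar(x, y):
    return x + y

# Process A: this just puts a "run foo.bar"-style message on the queue and returns right away.
result = bar.delay(2, 3)

# Once a worker process (Process B) has picked the message up and replied:
print result.get()  # 5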

Hopefully, this gives you a high-level overview of how celery is working behind the scenes. There are a lot of details that I’ve left out, but hopefully this provides you with enough knowledge that you can figure the rest out.

Building a Common Lisp webapp using Python’s envbuilder

One of my goals in starting envbuilder was to make it not totally wedded to Python. My main use-case for envbuilder is to build Python apps, but I think it’s important to make sure envbuilder can be flexible and simple enough to get the job done. I think that the best way to do that is to try using it to build something other than Python.

Shaneal Manek’s excellent guide to setting up a simple Lisp webapp seemed like the perfect project to try this out with. I’ll start off by telling you how to try it out. The first step is to actually get envbuilder. I recommend using the version that’s in master as of this writing. Although this example can probably be made to work with envbuilder 0.2.1 (the current stable release), it is just much easier to understand if you use some of 0.3.0’s features.

So, to get envbuilder, you just need to use the following command:

   git clone git@github.com:jasonbaker/envbuilder.git

This example is already set up in examples/lisp-webapp. To get up and running, you just need to do the following commands:

   envb checkout                    # get the files you need to start
   export INSTALL_ROOT=`pwd`        # tell sbcl to install to the current directory
   envb setup                       # create a local build of SBCL
   export SBCL_HOME=`pwd`/lib/sbcl  # tell sbcl where it is installed
   envb start                       # start the webserver

Alright, so here is the .env file in all its glory:

# It seems that there's a bug in configobj where a section name can
# conflict in interpolation, so name this differently
sbcl_ = '$CWD/bin/sbcl'
[project]
    parcels = 'sbcl', 'webapp'
    [[sbcl]]
        checkout = 'git clone git://git.boinkor.net/sbcl'
        setup = 'cp $CWD/customize-target-features.lisp $dir', 'sh make.sh', 'sh install.sh'
    [[webapp]]
        dir = 'trivial-lisp-webapp'
        checkout = 'git clone git://github.com/smanek/trivial-lisp-webapp.git'
        start = '$sbcl_ --no-sysinit --no-userinit --load $CWD/$dir/src/init.lisp',
[commands]
    [[start]]
        working_dir = '%dir/scripts'
        required = False
        help = 'Start the server'

The first thing to understand about envbuilder is the concept of a “parcel”. A parcel is simply a piece of software that’s usually checked out from version control (although it doesn’t have to be). The commands that are run when you do “envb checkout” are defined using the checkout option. Similarly, the command that is run when you do “envb setup” is defined by the setup option. You’ll notice that the webapp parcel doesn’t have a setup command. This is because we don’t really need to do anything to set it up other than checking it out from git.

There is one important thing to note here, though. The sbcl section uses a $CWD variable to run itself, and there is a difference between $CWD and simply using a dot (“.”). The setup command is run from inside the parcel’s directory, so a dot is relative to that directory, while $CWD is the directory that envb was run from.

The last feature to note is the custom command (start). Just as with checkout and setup, the start command is to be placed in a start option in a parcel. In this case, the start command only runs on the webapp parcel. I won’t go into detail about each option on the start command as that’s covered in the envbuilder documentation.

So, the purpose of this blog post isn’t really to teach you how to set up a lisp web application with envbuilder. It’s to show you that envbuilder is much more than a tool to create a Python virtualenv. If you’d like to contribute to envbuilder, the best way to do so at present is to submit other examples of how you’ve gotten envbuilder to work in a neat or creative way. After all, the real fun in creating an extensible application is finding out that it can do things you didn’t realize it could do.

Python metaclasses in depth

Metaclasses have a reputation for being among the blackest of Python’s black magic. As it turns out, they’re actually pretty simple, but they require you to understand some concepts. There are two ways to learn metaclasses in Python:

  1. Skip the details and get straight to a useful implementation of a metaclass.
  2. Understand the core concepts behind metaclasses.

Number 1 is useful, but will only get you so far. If you really want to understand metaclasses, you need to take approach number 2.

With that in mind, I’m going to start with the basics of how classes are constructed in Python. Let’s consider the following class:

class SomeClass(object):
    x = 1
    y = 2

If you’re ready to learn about metaclasses, the above statement shouldn’t require much thought to understand. But let’s stop to think about it a bit anyway. In Python, everything is an object. Therefore, classes are also objects. But if SomeClass is an object, it must be an instance of some class. Let’s find out what that class is:

>>> type(SomeClass)
<type 'type'>

So apparently, it’s an instance of type. We just saw the most common usage of type: to query an object’s type. But have you ever read the help on that function?

>>> help(type)

Help on class type in module __builtin__:

class type(object)
 |  type(object) -> the object's type
 |  type(name, bases, dict) -> a new type
 ...

So it turns out that type is not only a builtin function, it’s also a class! How do we instantiate an instance of type? You’ve already seen the most obvious way: using a class statement. But did you know that you can create a class using type?

SomeClass = type('SomeClass', (object,), {'x' : 1, 'y' : 2})

For all intents and purposes, the above statement is equivalent to the prior definition of SomeClass. But this way is ugly and isn’t used very commonly. That said, we’ve demonstrated something: type isn’t just any class. It’s the class of classes. It’s a metaclass.
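
A quick sanity check at the prompt (assuming SomeClass is still defined from above) shows the same thing:

>>> isinstance(SomeClass, type)  # SomeClass is itself an instance of type
True
>>> type(type)                   # and type is even its own class
<type 'type'>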

Let’s go a step further though. How does the compiler generate the dictionary that’s the third argument to type? As it turns out, classes have something in common with functions: they have local namespaces. You might have seen the locals function used like this:

def some_func():
    x = 1
    y = 2
    print locals()

If you execute this, you get the following output:

>>> some_func()
{'y': 2, 'x': 1}

If classes have their own namespaces, then they must also be able to use the locals function as well:

>>> class SomeClass(object):
...     x = 1
...     y = 2
...     print locals()
...
{'y': 2, 'x': 1, '__module__': '__main__'}

The only difference here is that the namespace is generated when the class statement is executed (typically at import time) and then passed into the type function.

So this is all interesting, but we haven’t really seen anything terribly useful yet. Let’s go further. We’ve seen that type is the class of classes. But if type is a class, then we must be able to subclass it. There are a number of reasons you might want to do this. For instance, you may sometimes want to attach a property to a class rather than to an instance. Let’s do just that:

class SomeMetaClass(type):
    @property
    def z(self):
        print 'In class property z'
        return self.x + self.y

>>> SomeClass = SomeMetaClass('SomeClass', (object,), {'x' : 1, 'y' : 2})
>>> SomeClass.z
In class property z
3

But we don’t want to use this same ugly notation for creating SomeClass. Python provides syntactic sugar for this. We can instead define SomeClass like this:

class SomeClass(object):
    __metaclass__ = SomeMetaClass
    x = 1
    y = 2

A more common use of metaclasses is to create a class constructor. Let’s attach z to the class directly rather than defining a property:

>>> class SomeMetaClass(type):
...         def __init__(self, name, bases, dict):
...             self.z = self.x + self.y
...
>>> class SomeClass(object):
...         __metaclass__ = SomeMetaClass
...         x = 1
...         y = 2
...
>>> SomeClass.z
3

As we can see, we’ve defined a constructor for SomeClass. Now let’s go a bit further. What if we want to change the base class of SomeClass? That can be done, but we have to use a __new__ method. I’m going to presume that you know a bit about __new__ methods. If you don’t, you might want to read up on them.

>>> class SomeMetaClass(type):
...     def __new__(cls, name, bases, cls_dict):
...         new_cls = type.__new__(cls, name, (dict,), cls_dict)
...         return new_cls
...
>>> class SomeClass(object):
...     __metaclass__ = SomeMetaClass
...     x = 1
...     y = 2
...
>>> x = SomeClass()
>>> x['foo'] = 'bar'
>>> x
{'foo': 'bar'}

That should hopefully give you an idea of what metaclasses are and how to use them. If you’re even more lost than you were before, don’t worry. This is just one of those things that requires an “aha!” moment. You might also want to check out Michael Foord’s great Meta-classes Made Easy for a different perspective.

7 tools for working with Python in Emacs (and Ubuntu)

I’ve been meaning to blog about this for some time, so I suppose now
is as good an opportunity as any. This is going to be a very “stream
of consciousness”-esque posting, so bear with me. Some of these are
things that have radically changed the way I use emacs. Others are
minor changes that I like. Feel free to pick and choose from them.

I’ll assume you’re running Ubuntu. Also, not all of these are Python
specific. But I feel that they will be useful to most Python
programmers who use emacs.

I should also note that, like most emacs users, I’ve picked up most of
these tricks from various sources along the way.  If you wrote
something that I put in here, thanks!

Ropemacs

Ropemacs is among the tools I love the most and hate the most. When
it works, it opens up a new world of automated refactorings for you.
But it seems to be a bit buggy at times. That said, setting it up is
really easy. Firstly, you need to have ropemacs installed. This is
pretty easy:

 
sudo apt-get install python-ropemacs 

After that, just a couple of lines of elisp in your .emacs file and
you’re good to go!

 
(require 'pymacs) 
(pymacs-load "ropemacs" "rope-") 
(setq ropemacs-enable-autoimport t) 

Anything

Anything is almost like Quicksilver for emacs. To begin, you need to
download anything.el and anything-config. I also use
anything-match-plugin. Then you just need the following lines of elisp:

 
(require 'anything-config) 
(require 'anything-match-plugin) 
(global-set-key "\C-ca" 'anything) 
(global-set-key "\C-ce" 'anything-for-files) 

Then, prepare to spend a lot less time searching for files!

Line number mode

When you’re pair programming, nothing is more helpful than being able
to direct people to a certain line of code. This lets you spend less
time saying “hey, see that over there? It’s about 3 lines up. No,
too far! go down another two lines.” Installing this is really easy.
You just need linum.el and two lines of elisp:

 
(require 'linum) 
(global-linum-mode 1) 

Flymake through pyflakes

In case you miss the automatic error highlighting of the Visual Studio
world, you should realize that emacs has a similar system built-in.
It’s called flymake. And you can make it work with Python as well. I
personally prefer to use pyflakes for this. All that’s needed is
pyflakes:

 
sudo apt-get install pyflakes 

…and some elisp:

 
(when (load "flymake" t) 
 (defun flymake-pyflakes-init () 
 (let* ((temp-file (flymake-init-create-temp-buffer-copy 
 'flymake-create-temp-inplace)) 
 (local-file (file-relative-name 
 temp-file 
 (file-name-directory buffer-file-name)))) 
 (list "pyflakes" (list local-file)))) 
 
 (add-to-list 'flymake-allowed-file-name-masks 
 '("\\.py\\'" flymake-pyflakes-init))) 

Uniquify

How many __init__.py buffers do you have open at this moment? If
you’re using emacs for Python programming, probably a lot. This is
where emacs’s uniquify functionality is useful. It gives you a more
useful name for your buffers other than just appending a number at the
end. I have mine use the reverse of the directory. For instance, if
I have foo/__init__.py and bar/__init__.py open, they will be named
__init__.py/foo and __init__.py/bar respectively.

You just need this in your .emacs:

 
(require 'uniquify)
(setq uniquify-buffer-name-style 'reverse)
(setq uniquify-separator "/")
(setq uniquify-after-kill-buffer-p t)    ; rename after killing uniquified
(setq uniquify-ignore-buffers-re "^\\*") ; don't muck with special buffers (or Gnus mail buffers)

python-mode

To be totally honest with you, I haven’t used the built-in emacs mode
for python. I just installed python-mode because I was told it was
better. What I can tell you is that there is an occasional plugin
that requires python-mode. Installing it is easy. Just install
python-mode:

 
sudo apt-get install python-mode 

And add some elisp:

 
(autoload 'python-mode "python-mode" "Python Mode." t) 
(add-to-list 'auto-mode-alist '("\\.py\\'" . python-mode)) 
(add-to-list 'interpreter-mode-alist '("python" . python-mode)) 

Pylookup

Pylookup is useful for those moments when you find yourself asking
something like “Is join in os or os.path?” Unfortunately, the setup
can be complex, but it’s well worth it. There are instructions here.