Whenever I tell someone I’m working on a persistent data structure
library for Python, I almost always get the same response:
“Persistent means that something is stored across process boundaries.
I think you mean to say immutable data structures.” In fairness, this
response is understandable. But it’s not quite right. I’m going to
address both of the points in the above statement.
Persistent means that something is stored across process boundaries
Did you know that letter can mean a note you send through the mail or a thing you use to make up words? Did you know that programming can mean writing computer programs or mathematical optimization? I’m sure if I sat down long enough, I could think of more examples. The point is that the language we speak is full of ambiguities and whatnot.
Before we consider the computer science meaning of persistence, let’s
look at the common meaning. If we look at the word persistent in
Merriam-Webster, the first two definitions we run into are:
1 : existing for a long or longer than usual time or continuously: as
a : retained beyond the usual period b : continuing without change in
function or structure
The meaning Python programmers tend to think of is the definition in
part a. That is, a persistent data structure is stored beyond the
usual period (the process’s lifespan). However, when I talk about a
persistent data structure, I tend to mean the definition in part b.
Going with definition b, we can see that lists for example aren’t persistent:
- >>> a = [1,2,3]
- >>> b = a
- >>> b.append(4)
- >>> a
- [1, 2, 3, 4]
Tuples, however, are a different matter:
- >>> a = (1,2,3)
- >>> b = a + (4,)
- >>> b
- (1, 2, 3, 4)
- >>> a
- (1, 2, 3)
Ah, but here’s where the second part of this post comes in.
I think you mean to say immutable data structures
In the case of the tuple, you are correct to say that a tuple is
immutable. And I’m correct in saying that a tuple is persistent.
Just like I’d be right to say that I own a car while someone else
claims that I own an automobile. However in both cases, they can’t
quite be used interchangeably. After all, an automobile isn’t
necessarily a car. It can be a truck or a van or an SUV.
In the same sense, a persistent data structure is effectively
immutable. But an immutable data structure isn’t necessarily
persistent. Let’s consider the definition of both of these words from
In object-oriented and functional programming, an immutable object is
an object whose state cannot be modified after it is created.
Persistent data structure
In computing, a persistent data structure is a data structure which
always preserves the previous version of itself when it is modified;
such data structures are effectively immutable, as their operations
do not (visibly) update the structure in-place, but instead always
yield a new updated structure.
These definitions have a lot of common ground, but there is some
difference between them. Let’s consider a data structure that is
immutable but not persistent. Prepare yourself. This will be
- >>> 1
Has your mind been blown yet?
In this sense, the number 1 is definitely immutable. You can’t change
it. It’s always the number 1. However, it’s not persistent. How do
you update the number 1? No matter what you do, it will always be the
number 1. Sure, you can get 3 by adding it to 2. But in a
mathematical sense, that’s not really changing the number 1.
In this sense, the number 1 is atomic. It simply doesn’t make sense
to modify it. Heck, you can’t even copy it:
- >>> from copy import copy
- >>> a = 1
- >>> a is copy(a)
With this semantic difference in mind, remember that it’s just that:
a semantic difference. Call them Bob if you want to. However, bear
in mind that when someone uses the word persistence in this way,
they’re not being inaccurate.