Sunday, March 16, 2008

Memory models

I've been trying to wrap my head around memory models for transactional memory. That sounds a bit esoteric, but it's a problem that a lot of people are wrestling with right now.

Here's an attempt at an explanation for non-computer scientists:

One of the key ways that computers have been made faster in recent years, and a key way they will get faster in the next decade or so, is to do many things at once ("in parallel"). Programmers, however, cannot really think very well in terms of many things going on at once; it's much easier for humans to break tasks down into sequences of steps. So, computer hardware and computer software is designed to present the illusion of things happening in some predictable order, even when they actually overlap and interleave. A "memory model" is one of the ways that this illusion is maintained --- it is a set of rules that say, in effect, the memory of the computer should appear as if storing values into memory and fetching values from memory were really individual steps that happen just when the programmer says they should happen, rather than really being broken down into a bunch of smaller steps that get mixed together in all sorts of unpredictable ways to make the program run faster.

A "strong" memory model maintains the illusion well, not only for correct programs but even for programs with bugs. A "weak" memory model may maintain the illusion only for programs that follow the rules. If a program has a bug, things can get weird. So stronger should be better, right? But no, if the memory model is too strong, it may often be necessary to really do things in the simple, understandable way, which makes the program run much, much slower. The illusion is expensive to maintain, especially for buggy programs. So most memory models are a compromise: If the program is correct and follows all the rules imposed by the model, it maintains the illusion of simple, sequential execution pretty well. If the program breaks the rules, then things may get weird, but there are limits to the weirdness. (If there are no limits to the weirdness, that's called "catch fire" semantics --- it is as if your program caught fire, and anything could happen.)

It's easy to understand a very strong memory model. It's easy to understand a very weak memory model (easy in the sense that you know exactly what it promises: nothing). It's surprisingly tough to make sense of intermediate positions and make the right trade-offs. A few years from now, it will not seem so hard. The right compromises will look obvious in retrospect, based on some elegant theory. We're in a Ptolmeic stage now, and we know we need to reach a Copernican stage with a small handful of simple rules, but it is by no means obvious what those simple rules will be.

No comments: