I was recently pointed to a 2011 SPLASH presentation by David Ungar, an IBM researcher working on parallel programming for manycore systems, in particular in a project called Renaissance, run together with the Vrije Universiteit Brussel (VUB) in Belgium and Portland State University in the US. The title of the presentation is “Everything You Know (about Parallel Programming) Is Wrong! A Wild Screed about the Future”, and it has provoked some discussion among people I know about just how wrong is wrong.
Reading the slides, I really wish I had been there; this must have been a blast live. Still, there is an interview video posted online taken at the event (it failed to play properly for me).
Two things stand out in the talk:
- The poetic notion that a 1000-core processor is like discovering a new space. It is something new, not just more of the same.
- Synchronization is bad.
I think this second point is why the talk is titled “everything you know is wrong”. When teaching parallel programming, synchronization is a core part of the subject. When designing parallel languages and systems, the hardest problem is how to handle synchronization (or hide it so that it does not have to be explicitly handled).
Instead of finding better ways to synchronize, David Ungar makes the point that we need to find ways to eliminate synchronization altogether. Any synchronization will sooner or later become a bottleneck. I see where he is coming from. A talk I heard in 2009 made much the same point:
They also had the opportunity to test their solution [for parallel Erlang] on a Tilera 64-core machine. This mercilessly exposed any scalability limitations in their system, and proved the conventional wisdom that going beyond 10+ cores is quite different from scaling from 1 to 8… The two key lessons they learned were that no shared lock goes unpunished, and that data has to be distributed as well as code.
In the same vein, it seems that all big system parallelization efforts turn into a hunt for locks and the splitting up of code and data into units that can run in parallel without having to synchronize. The “upper limit” of this process is clearly a system with no synchronization points at all.
He makes the valid point that even “lock-free” algorithms depend on synchronization, albeit a simple form of it. They rely on a single atomic instruction – and that means that the hardware has to perform some kind of synchronization to guarantee atomicity across cores sharing memory (at least at the hardware level). I had not really considered that before, but it is true.
I would add that message-passing shared-nothing systems like Erlang are not exempt either – sending a message is a synchronizing operation, and it is quite possible to build synchronized systems in Erlang that do not scale either. It is all about communication vs computation, and making sure each thread can do as much work as possible without having to stop and wait for other threads.
Dropping synchronization as a concept makes the system non-deterministic, and that is the part that makes this hard to do. Most programmers do not like the idea that their programs do not have defined results, but there is a fundamental conflict here. Synchronized or high-performance, choose one. I have been working alongside very smart programmers for a long time trying to build parallel simulation systems. As long as there is a requirement for determinism and precise repeatability, performance from parallelism is very hard to obtain. Relax determinism, and performance is much easier to achieve. But maybe the entire value of the program is lost in the process?
I am reminded of an old Dilbert cartoon, where Wally asks the boss “Is it OK to do things wrong if we are really really fast?” Maybe it is, in the world of a thousand cores…
Ungar’s core idea seems to be to move towards a more approximate or statistical idea of program correctness. That makes sense, and is applicable in quite a few domains.
In a case like web searching, is there really a “right” result? For a human, all you really want is a useful result. Behind this is the idea of eventually consistent databases: it does not hurt if data is not quite the same in all places in a system, as long as updated data eventually reaches all parts of it. Better to be available and fast and produce a useful result, than to be perfectly right but only able to serve a portion of the user base.
In “soft” real-time systems we accept that we will occasionally miss a deadline, drop a phone call, have a bump in the flight of an airliner, skip a frame in a movie playback, etc – as long as the results are not fatal but just an inconvenience. We rely on humans being forgiving, adaptable, and able to cope with the occasional problem. Traditionally, computers are not like that; in a binary world, a single error tends to be absolute.
Obviously, there are systems that just have to be right – I would not like the control systems of an aircraft to occasionally hiccup, nor my bank to play roulette with my money on a massively parallel machine… and the same goes for basic data structures. David Ungar brings up the example of a parallel hash table, and compares a few different implementations that all actually strive to give a correct result. The actual order of items in the bucket lists may differ depending on how the races play out, but checking and patching make sure that the results are all correct (containing all data in the right buckets with no duplicates). That is familiar territory in parallel programming – correct final result, different ways to get there (in terms of the exact order in which threads insert values into the table).
The techniques used in the hash table example remind me of the old idea of optimistic concurrency control – make a change and then check whether it was invalidated, rather than checking before you make the change. This is benchmarked against compare-and-swap atomic instructions, and the performance trade-off is not clear. Even very slight synchronization will hurt performance, even though patching things up after a mistake should cost more cycles in retries and redone work.
Overall, I think the core point David Ungar really makes is that our thinking about right and wrong needs to go from a binary concept to shades of grey. In the future, programmers will not create programs that are fast but wrong, but rather programs that are sufficiently correct and still fast.
How to debug such a thing is still a bit of a mystery though…