StackOverflow interviews CouchDB

Last year, FLOSS Weekly interviewed Jan Lehnardt of the CouchDB project. I put up a blog post on this, noting that CouchDB is interesting as a scalable parallel program written in Erlang, a true concurrent language. The interview was interesting, but not very deeply technical. Now, almost a year later, the StackOverflow podcast (episode 59) has interviewed the founder of the project, Damien Katz. This interview goes further into the technical details: what CouchDB is good for and what it is not, as well as some details on the use and performance of Erlang.

An interesting point made is that the lightweight user-level threading in the Erlang virtual machine is optimized for massively threaded workloads. The key property is that the context of each thread (Erlang calls them processes) is very small compared to an OS-level application thread such as a pthread. This makes context switches dramatically cheaper, since much less cache and TLB content has to be swapped in and out. Thus, with lots of threads, Erlang tends to get more work done per unit of time, as less execution time is lost to friction in the memory system. I am not sure you can emulate this in C using a user-level threading package. The very small initial stack and heap of each Erlang thread is possible partly because a VM has more insight into, and control over, when memory allocation happens, so it can more easily grow the stack and heap in small increments.
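To make the "small context" idea concrete, here is a minimal Python sketch of user-level threads, using generators as a stand-in for Erlang processes. Each thread's entire context is one generator frame, and a context switch is just resuming the next generator from a queue, with no kernel involvement. The scheduler is cooperative (threads must yield voluntarily), whereas Erlang's scheduler is preemptive, so treat this as an illustration of the cost model rather than of the Erlang VM itself. The names `worker` and `run` are hypothetical.

```python
from collections import deque


def worker(task_id, steps, log):
    """A tiny user-level 'thread': its whole context is this generator frame."""
    for step in range(steps):
        log.append((task_id, step))
        yield  # hand control back to the scheduler (cooperative, not preemptive)


def run(tasks):
    """Round-robin scheduler: a context switch is just resuming the next generator."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # this thread has finished; do not requeue it
        switches += 1
        ready.append(task)
    return switches


log = []
switches = run(worker(i, 3, log) for i in range(10_000))
print(len(log), switches)  # 30000 30000
```

Spawning ten thousand of these is cheap because each one holds only a few local variables; doing the same with OS threads would typically reserve a much larger stack per thread.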

Another interesting difference between Erlang and C/C++ brought out in the interview is error handling. In Erlang, error handling is part of the language design, while in C/C++, writing code that handles every case (and handles it correctly) quickly becomes painful and overwhelming. Instead, Erlang imposes a system-wide policy: any thread that does something bad is killed and restarted. With that simple strategy imposed on you, the code gets much simpler.
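The "kill it and restart it" policy can be sketched as follows. This is a minimal Python illustration, not Erlang/OTP code: `flaky_worker` and `supervise` are hypothetical names, and the worker deliberately contains no defensive error handling. All recovery lives in the supervisor, which catches the crash, records it, and runs the worker again, giving up after a fixed number of restarts.

```python
class Crash(Exception):
    """Stands in for 'the thread did something bad'."""


def flaky_worker(attempt, fail_until):
    # Hypothetical worker with no defensive error handling: it simply
    # crashes on its first `fail_until` attempts and succeeds afterwards.
    if attempt < fail_until:
        raise Crash(f"attempt {attempt} failed")
    return f"done on attempt {attempt}"


def supervise(worker, max_restarts, *args):
    """Run `worker`; whenever it crashes, record the crash and restart it."""
    crashes = []
    for attempt in range(max_restarts + 1):
        try:
            return worker(attempt, *args), crashes
        except Exception as exc:  # any failure kills this run of the worker
            crashes.append(str(exc))
    raise RuntimeError(f"gave up after {len(crashes)} crashes")


result, crashes = supervise(flaky_worker, 5, 2)
print(result)   # done on attempt 2
print(crashes)  # ['attempt 0 failed', 'attempt 1 failed']
```

Real OTP supervisors go further, for example by limiting the number of restarts within a time window and offering several restart strategies, but the division of labor is the same: workers stay simple, and the supervisor decides what happens after a failure.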

The podcast also brought up a StackOverflow question about CouchDB that resulted in a good explanation of the concurrency model (optimistic concurrency on entire documents, and nothing smaller or larger than that). Damien Katz came in with some more insights on transactions and CouchDB, in a discussion on how to solve the classic bank account problem: moving money from one account to another. The "ACID" solution is to make sure that the changes to the two accounts are either both done or neither done. The CouchDB solution is to store a record of the account-to-account money transfer in the database (I won't use the word "transaction", as it is overloaded in this context), and simply go through all records pertaining to a particular account to arrive at its current balance. That does feel more like proper bookkeeping practice than having a single, unauditable balance in an account record…
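Both ideas from that discussion can be sketched together in Python. The `DocStore` class below is a hypothetical in-memory stand-in, not the CouchDB API: each write replaces a whole document and must quote the revision it was based on, otherwise it is rejected as a conflict (CouchDB tracks this with a `_rev` field; the plain integer revision here is a simplification). Transfers are stored as new, immutable documents, and a balance is derived by scanning them, so no write ever has to update two account records at once. `record_transfer` and `balance` are likewise hypothetical helpers.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale (or missing) revision."""


class DocStore:
    """Hypothetical in-memory store: whole-document writes guarded by a revision."""

    def __init__(self):
        self.docs = {}  # doc_id -> (rev, body)

    def put(self, doc_id, body, rev=None):
        # Optimistic concurrency: the caller must name the revision it read.
        # A new document is created with rev=None.
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, dict(body))  # replace the whole document
        return new_rev


def record_transfer(store, transfer_id, src, dst, amount):
    # Each transfer is a brand-new document, so it never has to update
    # either account's record.
    store.put(transfer_id, {"type": "transfer", "from": src, "to": dst, "amount": amount})


def balance(store, account):
    """Derive a balance by scanning every transfer that touches the account."""
    total = 0
    for _rev, doc in store.docs.values():
        if doc.get("type") != "transfer":
            continue
        if doc["to"] == account:
            total += doc["amount"]
        if doc["from"] == account:
            total -= doc["amount"]
    return total


store = DocStore()
record_transfer(store, "t1", "bank", "alice", 100)
record_transfer(store, "t2", "alice", "bob", 30)
print(balance(store, "alice"), balance(store, "bob"))  # 70 30

# A stale write to the same document is rejected rather than silently applied.
rev = store.put("profile:alice", {"name": "Alice"})
store.put("profile:alice", {"name": "Alice A."}, rev=rev)
try:
    store.put("profile:alice", {"name": "Stale"}, rev=rev)
except Conflict as exc:
    print("conflict:", exc)
```

Scanning every document is fine for a sketch; in CouchDB itself a per-account summary like this would more naturally come from a view.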

Overall, it is well worth a listen.
