As might be evident from this blog, I do have a certain interest in history, and the history of computing in particular. One aspect where computing and history collide in a not-so-nice way today is the archiving of digital data for the long term. I just read an article at Forskning och Framsteg that discusses some of the problems that digital computer systems and non-physical digital documents create for the long-term archiving of today's intellectual world. Basically, digital archives tend to rot in a variety of ways. I think virtual platform technology could play a role in preserving our digital heritage for the future.
Living in a decently old city does provoke a certain interest in history and how to access the past. Note that Uppsala, like everything in Sweden, is fairly recent on the grand scale of things. We have a cathedral from the 1200s, and some older churches going back to the 1000s, but before that there is not much to boast about. It is not like Rome or Egypt or China with thousands of years of civilization and with impressive buildings that still stand. But I digress…
The real problem with archives and digital technology is that it is really hard to preserve a large portion of the intellectual data that we produce today. When I was an archivist at Smålands Nation here at the university in the 1990s, it was really fun to browse through the old papers we had in the archive (only stuff from the past century; all the really old and valuable material was in professional hands at the national archives). Reading a menu from a party held in 1945, the time plan for a large formal celebration in the 1950s, or a guest list from such an event was actually quite intriguing. You could see how tastes and traditions change, even as students all believe they are maintaining ages-old traditions faithfully… Browsing the member lists in old handwritten ledgers was also great fun.
In another fifty years' time, what will we have left from today's activities? The party plans are now all just printouts from a throwaway Word document, the member list is a database on a PC, and unless you make a conscious effort, memories and printed papers will quickly fade (you do need to use the appropriate laser toner on low-acid archive-quality paper if you want things to last even a decade, let alone a hundred years). The students fifty years from now will have no idea of how we ran our parties, who was in the nation, and other such facts of daily life. “Print it out” is easy to say but hard to do. Much of today's digital data does not really print out in a useful way. A wiki, for example, can be printed as a snapshot, but the links and the edit history are gone.
But imagine if we could keep these digital documents accessible, in their living digital form. Having a proper database available for use and query is so much more interesting and powerful than a stack of papers. Being able to look at the edit history of a wiki, or the commit history of a program's source code, would offer a much deeper insight into the people and processes that produced the end results than just looking at a static snapshot.
What does it take to do that? A huge and well-known problem in digital archiving is physical media access. All the great archival institutions in the world have scores of different tape readers and disk readers, some new, some antique, and they use these to pry the bits off equally antiquated storage media. This is bad, just like the problems we have looking at old movies as the reels of film are literally falling apart. Clay tablets start to look positively sophisticated, and a smart choice, compared to our very brittle technologies.
Second, once we have the bits, what can we do with them? Now we have to tackle the file formats and, having pried them open, find the fonts and character sets that are suitable for displaying them. Rendering a MacWrite document from 1985 on a modern machine will likely not quite give the same wonderful black-and-white 72-dpi view that you got back then… Or figuring out that for a while, we used 7-bit ASCII to write Swedish texts by replacing [, ], |, {, }, \ with å, ä, ö, Å, Ä, Ö (not in exactly that order — but I used to be fluent in reading and writing that!). Even worse, decoding a database file or a purely binary CAD/CAM file is going to require some serious reverse engineering of the old programs that created them… unless you could run those same old programs in some way.
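To make the character-set point concrete, here is a minimal sketch in Python of translating such 7-bit "Swedish ASCII" text into Unicode. It assumes the common ISO 646-SE substitutions (the exact mapping varied between systems), and the sample line is invented for illustration:

```python
# Minimal sketch: decoding 7-bit "Swedish ASCII" into Unicode, assuming the
# ISO 646-SE convention (the exact substitutions varied between systems).
SE_7BIT_TO_UNICODE = str.maketrans({
    "[": "Ä", "]": "Å", "\\": "Ö",   # upper-case national letters
    "{": "ä", "}": "å", "|": "ö",    # lower-case national letters
})

def decode_swedish_ascii(raw: bytes) -> str:
    """Turn a 7-bit national-variant text file into modern Unicode text."""
    return raw.decode("ascii").translate(SE_7BIT_TO_UNICODE)

# Invented sample line; prints "Hälsningar från Smålands nation".
print(decode_swedish_ascii(b"H{lsningar fr}n Sm}lands nation"))
```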
At least to me, an “obvious” solution is to use virtual platform technology to keep the old software stacks running. Some archivists are apparently working on OS emulation of various kinds to keep old software running, but I do think that it is much simpler and more general to just simulate the entire machine hardware. This gives the smallest risk of error and the greatest fidelity to the original software, as oddities such as word lengths and character sets can be faithfully reproduced, simply because the SW-HW interface is the best-documented and narrowest interface in the entire machine. Old tapes and disks should be turned into files stored on the mass storage of current machines, and these files can then be migrated to new machines as they come online. As long as you maintain the copies of the materials by reproducing them on new storage, they will not rot.
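As a small illustration of that migration discipline (not any specific archive's tooling), here is a Python sketch that checksums a media image and verifies that a copy on new storage is bit-identical; the file names are made up:

```python
# Sketch of the migration discipline: each old tape or disk becomes an image
# file, and every copy to new storage is verified against a checksum.
# File names below are invented for illustration.
import hashlib
from pathlib import Path

def image_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a (possibly very large) media image, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(original: Path, copy: Path) -> bool:
    """True if the migrated copy is bit-identical to the original image."""
    return image_checksum(original) == image_checksum(copy)

# Example use:
# verify_copy(Path("archive/pdp11-rk05-pack1.img"),
#             Path("/new-storage/pdp11-rk05-pack1.img"))
```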
It will also have the property that the entire look and feel of the old systems remains accessible. That is a blessing in that it maintains our digital heritage and lets computer historians go back and see how old software looked and worked. It is also considered a curse by practising archivists, who do not particularly like the idea of learning how to operate strange old operating systems with arcane command lines. I think this is not necessarily a big problem. Just as people today study ancient languages to learn more about ancient cultures as embodied in their written records, I can see historians of the future learning to use old operating systems and programming languages to study the ancient culture of our time as embodied in our computerized records.
Creating these kinds of virtual platforms is quite different from the dominant virtual platform thinking in the commercial marketplace today, which is focused on modeling new and future hardware. Rather, we need to study an existing artefact and make a very good model of it. It is a different type of task, requiring slightly different types of tools.
Of particular importance is that the simulation platform totally insulates simulation models from the underlying machine. You need to be able to port the simulation to new operating systems, machines, and architectures without changing a line in the source code of the models. This means locating all host dependencies inside the simulation framework, and making sure that models are completely portable regardless of word lengths, endianness, and other host properties. You want to be able to take a model that you run today on x86-64 Linux and run it on some future 128-bit middle-endian platform with a completely different type of operating system. Basically, the simulation platform has to be a complete virtual machine in its own right. Probably, over time, we are going to add more layers of virtual platforms to get around really major shifts in computer system architecture. There is nothing saying that you can only virtualize once; already the IBM S/370 showed that you could virtualize recursively, to an arbitrary depth.
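As a toy illustration of what host independence means in practice (this is not any particular simulator's API), a device model should always spell out the target's byte order and word size explicitly instead of relying on whatever the host happens to do:

```python
# Toy illustration of host independence (not any particular simulator's API):
# the model spells out target byte order and word size explicitly, so it
# computes the same value on any host, whatever the host's own endianness.
import struct

def read_target_u32(memory: bytes, addr: int, big_endian: bool = True) -> int:
    """Read a 32-bit word from simulated target memory at byte offset addr."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack_from(fmt, memory, addr)[0]

mem = bytes([0x12, 0x34, 0x56, 0x78])
assert read_target_u32(mem, 0, big_endian=True) == 0x12345678
assert read_target_u32(mem, 0, big_endian=False) == 0x78563412
```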
I do think that some tools that we have today are perfectly useful for this kind of undertaking.
The business aspects are going to be interesting, though. We would like to use the technologies already available in the sophisticated commercial virtual platform tools, that is kind of a given. But the archival marketplace is not the most lucrative… and archivists would want very good assurance that the tools remain available, including source-code escrow arrangements that would make the requirements of 25-year military projects look positively light.
Developing a completely new, fully open-source solution sounds like a nice idea in theory, but it is probably a bit too much work, and it takes no advantage of all the commercial technology available today. Maybe as the virtual platform market matures, the industry can come up with some way to provide this technology for the greater benefit of society. Basically, doing a bit of social work in an area where we really have the tools to help.
I have no good solution to offer for that right now.
But imagine how cool it would be, fifty years from now, to gather the old team from Virtutech around some kind of display device and watch an ancient Simics 4.0 run on top of a virtual quad-core x86 with Linux 2.6… all running on top of Simics 29.0 (estimating one major version every two years) on some future I-have-no-idea-what-it-will-be hardware and software system. That really would be something cool to see happen.
Jakob:
Maybe Virtutech ought to partner with the Computer History Museum in San Francisco ( http://www.computerhistory.org ). They are frequently working on physical restoration projects of old computers. They often also build simulators of these systems to support developing and rescuing applications to run on the hardware once the restoration is complete.
If you are interested send me an email and I can connect you with a key volunteer.
Here’s an example project where they are restoring an IBM 1620:
http://www.computerhistory.org/projects/ibm_1620/Journal/
Thanks for the comment — it was marked as very probable spam by my filters, so I did not spot it in the queue until today. Sorry for the delay in putting it through. I have to use a filter like that, since I otherwise get hundreds of spam comments from bots every day. Not a fun world we live in; see http://jakob.engbloms.se/archives/72 for what the situation was prior to filtering.
The Computer History Museum is an interesting place. I visited it last year when I had some spare days around the ESC. Fascinating stuff. You are right, a collaboration with them could be an interesting way to test such ideas.
Another good read on just how hard it is to accurately emulate an old machine where users did some serious low-level programming can be found at http://arstechnica.com/gaming/news/2011/08/accuracy-takes-power-one-mans-3ghz-quest-to-build-a-perfect-snes-emulator.ars (about emulating the Super Nintendo Entertainment System).