inThere will be a session on checkpointing in SystemC at the upcoming SystemC Evolution Day in München on October 18, 2017. I will be presenting it, together with some colleagues from Intel. Checkpointing is a very interesting topic in its own right, and I have written lots about it in the past – both as a technology and it applications.
I had many interesting conversations at the HiPEAC 2017 conference in Stockholm back in January 2017. One topic that came up several times was the GEM5 research simulator, and some cool tricks implemented in it in order to speed up the execution of computer architecture experiments. Later, I located some research papers explaining the “full speed ahead” technology in more detail. The mix of fast simulation using virtualization and clever tricks with cache warming is worth a blog post.
There is a new post at my Wind River blog, about how some new features in Simics 4.8 improve the collaboration power of Simics checkpoints. For the first time, Simics checkpoint can now carry a piece of history (slice of time), which also makes reverse execution and reverse debug work with checkpoints in a logical way.
Carbon Design Systems have been quite busy lately with a flurry of blog posts about various aspects of virtual prototype technology. Mostly good stuff, and I tend to agree with their push that a good approach is to mix fast timing-simplified models with RTL-derived cycle-accurate models. There are exceptions to this, in particular exploratoty architecture and design where AT-style models are needed. Recently, they posted about their new Swap ‘n’ Play technology, which is a old proven idea that has now been reimplemented using ARM fast simulators and Carbon-generated ARM processor models.
For some reason, when I think of reverse execution and debugging, the sound track that goes through my head is a UK novelty hit from 1987, “Star Trekkin” by the Firm. It contains the memorable line “we’re only going forward ’cause we can’t find reverse“. To me, that sums up the history of reverse debugging nicely. The only reason we are not all using it every day is that practical reverse debugging has not been available until quite recently. However, in the past ten years, I think we can say that software development has indeed found reverse. It took a while to get there, however. This is the first of a series of blog posts that will try to cover some of the history of reverse debugging. The text turned out to be so long that I had to break it up to make each post usefully short. Part two is about research, and part three about products.
Continue reading “Reverse History Part One”
This post features some additional notes on the topic of transporting bugs with checkpoints, which is the subject of a paper at the S4D 2010 conference.
The idea of transporting bugs with checkpoints is some ways obvious. If you have a checkpoint of a state, of course you move it. Right? However, changing how you think about reporting bugs takes time. There are also some practical issues to be resolved. The S4D paper goes into some of the aspects of making checkpointing practical.
I have a paper about “Transporting Bugs with Checkpoints” to be presented at the S4D (System, Software, SoC and Silicon Debug) conference in Southampton, UK, on September 15 and 16, 2010. The core concept presented is to leverage Simics checkpointing to capture and move a bug from the bug reporter to the responsible developer. It is a fairly simple idea, but getting it to work efficiently does require that some things are done right. See the longer Wind River blog posting about this topic for a few more details.
Continuing on my series of posts about checkpointing in virtual platforms (see previous posts Simics, Cadence, our FDL paper), I have finally found a decent description of how CoWare does things for SystemC. It is pretty much the same approach as that taken by Cadence, in that it uses full stores a complete process state to disk, and uses special callbacks to handle the connection to open files and similar local resources on a system. The approach is described in a paper called “A Checkpoint/Restore Framework for SystemC-Based Virtual Platforms”, by Stefan Kraemer and Reiner Leupers of RWTH Aachen, and Dietmar Petras, and Thomas Philipp of CoWare, published at the International Symposium on System-on-Chip, in Tampere, Finland, in October of 2009.
Part of my daily work at Virtutech is building demos. One particularly interesting and frustrating aspect of demo-building is getting good raw material. I might have an idea like “let’s show how we unravel a randomly occurring hard-to-reproduce bug using Simics“. This then turns into a hard hunt for a program with a suitable bug in it… not the Simics tooling to resolve the bug. For some reason, when I best need bugs, I have hard time getting them into my code.
I guess it is Murphy’s law — if you really set out to want a bug to show up in your code, your code will stubbornly be perfect and refuse to break. If you set out to build a perfect piece of software, it will never work…
So I was actually quite happy a few weeks ago when I started to get random freezes in a test program I wrote to show multicore scaling. It was the perfect bug! It broke some demos that I wanted to have working, but fixing the code to make the other demos work was a very instructive lesson in multicore debug that would make for a nice demo in its own right. In the end, it managed to nicely illustrate some common wisdom about multicore software. It was not a trivial problem, fortunately.
This is end of the second day of FDL 2009, and it is proving to be quite an interesting experience. The location is very bad, apart from the weather (coming from a Swedish Fall where temperatures are dropping towards 10 C, to a sunny 27 C is quite nice). But Sophia Antipolis is just a tech park with some hotels, and you cannot get anywhere interesting or civilized without a car. No shops, no restaurants except for hotels, and so sidewalks in parts.
But the conference is good enough to be worth the bodily discomforts. And I did find a nice Parcours Sportif for the morning run, as well as a nice breakfast buffet at the Mercure Hotel.
The paper will explain how we did Simics-style checkpointing in SystemC, using the GreenSocs GreenConfig mechanisms to obtain an approximation for the Simics attribute system.
I while ago I wrote a blog post on checkpointing in virtual platforms, and what it is good for. Checkpointing has been a fairly rare feature in virtual platform tools for some reason, but it seems to be picking up some implementations. In particular, I recently noticed that Cadence added it to their simulator solutions a while ago (2007 according to their blog posts). There are a two blog posts by George Frazier of Cadence (“saving boot time” and “advanced usage“) that offer some insight into what is going on.
One thing that surprises me is how rare the feature of checkpointing or snapshotting is in the land of virtual platforms, despite the obvious benefits of that feature. Indeed, checkpointing was one of the first cool things demonstrated to me when I joined Virtutech back in 2002. Today, I could not ever imagine doing without it. Not having checkpointing is like having a word processor where you only get to save once, when your document is finished, with no option of saving intermediate states.
But not everyone seems to consider this an important feature, judging from its relative rarity in the world of EDA and virtual platforms. Why is this? Let’s look at some possible explanations.