I just read the EETimes coverage of the recently concluded court case in the US, where Toyota settled for 3 million USD in damages due to experts finding that the software in a 2005 Camry L4 could indeed cause “unintended acceleration”. In the particular case that was concluded, the accident resulting from the issue caused one driver to be injured and one driver to get killed. This feels like it could be the beginning of something really good, or just as well this could go really wrong.
Debugging – the 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems by David Agans was published in 2002, based on several decades of practical experience in debugging embedded systems. Compared to the other debugging book I read this Summer, Debugging is much more a book for the active professional currently working on embedded products. It is more of a guidebook for the practitioner than a textbook for students that need to learn the basics.
This blog post is a review of the book “If I Only Changed the Software, why is the Phone on Fire“, (see more information on Amazon, for example), by Lisa Simone. The book was released in 2007, on the Elsevier Newnes imprint. It is a book about debugging embedded systems, written in a murder-mystery style with a back story about the dynamics of an embedded development team. It sounds strange, but it works well.
Logging as as debug method is not new, and I have been writing about it to and from over the past few years myself. At the S4D conference, tracing and logging keeps coming up as a topic (see my reports from 2009, 2010 and 2012 ). I recently found an interesting piece on logging from the IT world in the ACM Queue (“Advances and Challenges in Log Analysis“, Adam Oliner, ACM Queue December 2011). Here, I want to address some of my current thoughts on the topic.
Last week, I attended my fourth System, Software, SoC and Silicon Degug conference (S4D) in a row. I think the silicon part is getting less attention these days, most of the papers were on how to debug software. Often with the help of hardware, and with an angle to how software runs in SoCs and systems. I presented a paper reviewing the technology and history of reverse debugging, which went down pretty well.
The 2012 edition of the SiCS Multicore Day was fun, like they have always been in the past. I missed it in 2010 and 2011, but could make it back this year. It was interesting to see that the points where keynote speakers disagreed was similar to previous years, albeit with some new twists. There was also a trend in architecture, moving crypto operations into the core processor ISA, that indicates another angle on the hardware accelerator space.
I am going to be talking about how to transport bugs with virtual platform checkpoints, in the Software Tools track at the Embedded Conference Scandinavia, on October 3, 2012, in Stockholm (Sweden). The ECS is a nice event, and there are several tracks to choose from both on October 2 and October 3. In addition to the tracks, Jan Bosch from Chalmers is going to present a keynote that I am sure will be very entertaining (see my notes from a presentation he did in Göteborg last year).
We just uploaded a short movie about reverse execution and reverse debugging to Youtube, to the Wind River official channel. In the short time available in this demo, we really only show reverse execution. Reverse debug, as I define it, is not used much at all, as explaining what goes on when you start to put breakpoints into a program and analyze its behavior takes a surprising amount of time.
Once upon a time, all programming was bare metal programming. You coded to the processor core, you took care of memory, and no operating system got in your way. Over time, as computer programmers, users, and designers got more sophisticated and as more clock cycles and memory bytes became available, more and more layers were added between the programmer and the computer. However, I have recently spotted what might seem like a trend away from ever-thicker software stacks, in the interest of performance and, in particular, latency.
Every once in a while I need to build demo setups to show debugging in action. As I have blogged before, finding a good bug when you need one isn’t always easy. The solution is to try to invent artificial bugs, and I was very happy when I managed to stage a buffer overrun in a VxWorks program.
It is pretty very nice demo in which you first start a period program A, which prints the value of an incrementing counter every target second. You then run a supposedly unrelated program B, resulting in the values that program A prints to become corrupted. Perfect to show off reverse execution and data breakpoints in reverse as you go from the point where the corrupted value is printed to the piece of code that overwrote the variable.
But then I ported the demo to a new platform… and the bug didn’t work anymore. My bug had caught a bug and was now not working, or at least not they way I expected it to. What had happened?
Continue reading “My Bug Doesn’t Work!”
Software is Concrete. Once poured it becomes extremely difficult and very expensive to change.
It comes from a blog post by Robert Howe, CEO of Verum, a company selling formal-methods-based and model-based programming tools. It does capture something of the phenomenon we all know: that software can be pretty darn hard to change, once it has shipped and is in use. It fits well with the fact that the later bugs are found, the more expensive they are to fix.
But it also provoked quite a bit of opposition when I put the quote up on Facebook, and I have to agree that maybe not all is as simple as that blog makes it out to be.
There is a new post at my Wind River blog, about how Simics was used to kick-start the development of the 64-bit version of VxWorks. It is an interesting example of how to use a virtual platform as a model of something much simpler and gentler than actual hardware systems.
There is a new post at my Wind River blog, about the testing on an integrated software stack in simulation. I base the discussion on the very interesting report about the Toyota “unintended acceleration” problems and the deep investigation into the control software of the affected vehicles performed by a NASA team (!). The report covers a lot of different tools, but also notes that about the only thing not done was to integrate the complete software stack in simulation.
There is a new post at my Wind River blog, about warnings in virtual platforms. It is an art to add good warnings to virtual platform models, and just being correct visavi the hardware behavior is not necessarily that helpful for a software developer. A virtual platform should warn about suspicious operations, even if they are technically “correct”.
I also have to apologize for the slow blogging in January of 2011. There was too much going on at work and quite a few days taking care of sick kids. Hopefully, the pace can improve going forward.
There is a new post at my Wind River blog, about iterative hardware-software interface design. It is a discussion with some examples of why hardware designers would do well to use virtual platforms to include software designers in the loop when designing new devices and their programming interfaces.
I have a fairly lengthy new blog post at my Wind River blog. This time, I interview Tennessee Carmel-Veilleux, a Canadian MSc student who have done some very smart things with Simics. His research is in IMA, Integrated Modular Avionics, and how to make that work on multicore.
There is a very interesting worm going around the world right now which is specifically targeting industrial control systems. According to Business Week, the worm is targeting a Siemens plant control system, probably with the intent to steal production secrets and maybe even information useful to create counterfeit products. This is the first instance I have seen of malware targeting the area of embedded systems. However, the actual systems targeted are not really embedded systems, but rather regular PCs running supervision and control software.
I recently started using a new mobile phone, a Blackberry Bold 9700. I am a bit ambivalent on some of its design features, but it is certainly a very different device from the much more friendly SonyEricsson I had before. Like anybody would do, I have been playing around with it to see what it can do and what not (notable things not working: the “AppWorld” application store is not available in Sweden, YouTube videos do not play in any way that I can figure out).
And almost inevitably, as you play around with a complex modern piece of software (which is what most of the phone is, after all), you find some obvious things which are just plain broken. You wonder, “why didn’t they think of this”, and “how could this ever escape testing?” My current best example is that the built-in web browser does not render the pages from Blackberry’s own support knowledgebase.
I have a new blog post up at the Wind River blog network, about the new target analysis tools in Simics 4.4. It is a very fun piece of technology to play with, and you learn a lot just by poking around at existing software systems…
In his most recent Embedded Bridge Newsletter, Gary Stringham describes a solution to a common read-modify-write race-condition hazard on device registers accessed by multiple software units in parallel. Some of the solutions are really neat!
I have seen the “write 1 clears” solution before in real hardware, but I was not aware of the other two variants. The idea of having a “write mask” in one half of a 32-bit word is really clever.
However, this got me thinking about what the fundamental issue here really is.
An embedded researcher friend of mine has posted some data on code sizes from various compilers at http://embed.cs.utah.edu/embarrassing/. The “embarrassing” bit is the idea that compiler writes should be ashamed when other compilers do better than theirs. It is worth looking over the data, even though the methodology and benchmarks are not yet perfect by any means.
Part of my daily work at Virtutech is building demos. One particularly interesting and frustrating aspect of demo-building is getting good raw material. I might have an idea like “let’s show how we unravel a randomly occurring hard-to-reproduce bug using Simics“. This then turns into a hard hunt for a program with a suitable bug in it… not the Simics tooling to resolve the bug. For some reason, when I best need bugs, I have hard time getting them into my code.
I guess it is Murphy’s law — if you really set out to want a bug to show up in your code, your code will stubbornly be perfect and refuse to break. If you set out to build a perfect piece of software, it will never work…
So I was actually quite happy a few weeks ago when I started to get random freezes in a test program I wrote to show multicore scaling. It was the perfect bug! It broke some demos that I wanted to have working, but fixing the code to make the other demos work was a very instructive lesson in multicore debug that would make for a nice demo in its own right. In the end, it managed to nicely illustrate some common wisdom about multicore software. It was not a trivial problem, fortunately.
Andras Vajda of Ericsson wrote an interesting blog post on domain-specific languages (DSLs). Thanks for some success stories and support in what sometimes feels like an uphill battle trying to make people accept that DSLs are a large part of the future of programming. In particular for parallel computing, as they let you hide the complexities of parallel programming.
Past Tuesday, I attended the Freescale Design With Freescale (DWF) one-day technology event in Kista, Stockholm. This is a small-scale version of the big Freescale Technology Forum, and featured four tracks of talks running from the morning into the afternoon. All very technical, aimed at designing engineers.
I have written several times on this blog about the odd propensity of the “EDA” business to consider the C and C++ languages “high level” languages. They are what I use almost daily for most of the demo-order programming I do, but I still don’t consider them very high-level. High-level for me is scripting (Python, Lua, …) or domain-specific languages (DML, Lex, Yacc, MatLab, …) or model-driven development (UML, LabView, Simulink, …) or languages which at least provide sensible and reasonably safe semantics (Erlang, Java, …).
However, in fact, most the embedded industry and the “virtual platform” industry rely on C and C++ to get our daily jobs done. Question is, how much longer can we expect to do that? An interesting post at Embedded.com by Michael Barr brought back my argument that modeling needs to move up in levels of abstraction just like mainstream programming.
Freescale has now released the collected, updated, and restyled book version of the article series on embedded multicore that I wrote last year together with Patrik Strömblad of Enea, and Jonas Svennebring, and John Logan of Freescale. The book covers the basics of multicore software and hardware, as well as operating systems issues and virtual platforms. Obviously, the virtual platform part was my contribution.