I find the subject of fault tolerance and resiliency in computers quite interesting. It is also very interesting to look into what kinds of faults actually do happen in the real world, and what impact they have. I recently found a couple of good sources on this. First of all, a paper from Supercomputing 2012 by Fiala et al., called “Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing” (ACM Digital Library). One of its references was to a 2011 talk by Al Geist, “What is the Monster in the Closet”, which provided some more data on how common faults are.
I am using the “Webex productivity tools” at work to quickly schedule and start meetings from within Outlook. It really is a very useful piece of software for those of us who do quite a few Webex conferences each week. However, it came with one annoying side effect: little Webex tabs started to appear on select application windows. In particular, on top of Skype windows.
When mobile phones first appeared, they were powered by very simple cores like the venerable ARM7 and later the ARM9. Low clock frequencies, zero microarchitectural sophistication, sufficient for the job. In recent years, as smartphones have come into their own as the most important computing device for most people, the processor performance of mobile phones has increased tremendously. Today, cutting-edge phones and tablets contain four or eight cores, running at clock frequencies well above 2 gigahertz. The performance race for most of the market (more about that in a moment) was mostly about pushing higher clock frequencies and more cores, even while microarchitecture was left comparatively simple. Mobile meant “fairly simple”, and IPC was nowhere near what you would get with a typical Intel processor for a laptop or desktop.
Today, that seems to be changing, as the Nvidia Denver core and Apple’s Cyclone core both go the route of a few fat cores rather than many thin cores.
The purpose of the free chapter is to provide a way to understand the style of the book – and hopefully lead people on to buy the whole thing to read it.
The paperback edition looks really nice, and the printed copies that I have had the honor to get have been very well made.
At the Wind River corporate blog, there is a blog post that I wrote about continuous integration and Simics. At the Elsevier Computer Science Connect blog, there is also a blog post about continuous integration and Simics that I wrote. The two texts are essentially the same, and I had the good fortune to get the post published in both places. The reason it is up at Elsevier is to help promote our soon-to-be-released book about virtual platforms and simulation (and a little bit about Simics), and hopefully we will reach a larger audience with both messages: CI with Simics is a great idea, and the book is a great book to buy.
I just found and read an old text in the computer systems field, “Why Do Computers Fail and What Can Be Done About It?”, written by Jim Gray at Tandem Computers in 1985. It is a really nice overview of the issues that Tandem had encountered in their customer base, back in the early 1980s. The report is really a classic in the computer systems field, but I did not read it until now. Tandem was an early manufacturer of explicitly fault-tolerant, highly reliable, and highly available computers. In this technical report, Jim Gray describes the basic principles of fault tolerance, and what kinds of faults happen in the field and need to be tolerated.
Recently, I finally got to ride (if that is the right word) a Segway two-wheeler. Quite fun, actually. But when thinking hard about it, it really seems like a pretty pointless invention. Cool technology, fantastic control system design and programming – but still, it does not solve any real problem. As a product manager, my mind tends to view new things with an eye toward “what is the problem they are trying to solve” rather than how fun, attractive, or well-designed they are. Sometimes, good design is the point, of course. However, in this case, we are talking about a transportation device, and as such, the question is where it fits.
At the ISCA 2014 conference (the biggest event in computer architecture research), a group of researchers from Microsoft Research presented a paper on their Catapult system. The full title of the paper is “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services”, and it is about using FPGAs to accelerate search engine queries at datacenter scale. It has 23 authors, which is probably the most I have ever seen on an interesting paper. There are many things to be learnt from and discussed about this paper, and here are my thoughts on it.
During my vacation, a blog post went up on the Wind River blog with an interview with Hyungmin Cho, a researcher at Stanford. Hyungmin has done some seriously heavy and cool work with Simics, using it together with a circuit-level simulator to investigate error resiliency in hardware devices, and how errors propagate from hardware into the software. As part of this process, he has set up an automated test system using Simics, and this system has done more than a million automated Simics runs. That is an insane number – I have been using Simics for twelve years now, and to match it I would have had to start about 10 runs per hour, every hour of every day, for all of those years. It shows the power of automation along with parallel runs on clusters of machines – once the setup is automated, you can pour on the volume.
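As a rough sanity check of that runs-per-hour figure, here is a minimal back-of-the-envelope sketch. It assumes exactly one million runs, 365-day years, and round-the-clock operation; the function name is just something I made up for the illustration.

```python
# Back-of-the-envelope check of the runs-per-hour figure.
# Assumptions (mine, for illustration): exactly one million runs,
# 365-day years, and starting runs around the clock.
TOTAL_RUNS = 1_000_000
YEARS = 12


def runs_per_hour(total_runs: int, years: int) -> float:
    """Average number of runs that must start each hour to reach total_runs."""
    hours = years * 365 * 24  # 105,120 hours in twelve years
    return total_runs / hours


rate = runs_per_hour(TOTAL_RUNS, YEARS)
print(f"{rate:.1f} runs per hour")  # prints "9.5 runs per hour"
```

So roughly ten runs per hour, nonstop, which is not something a human at a keyboard is going to do.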
This is another vacation-related post, of the kind that I do once every year or so. I recently came back from a family vacation to Gothenburg (Göteborg in Swedish), where I had some time to visit a few great museums dealing with history, and in particular with military history and the history of technology.
For the past six months I have not been doing much blogging at all, neither here nor on the Wind River blog. The reason is that I have been directing my writing energy into writing a textbook about Simics together with Daniel Aarno at Intel. Last year, Daniel and I worked on an Intel Technology Journal issue on Simics. The ITJ issue was kind of a first step on the way to the book, collecting several articles about Simics usage at Intel and elsewhere. The book itself will be a much more detailed description of Simics, how it works, and why it works the way it does.
When Microsoft released Windows 8 in 2012, the operating system received an incredible amount of bad press. There were lots of good ideas, but also a lot of bad execution, and some pretty drastic changes to the old familiar way that personal computer desktops had worked since approximately 1995. Most people that voice an opinion about Windows 8 dislike it, whether it be on social media or in person. For some reason, I seem to be one of the few people who really like it. When I just recently got a new laptop at work and it came with old Windows 7, I was actually disappointed. Here is why.
The Mill is a new general-purpose high-performance processor design from Out-of-the-Box Computing (http://ootbcomp.com/). They claim to beat typical high-end out-of-order (OOO) designs like the Intel Haswell generation by crazy factors, such as being 2.3x faster while using 2.3x less power compared to a Haswell. All the while costing less. Ignoring the cost aspect, the power and performance numbers are truly impressive – especially for general code. How can they do something so much better than what we have today? For general code? That requires some serious innovation. With that perspective, I ask myself where the Mill is really significantly different from what we have seen before.
Paranormality – Why we believe the impossible, written by Professor Richard Wiseman, manages to combine four stories into a single book. Wiseman is a well-known name in skeptic circles, and this book does not disappoint in the debunking department. But it also uses the investigation of paranormal phenomena as a way to explain how our brains work. And then some. It all makes for a very satisfying read.
Trust Me, I’m Lying – Confessions of a Media Manipulator by Ryan Holiday is a brilliant book about the online media landscape, and how it is driving public discourse in a very bad direction. Ryan has a very interesting background, having worked in marketing and being part of the problem he describes. In his work, he has exploited the weaknesses of the new media landscape to get stories into blogs, press, and often national television. Stories about his clients, to get them attention and ultimately business. In this book, he describes what he did, how he did it, and why we as a society have a big problem. It has changed the way I read online media, and made me a lot more critical of things I previously did not take notice of.
On the Wind River blog network, I have a short posting about network simulation with Simics. It points to the network demo video that we put up on YouTube a few weeks ago, along with some explanations of what is shown in the video. In short, we show a simple example of a network being simulated in Simics, along with some examples of what you can do with it.
Last week, my iPod Nano (6th generation) stopped working when its power button got stuck and would no longer activate the machine. I rushed out, and got myself a replacement player in the form of an Apple iPod Nano 7th generation. I must admit that I have not found any alternative to an iPod paired with iTunes when it comes to a plain stand-alone audio player. After the utter disappointment that the 6th gen Nano was, the 7th gen turned out to be surprisingly good and might even be almost up to the standards of the near-perfect 3rd generation.
There is a new post at my Wind River blog, featuring a recently-posted video demo of device and systems modeling with Simics. In this video demo, we show an outline of the modeling flow used with Simics 4.8, using only the Eclipse interface. It is actually quite new that we can do this much modeling from within Eclipse; recent efforts in improving the Simics user experience are starting to pay off. As part of the product design team, it feels good to see how even quite small features can really improve the usability of the product.
It is also my first blog post on the recently renovated Wind River blog network. I like the new look of the corporate blog, even if I will have to go back and adjust some older blog images to account for the change from a dark to a light background.
I recently made my first acquaintance with Windows 8, having bought a new Sony ultrabook for the family. Including a touch screen. The combination of the touch-based interface and the phone-like look of Windows 8 even on a PC has led me to think about the (unconscious) expectations that I have come to have about how systems behave and how services are accessed, based on how smartphones and tablets have come to work in the past few years. In particular, where are web-based services going?
Apple just released their new iPhone 5s, where the biggest news is really the 64-bit processor core inside the new A7 SoC. Sixty-four bits in a phone is a first, and it immediately raises the old question of just what 64 bits gives you. We saw this when AMD launched the Opteron and 64-bit x86 PC computing back in the early 2000s, and in a less public market the same question was asked as 64-bit MIPS took huge chunks out of the networking processor market in the mid-2000s. It was never questioned in servers, however.
Simics can run and debug UEFI BIOSes, and that is the topic of my latest blog at Wind River. UEFI is actually pretty interesting once you get to know it, and building a good debug experience for UEFI took a bit of work. Still, it was built as just another target for the standard uniform Simics debugger, which is not the way most other UEFI and BIOS debuggers are built. I guess that in the past, debugging a BIOS required such specialized tools that it made sense to also build a custom specialized frontend for the purpose. With a simulator as the backend, things do become simpler and more uniform, and Eclipse CDT is actually a very good basis for a debugger for any kind of C and C++ code.
For more reading on UEFI itself, I can recommend the 2011 Intel Tech Journal on the topic.