When IBM moved their mainframe systems (the S/360 family that is today called System Z) from BiCMOS to mainstream CMOS in 1994, the net result was a severe loss in clock frequency and thus single-processor performance. Still, the move had to be done, since CMOS would scale much better into the future. As a result, IBM introduced additional parallelism to the system in order to maintain performance parity. Parallelism as a patch, essentially.
Carbon Design Systems have been quite busy lately with a flurry of blog posts about various aspects of virtual prototype technology. Mostly good stuff, and I tend to agree with their push that a good approach is to mix fast timing-simplified models with RTL-derived cycle-accurate models. There are exceptions to this, in particular exploratoty architecture and design where AT-style models are needed. Recently, they posted about their new Swap ‘n’ Play technology, which is a old proven idea that has now been reimplemented using ARM fast simulators and Carbon-generated ARM processor models.
Once upon a time, all programming was bare metal programming. You coded to the processor core, you took care of memory, and no operating system got in your way. Over time, as computer programmers, users, and designers got more sophisticated and as more clock cycles and memory bytes became available, more and more layers were added between the programmer and the computer. However, I have recently spotted what might seem like a trend away from ever-thicker software stacks, in the interest of performance and, in particular, latency.
I was recently pointed to a 2011 SPLASH presentation by David Ungar, an IBM researcher working on parallel programming for manycore systems. In particular, in a project called Renaissance, run together with the Vrije Universiteit Brussels in Belgium (VUB) and Portland State University in the US. The title of the presentation is “Everything You Know (about Parallel Programming) Is Wrong! A Wild Screed about the Future“, and it has provoked some discussion among people I know about just how wrong is wrong.
Fault Injection is a topic that has fascinated me for a long time. Not just the area of software-to-software fault injection, but more so how you inject faults into hardware using hardware (and how to conveniently approximate this using a simulator). I just stumbled on a short interesting note about such hardware-actuated fault injection in a Fujitsu article.
I just read a quite interesting article by Christian Pinto et al, “GPGPU-Accelerated Parallel and Fast Simulation of Thousand-core Platforms“, published at the CCGRID 2011 conference. It discusses some work in using a GPGPU to run simulations of massively parallel computers, using the parallelism of the GPU to speed the simulation. Intriguing concept, but the execution is not without its flaws and it is unclear at least from the paper just how well this generalizes, scales, or compares to parallel simulation on a general-purpose multicore machine.
Nvidia recently announced that their already-known “Kal-El” quad-core ARM Cortex-A9 SoC actually contains five processor cores, not just four as a “normal” quad-core would. They call the architecture “Variable SMP”, and it is a pretty smart design. The one where you think, “I should have thought of that”, which is the best sign of something truly good.
From what little I had heard and read, the IBM AS/400 (later known as iSeries, and now known as simply IBM i) sounded like a fascinating system. I knew that it had a rich OS stack that contained most of the services a program needs, and a JVM-style byte code format for applications that let it change from custom processors to Power Architecture without impacting users at all. It was supposedly business-critical and IBM-quality rock solid. But that was about it.
So when Software Engineering Radio episode 177 interviewed the i chief architect Steve Will, I was hooked. It turned out that IBM i was cooler than I imagined. Here are my notes on why I think that IBM i is one of the most interesting systems out there in real use.
Episodes 299 and 301 of the SecurityNow podcast deal with the problem of how to get randomness out of a computer. As usual, Steve Gibson does a good job of explaining things, but I felt that there was some more that needed to be said about computers and randomness, as well as the related ideas of predictability, observability, repeatability, and determinism. I have worked and wrangled with these concepts for almost 15 years now, from my research into timing prediction for embedded processors to my current work with the repeatable and reversible Simics simulator.
I previously blogged about the HAVEGE algorithm that is billed as extracting randomness from microarchitectural variations in modern processors. Since it was supposed to rely on hardware timing variations, I wondered what would happen if I ran it on Simics that does not model the processor pipeline, caches, and branch predictor. Wouldn’t that make the randomness of HAVEGE go away?
When I was working on my PhD in WCET – Worst-Case Execution Time analysis – our goal was to utterly precisely predict the precise number of cycles that a processor would take to execute a certain piece of code. We and other groups designed analyses for caches, pipelines, even branch predictors, and ways to take into account information about program flow and variable values.
The complexity of modern processors – even a decade ago – was such that predictability was very difficult to achieve in practice. We used to joke that a complex enough processor would be like a random number generator.
Funnily enough, it turns out that someone has been using processors just like that. Guess that proves the point, in some way.
Endianness is a topic in computer architecture that can give anyone a headache trying to understand exactly what is happening and why. In the field of computer simulation, it is a pervasive problem that takes some thinking to solve in an efficient, composable, and portable way.
This blog post describes how I am used to working with endianness in virtual platforms, and why this approach makes sense to me. There are other ways of dealing with endianness, with different trade-offs and overriding goals.
Past Friday, I posted a new blog post in my Wind River blog. It is an interview the PhD student Girish Venkatasubramanian from the University of Florida. He is doing research on virtual machines/hypervisors and how they can be implemented more efficiently by making fairly small changes to the architecture of memory management units.
The recent news that Microsoft has taken out an ARM architectural license has caused a lot of speculation about just what this might mean. There are several quite well reasoned ideas around the web, and I have one idea of my own: sixty-four bits.
I have just found what almost has to be the first cycle-accurate computer simulator in history. According to the article “Stretch-ing is Great Exercise — It Gets You in Shape to Win” by Frederick Brooks (the man behind the Mythical Man-Month) in the January-March 2010 issue of IEEE Annals of the History of Computing, IBM created a simulator of the pipeline for the IBM 7030 “Stretch” computer developed from 1956 to 1961 (photo from IBM.com).
For some reason (I guess it is the job…) I was browsing through the Power ISA version 2.06 specification last week and hit the following gem of an instruction: “rvwinkle“. It is named after a short story I had never heard about, but which apparently is sufficiently well-known in the US literary canon to warrant a sleep mode being named after it.
Continue reading “Power Architecture Rip Van Winkle”
I just spotted a fun little application on Freescale’s homepage: an interactive demo of the fault tolerance functions of the MPC564XL dual-core microcontroller.
My post on SiCS multicore, as well as the SiCS multicore day itself, put a renewed spotlight on the GPGPU phenomenon. I have been following this at a distance, since it does not feel very applicable to neither my job of running Simics, nor do I see such processors appear in any customer applications. Still, I think it is worth thinking about what a GPGPU really is, at a high level.
About two months ago, Cavium Networks launched their second generation of Octeon chips, the Octeon II. The most obvious difference to the previous generation (Octeon, Octeon Plus) is a new MIPS64 core with much better support for hypervisors and virtualization. There are some other interesting aspects to this chip, though.
When I started out doing computer science “for real” way back, the emphasis and a lot of the fun was in the basics of algorithms, optimizing code, getting complex trees and sorts and hashes right an efficient. It was very much about computing defined as processor and memory (with maybe a bit of disk or printing or user interface accessed at a very high level, and providing the data for the interesting stuff). However, as time has gone on, I have come to feel that this is almost too clean, too easy to abstract… and gone back to where I started in my first home computer, programming close to the metal.
Yes, when does hardware acceleration make sense in networking? Hardware acceleration in the common sense of “TCP offload”. This question was answered by a very nicely reasoned “no” in an article by Mike Odell in ACM Queue called “Network Front-End Processors, Yet Again“.
In a very roundabout way, I recently got to hear about a cool Sun server feature introduced sometime back in 2003 or 2004: the SCC System Configuration Card. This is a smart card that stores the system hostid and Ethernet MACs, along with other info, and which can be transferred from one server to another.
In Episode 157 of Security Now,Steve Gibson and Leo Laporte discuss the recently discovered security issues with DNS. In particular, the cost of making a good fix in terms of bandwidth and computation capacity. Fundamentally, according to Steve, today’s DNS servers are running at a fairly high load, and there is no room to improve the security of DNS updates by for example sending extra UDP packets or switching to TCP/IP. As this theoretically means a doubling or tripling of the number of packets per query, I can believe that. The “real solutions” to DNS problems should lie in the adoption of a truly secured protocol like DNSSEC. As this uses public key crypto (PKC), it would add a processing load to the servers that would kill the DNS servers on the CPU side instead…
The Radio Register has a nice interview with Kunle Olukotun, the man most known for the Afara/Sun Niagara/UltraSparc T1-2-etc. design. It is a long interview, lasting well over an hour, but it is worth a listen. A particular high point is the story on how Kunle worked on parallel processors in the mid-1990s when everyone else was still chasing single-thread performance. He really was a very early proponent of multicore, and saw it coming a bit before most other (general-purpose) computer architects did. Currently, he is working on how to program multiprocessors, at the Stanford Pervasive Parallelism Laboratory (PPL). In the interview, I see several themes that I have blogged about before being reinforced…
I do find it kind of funny when marketing names go bad in unexpected ways of collide in unexpected ways. There is this fairly old Infineon combined DSP/MCU core called TriCore (the name means it is both a RISC, a DSP, and an MCU). It was a nice name, easy to recognize, easy to pronounce, unlike the competition at the time. Today though, we are seeing multicore chips with three cores on the die. So what are these, if not tri-core chips, in analog with single- dual- quad- oct- etc. And this makes it very necessary to use the hyphen. For example, the Freescale recent StarCore 8113 chip with three cores has its press release explicitly headed tri-core with an hyphen. I guess marketing would have liked the more visually pleasing tricore moniker along with dualcore, which looks fairly established.
Ah well, not to mention the fun Infineon will have if it launches a triple-core TriCore device. Maybe in a third generation TriCore 3? The power of three, indeed. TriTriTriCore possibly?
The Register has a few podcasts in addition to their website, and the one called “Semicoherent Computing” has turned into a very nice series of interviews with interesting people from the computer industry. I recently listened to their interview from September 2007 with David Ditzel of Transmeta fame. He had a lot to say about the history of computing, as well as interesting things on where computing is going. Well worth a listen! Particular interesting highlights…
I got another email from my friend with the thesis that processors will become ever more homogeneous as time goes on, while I believe in a relative heterogenezation (is that a word?) of computer architecture with many special-purpose accelerators and helper processors. This argument is put forward in a previous blog post. In this round, the arguments for homogenization are from the gaming world.
Intel has a really neat tool on their homepage: the Intel ARK — Automated Relational Knowledgebase. It is a horrible name for a brilliant tool: it lets you search for processor names, codenames, chipsets, and jump around in a database of processor variants, compatible chipsets, feature lists, and more. Not that I care particularly about Intel chips, but the tool is something everyone selling silicon devices should copy. Being able to quickly figure out just what is inside a certain device (and what is not), and finding related devices and compatible chips is just brilliant for curious people, customers, and supporting services.
Please, everybody else?
The register report “IBM embraces – wtf – Sun’s Solaris across x86 server line” is a very appropriate headline for something quite surprising. The day before this happened, we discussed the announced announcement and said “nah, it can’t be about operating systems”. The idea of IBM in-sourcing Solaris for x86 just felt like the kind of thing that was in the same realm as flying pigs, freezing hells, and similar unlikely events.