In a blog post at Wind River, I describe how the Wind River Helix Lab Cloud system can be used to communicate hardware design to software developers. The idea is that you upload a virtual platform to the cloud-based system, and then share it to the software developers. In this way, there is no need to install or build a virtual platform locally, and the sender has perfect control over access and updates. It is a realization of the hardware communication principles I presented in an earlier blog post on use cases for Lab Cloud.
But the past part is that the targets I talk about in the blog post and use in the video are available for anyone! Just register on Lab Cloud, and you can try your own threaded software and check how it scales on a simulated 8-core ARM!
In a dusty bookshelf at work I found an ancient tome of wisdom, long abandoned by its previous owner. I was pointed to it by a fellow explorer of the dark arts of computer system design as something that you really should read. The book was “Fortress Rochester”, written by Frank Soltis, and published in 2001.
Continue reading “Fortress Rochester”
I just read an article from IEEE Annals of Computing history about the COMIC Color-Matching Analog computer built and sold by Davidson and Hemmendinger, a US firm. It seems the computer is pretty well known inside the colorant industry, actually, and it provides an interesting example of how to do a good-enough solution to break open the market – while leaving the user in control of the process to build faith in the approach.
Continue reading “Color Matching by Analog Computing Around 1960”
When mobile phones first appeared, they were powered by very simple cores like the venerable ARM7 and later the ARM9. Low clock frequencies, zero microarchitectural sophistication, sufficient for the job. In recent years, as smartphones have come into their own as the most important computing device for most people, the processor performance of mobile phones have increased tremendously. Today, cutting-edge phones and tablets contain four or eight cores, running at clock frequencies well above 2 gigahertz. The performance race for most of the market (more about that in a moment) was mostly about pushing higher clock frequencies and more cores, even while microarchitecture was left comparatively simple. Mobile meant “fairly simple”, and IPC was nowhere near what you would get with a typical Intel processor for a laptop or desktop.
Today, that seems to be changing, as the Nvidia Denver core and Apple’s Cyclone core both go the route of a few fat cores rather than many thin cores.
Continue reading “Thin Phone, Fat Core”
I just found and read an old text in the computer systems field, “Why Do Computers Fail and What Can Be Done About It?” , written by Jim Gray at Tandem Computers in 1985. It is a really nice overview of the issues that Tandem had encountered in their customer based, back in the early 1980s. The report is really a classic in the computer systems field, but I did not read it until now. Tandem was an early manufacturer of explicitly fault tolerant and highly reliable and available computers. In this technical report Jim Gray describes the basic principles of fault tolerance, and what kinds of faults happen in the field and that need to be tolerated.
Continue reading “Why DO Computers Fail?”
At the ISCA 2014 conference (the biggest event in computer architecture research), a group of researchers from Microsoft Research presented a paper on their Catapult system. The full title of the paper is “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services“, and it is about using FPGAs to accelerate search engine queries at datacenter scale. It has 23 authors, which is probably the most I have ever seen on an interesting paper. There are many things to be learnt from and discussed about this paper, and here are my thoughts on it.
Continue reading “Microsoft Catapult – Real Interesting Research at Real Scale”
The Mill is a new general-purpose high-performance processor design from out-of-the-box computing (http://ootbcomp.com/). They claim to beat typical high-end out-of-order (OOO) designs like the Intel Haswell generation by crazy factors, such as being 2.3x faster while using 2.3x less power compared to a Haswell. All the while costing less. Ignoring the cost aspect, the power and performance numbers are truly impressive – especially for general code. How can they do something so much better than what we have today? For general code? That requires some serious innovation. With that perspective, I ask myself where the Mill is really significantly different from what we have seen before.
Continue reading ““The Mill” – It just might Work!”
Apple just released their new iPhone 5s, where the biggest news is really the 64-bit processor core inside the new A7 SoC. Sixty four bits in a phone is a first, and it immediately raises the old question of just what 64 bits gives you. We saw this when AMD launched the Opteron and 64-bit x86 PC computing back in the early 2000’s, and in a less public market the same question was asked as 64-bit MIPS took huge chunks out of the networking processor market in the mid-2000s. It was never questioned in servers, however.
Continue reading “The First 64-bit Phone”
Via the EETimes, I found a very interesting talk by Bristol professor David May, presented at the 4th Annual Bristol Multicore Challenge, in June of 2013. The talk can be found as a Youtube movie here, and the slides are available here. The EETimes focused on the idea to cut down ARM to be really RISC, but I think the more interesting part is Professor May’s observations on multicore computing in general, and the case for and against heterogeneity in (parallel) computers.
Continue reading “David May on Multicore: Heterogeneity not Needed”
It is quite interesting to see how Qualcomm has emerged as a major player in the “processor market” and is trying to build themselves into a serious consumer brand. I used to think of them as a company doing modems and other chips that made phones talk wirelessly, known to insiders in the business but not anything a user cared about. Today, however, they are working hard on building themselves into a brand to rival Intel and AMD. At the center of this is their own line of ARM-based application processors, the Snapdragon. I can see some thinking quite similar to the old “Intel Inside” classic, and I would not be surprised to see the box or even body of a phone carrying a Snapdragon logo at some point in the future. A part of this branding exercise is the Snapdragon Batteryguru, an application I recently stumbled on in the Google Play store.
Continue reading “Qualcomm’s Batteryguru – and Branding”
Probably thanks to the yearly Mobile World Congress, there have been a slew of recent announcements of mobile application processors recently. Everything is ARM-based, but show quite some variety in the CPU core configurations used. Indeed, I think this variety has something to say on the general state of multicore.
Continue reading “Two Cores, Four Cores, Eight Cores – Mobile Variety”
When I grew up with computers, the big RISC vs CISC debate was raging. At the time, in the late 1980s, it did indeed seem that RISC was inherently superior to CISC. SPARCs, MIPS, and Alpha all outpaced boring old x86, VAX and 68000 processors. This turned out to be a historical parenthesis, as the Pentium Pro from Intel showed how RISC-style performance could be mated to a CISC ISA. However, maybe ISAs still do matter.
Continue reading “Does ISA Matter for Performance?”
The 2012 edition of the SiCS Multicore Day was fun, like they have always been in the past. I missed it in 2010 and 2011, but could make it back this year. It was interesting to see that the points where keynote speakers disagreed was similar to previous years, albeit with some new twists. There was also a trend in architecture, moving crypto operations into the core processor ISA, that indicates another angle on the hardware accelerator space.
Continue reading “SiCS Multicore Day 2012”
I recently read the classic book The Soul of a New Machine by Tracy Kidder. Even though it describes the project to build a machine that was launched more than 30 years ago, the story is still fresh and familiar. Corporate intrigue, managing difficult people, clever engineering, high pressure, all familiar ingredients in computing today just as it was back then. With my interesting in computer history and simulation, I was delighted to actually find a simulator in the story too! It was a cycle-accurate simulator of the design, programmed in 1979.
Continue reading ““Eagle” Cycle-Accurate Simulator Anno 1979″
Carbon Design Systems have been on a veritable blogging spree recently, pushing out a large number of posts around various topics. Maybe a bit brief for my taste in most cases (I have a tendency to throw out 1000+ word pseudo-articles when I take the time to write a blog), but sometimes very interesting nevertheless. I particularly liked a few posts on cache analysis, as they presented some good insight into not-quite-expected processor and cache behaviors.
Continue reading “Some Fun Cache Results from Carbon”
When IBM moved their mainframe systems (the S/360 family that is today called System Z) from BiCMOS to mainstream CMOS in 1994, the net result was a severe loss in clock frequency and thus single-processor performance. Still, the move had to be done, since CMOS would scale much better into the future. As a result, IBM introduced additional parallelism to the system in order to maintain performance parity. Parallelism as a patch, essentially.
Continue reading “IBM Mainframe: Parallelism as Patch”
Carbon Design Systems have been quite busy lately with a flurry of blog posts about various aspects of virtual prototype technology. Mostly good stuff, and I tend to agree with their push that a good approach is to mix fast timing-simplified models with RTL-derived cycle-accurate models. There are exceptions to this, in particular exploratoty architecture and design where AT-style models are needed. Recently, they posted about their new Swap ‘n’ Play technology, which is a old proven idea that has now been reimplemented using ARM fast simulators and Carbon-generated ARM processor models.
Continue reading “Carbon “Swap ‘n’ Play” – A New Implementation of an Old Idea”
Once upon a time, all programming was bare metal programming. You coded to the processor core, you took care of memory, and no operating system got in your way. Over time, as computer programmers, users, and designers got more sophisticated and as more clock cycles and memory bytes became available, more and more layers were added between the programmer and the computer. However, I have recently spotted what might seem like a trend away from ever-thicker software stacks, in the interest of performance and, in particular, latency.
Continue reading “Back to Bare Metal”
I was recently pointed to a 2011 SPLASH presentation by David Ungar, an IBM researcher working on parallel programming for manycore systems. In particular, in a project called Renaissance, run together with the Vrije Universiteit Brussels in Belgium (VUB) and Portland State University in the US. The title of the presentation is “Everything You Know (about Parallel Programming) Is Wrong! A Wild Screed about the Future“, and it has provoked some discussion among people I know about just how wrong is wrong.
Continue reading “David Ungar: It is Good to be Wrong”
Fault Injection is a topic that has fascinated me for a long time. Not just the area of software-to-software fault injection, but more so how you inject faults into hardware using hardware (and how to conveniently approximate this using a simulator). I just stumbled on a short interesting note about such hardware-actuated fault injection in a Fujitsu article.
Continue reading “Fujitsu Server Fault Injection Robot”
I just read a quite interesting article by Christian Pinto et al, “GPGPU-Accelerated Parallel and Fast Simulation of Thousand-core Platforms“, published at the CCGRID 2011 conference. It discusses some work in using a GPGPU to run simulations of massively parallel computers, using the parallelism of the GPU to speed the simulation. Intriguing concept, but the execution is not without its flaws and it is unclear at least from the paper just how well this generalizes, scales, or compares to parallel simulation on a general-purpose multicore machine.
Continue reading “GPGPU for Instruction-Set Simulation – Maybe, Maybe not”
Nvidia recently announced that their already-known “Kal-El” quad-core ARM Cortex-A9 SoC actually contains five processor cores, not just four as a “normal” quad-core would. They call the architecture “Variable SMP”, and it is a pretty smart design. The one where you think, “I should have thought of that”, which is the best sign of something truly good.
Continue reading “Nvidia “Kal-El” Variable SMP”
From what little I had heard and read, the IBM AS/400 (later known as iSeries, and now known as simply IBM i) sounded like a fascinating system. I knew that it had a rich OS stack that contained most of the services a program needs, and a JVM-style byte code format for applications that let it change from custom processors to Power Architecture without impacting users at all. It was supposedly business-critical and IBM-quality rock solid. But that was about it.
So when Software Engineering Radio episode 177 interviewed the i chief architect Steve Will, I was hooked. It turned out that IBM i was cooler than I imagined. Here are my notes on why I think that IBM i is one of the most interesting systems out there in real use.
Episodes 299 and 301 of the SecurityNow podcast deal with the problem of how to get randomness out of a computer. As usual, Steve Gibson does a good job of explaining things, but I felt that there was some more that needed to be said about computers and randomness, as well as the related ideas of predictability, observability, repeatability, and determinism. I have worked and wrangled with these concepts for almost 15 years now, from my research into timing prediction for embedded processors to my current work with the repeatable and reversible Simics simulator.
Continue reading “SecurityNow on Randomness”
I previously blogged about the HAVEGE algorithm that is billed as extracting randomness from microarchitectural variations in modern processors. Since it was supposed to rely on hardware timing variations, I wondered what would happen if I ran it on Simics that does not model the processor pipeline, caches, and branch predictor. Wouldn’t that make the randomness of HAVEGE go away?
Continue reading “Evaluating HAVEGE Randomness”
When I was working on my PhD in WCET – Worst-Case Execution Time analysis – our goal was to utterly precisely predict the precise number of cycles that a processor would take to execute a certain piece of code. We and other groups designed analyses for caches, pipelines, even branch predictors, and ways to take into account information about program flow and variable values.
The complexity of modern processors – even a decade ago – was such that predictability was very difficult to achieve in practice. We used to joke that a complex enough processor would be like a random number generator.
Funnily enough, it turns out that someone has been using processors just like that. Guess that proves the point, in some way.
Continue reading “Execution Time is Random, How Useful”
Endianness is a topic in computer architecture that can give anyone a headache trying to understand exactly what is happening and why. In the field of computer simulation, it is a pervasive problem that takes some thinking to solve in an efficient, composable, and portable way.
This blog post describes how I am used to working with endianness in virtual platforms, and why this approach makes sense to me. There are other ways of dealing with endianness, with different trade-offs and overriding goals.
Past Friday, I posted a new blog post in my Wind River blog. It is an interview the PhD student Girish Venkatasubramanian from the University of Florida. He is doing research on virtual machines/hypervisors and how they can be implemented more efficiently by making fairly small changes to the architecture of memory management units.
Continue reading “Wind River Blog: Interview with a Virtualization Researcher”
The recent news that Microsoft has taken out an ARM architectural license has caused a lot of speculation about just what this might mean. There are several quite well reasoned ideas around the web, and I have one idea of my own: sixty-four bits.
Continue reading “Microsoft + ARM = ARM64?”
I have just found what almost has to be the first cycle-accurate computer simulator in history. According to the article “Stretch-ing is Great Exercise — It Gets You in Shape to Win” by Frederick Brooks (the man behind the Mythical Man-Month) in the January-March 2010 issue of IEEE Annals of the History of Computing, IBM created a simulator of the pipeline for the IBM 7030 “Stretch” computer developed from 1956 to 1961 (photo from IBM.com).
Continue reading “Pipeline Performance Simulator Anno 1960”