I had many interesting conversations at the HiPEAC 2017 conference in Stockholm back in January 2017. One topic that came up several times was the GEM5 research simulator, and some cool tricks implemented in it in order to speed up the execution of computer architecture experiments. Later, I located some research papers explaining the “full speed ahead” technology in more detail. The mix of fast simulation using virtualization and clever tricks with cache warming is worth a blog post.
When mobile phones first appeared, they were powered by very simple cores like the venerable ARM7 and later the ARM9. Low clock frequencies, zero microarchitectural sophistication, sufficient for the job. In recent years, as smartphones have come into their own as the most important computing device for most people, the processor performance of mobile phones have increased tremendously. Today, cutting-edge phones and tablets contain four or eight cores, running at clock frequencies well above 2 gigahertz. The performance race for most of the market (more about that in a moment) was mostly about pushing higher clock frequencies and more cores, even while microarchitecture was left comparatively simple. Mobile meant “fairly simple”, and IPC was nowhere near what you would get with a typical Intel processor for a laptop or desktop.
Today, that seems to be changing, as the Nvidia Denver core and Apple’s Cyclone core both go the route of a few fat cores rather than many thin cores.
Via the EETimes, I found a very interesting talk by Bristol professor David May, presented at the 4th Annual Bristol Multicore Challenge, in June of 2013. The talk can be found as a Youtube movie here, and the slides are available here. The EETimes focused on the idea to cut down ARM to be really RISC, but I think the more interesting part is Professor May’s observations on multicore computing in general, and the case for and against heterogeneity in (parallel) computers.
Probably thanks to the yearly Mobile World Congress, there have been a slew of recent announcements of mobile application processors recently. Everything is ARM-based, but show quite some variety in the CPU core configurations used. Indeed, I think this variety has something to say on the general state of multicore.
When I grew up with computers, the big RISC vs CISC debate was raging. At the time, in the late 1980s, it did indeed seem that RISC was inherently superior to CISC. SPARCs, MIPS, and Alpha all outpaced boring old x86, VAX and 68000 processors. This turned out to be a historical parenthesis, as the Pentium Pro from Intel showed how RISC-style performance could be mated to a CISC ISA. However, maybe ISAs still do matter.
Nvidia recently announced that their already-known “Kal-El” quad-core ARM Cortex-A9 SoC actually contains five processor cores, not just four as a “normal” quad-core would. They call the architecture “Variable SMP”, and it is a pretty smart design. The one where you think, “I should have thought of that”, which is the best sign of something truly good.
I just read an interview with Steve Furber, the original ARM designer, in the May 2011 issue of the Communications of the ACM. It is a good read about the early days of the home computing revolution in the UK. He not only designed the ARM processor, but also the BBC Micro and some other early machines.
By chance, I got to attend a day at the UPMARC Summer School with a very enjoyable talk by Francesco Zappa Nardelli from INRIA. He described his work (along with others) on understanding and modeling multiprocessor memory models. It is a very complex subject, but he managed to explain it very well.
Looks like S4D (and the co-located FDL) is becoming my most regular conference. S4D is a very interactive event. With some 20 to 30 people in the room, many of them also presenting papers at the conference, it turns into a workshop at its best. There were plenty of discussion going on during sessions and the breaks, and I think we all got new insights and ideas.
The recent news that Microsoft has taken out an ARM architectural license has caused a lot of speculation about just what this might mean. There are several quite well reasoned ideas around the web, and I have one idea of my own: sixty-four bits.
For my parental leave, I have just bought myself a Lego Mindstorm NXT 2.0 kit. It is not much fun for our youngest, who mostly gets a bit scared by a piece of Lego driving around making noises, but I hope to be able to use it to teach my older child (almost five) to program. Let’s see how that turns out. It looks hard to make the NXT environment provide the kind of Roborally-style programming blocks that I had hoped to create, as I cannot for some reason get a sufficiently custom icon onto custom blocks.
It also presented me with an opportunity to try some domain-specific high-level graphical programming. The programming environment provided for the NXT series of Mindstorms kits is based on LabView from National Instruments, and it really does seem to work. It even features parallel tasks, which I tried to use…
In a funny coincidence, I published an article at SCDSource.com about the need for cycle-accurate models for virtual platforms on the same day that ARM announced that they were selling their cycle-accurate simulators and associated tool chain to Carbon Technology. That makes one wonder where cycle-accuracy is going, or whether it is a valid idea at all… is ARM right or am I right, or are we both right since we are talking about different things?
Let’s look at this in more detail.