On the evening of the last Wednesday in September, we had our first CaSA, Computer and System Architecture Unraveled, event. CaSA is a meetup in Kista (Sweden) for people interested in computer architecture, system architecture, and how software and hardware interact down towards the lower levels of the stack. The topic for the inaugural event was “Core Count Explosion: A Challenge for Hardware and Software”, and it was great in some many ways!Continue reading “The first Computer and System Architecture Unraveled Event in Kista – Great Speakers, Great Fun!”
In my third post based on the Simics RISC-V simple virtual platform, I use the it to demonstrate how the Intel Simics simulator uses multiple host threads to simulate multiple target cores. The RISC-V platform is nice in that it has less noise than more complex platforms, allowing for clear and simple measurements.
The SystemC Evolution Fika on April 7 had threading/parallelism as its theme. There were four speakers who presented various angles on how to parallelize SystemC models. The presentations and following discussion provided a variety of perspectives on threading as it can be applied in virtual platforms and other computer architecture simulations. It was pretty clear that the presenters and audience had quite different ideas about just what the target domain looks like and the best way to introduce parallelism to SystemC. Here is my take on what was said.Continue reading “SystemC Evolution Fika: Parallel SystemC”
This blog post was originally posted at Intel back in 2018, but it has since been retired from the Intel blog system. As it is of general interest (in my opinion), here is a reposting (with a few small updates here and there).
Temporal decoupling is a key technology in virtual platforms, and can speed up the execution of a system by several orders of magnitude. In my own experiments, I have seen it provide a speedup of more than 1000x. Here, I will dig a little deeper into temporal decoupling and its semantic effects.Continue reading “Some Notes on Temporal Decoupling (Reposted)”
The July 2020 edition of the Communications of the ACM (CACM) had a front-page theme of “Domains-Specific Hardware Accelerators”, or DSAs. It contained two articles about the subject, one about an academic genomics accelerator, and one about the Google TPU. Hardware accelerators dedicated to particular types of computation are basically everywhere today, and an accepted part of the evolution of computers. The CACM articles have some good tidbits and points about how accelerators are designed and used today. At the same time, I also found a youtube talk about the first hardware accelerator, the IBM Stretch HARVEST, showing both contrasts with today as well as a remarkable continuity in concept.Continue reading “CACM on DSAs”
A new short blog post on my Intel Developer Zone blog talks about the improved threading simulation core we have added in Simics version 6… and about how a colleague of mine climbed to the top of the highest mountain in Europe and showed a flag with our new Simics icon! Read the story at https://software.intel.com/en-us/blogs/2019/09/10/simics-6-at-the-mountain-top.
The SiCS Multicore Day took place last week, for the tenth year in a row! It is still a very good event to learn about multicore and computer architecture, and meet with a broad selection of industry and academic people interested in multicore in various ways. While multicore is not bright shiny new thing it once was, it is still an exciting area of research – even if much of the innovation is moving away from the traditional field of making a bunch of processor cores work together, towards system-level optimizations. For the past few years, SiCS has had to good taste to publish all the lectures online, so you can go to their Youtube playlist and see all the talks for free, right now!
I love bug and debug stories in general. Bugs are a fun and interesting part of software engineering, programming, and systems development. Stories that involve running Simics on Simics to find bugs are a particular category that is fascinating, as it shows how to apply serious software technology to solve problems related to said serious software technology. On the Intel Software and Services blog, I just posted a story about just that: debugging a Linux kernel bug provoked by Simics, by running Simics on a small network of machines inside of Simics. See https://blogs.intel.com/evangelists/2016/05/30/finding-kernel-1-2-3-bug-running-wind-river-simics-simics/ for the full story.
A new record, replay, and reverse debugger has appeared, and I just had to take a look at what they do and how they do it. “rr” has been developed by the Firefox developers at Mozilla Corporation, initially for the purpose of debugging Firefox itself. Starting at a debugger from the angle of attacking a particular program does let you get things going quickly, but the resulting tool is clearly generally useful, at least for Linux user-land programs on x86. Since I have tried to keep up with the developments in this field, a write-up seems to be called for.
In a blog post at Wind River, I describe how the Wind River Helix Lab Cloud system can be used to communicate hardware design to software developers. The idea is that you upload a virtual platform to the cloud-based system, and then share it to the software developers. In this way, there is no need to install or build a virtual platform locally, and the sender has perfect control over access and updates. It is a realization of the hardware communication principles I presented in an earlier blog post on use cases for Lab Cloud.
But the past part is that the targets I talk about in the blog post and use in the video are available for anyone! Just register on Lab Cloud, and you can try your own threaded software and check how it scales on a simulated 8-core ARM!
On November 3, 2015, I will give a presentation at the Embedded Conference Scandinavia about simulating IoT systems. The conference program can be found at http://www.svenskelektronik.se/ECS/ECS15/Program.html, with my session detailed at http://www.svenskelektronik.se/ECS/ECS15/Program/IoT%20Development.html.
My topic is how to realistically simulate very large IoT networks for software testing and system development. This is a fun field where I have spent significant time recently. Only a couple of weeks ago, I tried my hand simulating a 1000-node network. Which worked! I had 1000 ARM-based nodes running VxWorks running at the same time, inside a single Simics process, and at speeds close to real time! It did use some 55GB of RAM, which I think is a personal record for largest use of system resources from a single process. Still, it only took a dozen processors to do it.
I have a silly demo program that I have been using for a few years to demonstrate the Simics Analyzer ability to track software programs as they are executing and plot which threads run where and when. This demo involves using that plot window to virtually draw text, in a way akin to how I used to make my old ZX Spectrum “draw” things in the border. But when I brought it up in a new setting it failed to run properly and actually starting hanging on me. Strange, but also quite funny when I realized that I had originally foreseen this very problem and consciously decided not to put in a fix for it… which now came back to bite me in a pretty spectacular way. But at least I did get an interesting bug to write about.
Via the EETimes, I found a very interesting talk by Bristol professor David May, presented at the 4th Annual Bristol Multicore Challenge, in June of 2013. The talk can be found as a Youtube movie here, and the slides are available here. The EETimes focused on the idea to cut down ARM to be really RISC, but I think the more interesting part is Professor May’s observations on multicore computing in general, and the case for and against heterogeneity in (parallel) computers.
Probably thanks to the yearly Mobile World Congress, there have been a slew of recent announcements of mobile application processors recently. Everything is ARM-based, but show quite some variety in the CPU core configurations used. Indeed, I think this variety has something to say on the general state of multicore.