Probably thanks to the yearly Mobile World Congress, there have been a slew of recent announcements of mobile application processors recently. Everything is ARM-based, but show quite some variety in the CPU core configurations used. Indeed, I think this variety has something to say on the general state of multicore.
Category Archives: Multicore Software
A few years ago, I built a demo on Simics that used a hacked Freescale MPC8641D target that was forced to scale from 1 to 8 cores. Some interesting experiements could be made using this target, and it was nicely scalable for its time. However, I always wanted to have something just a bit bigger. Say 20 cores, or 100. Just to see what would happen. Finally, I got it.
The Simics QSP target that we quietly launched earlier this Summer is such a scalable target. As discussed in a blog post describing the architecture, it is designed to scale to 128 cores currently. Using this ability, I repeated my old experiments, but trying very large threads counts and target core counts. The results show clearly that the way that I coded my parallel computation program was pretty bad, and I really would like to try to rewrite it using some more modern threading library. All I need is time and a way to cross-compile Wool…
Anyway, the new blog post is here.
I was recently pointed to a 2011 SPLASH presentation by David Ungar, an IBM researcher working on parallel programming for manycore systems. In particular, in a project called Renaissance, run together with the Vrije Universiteit Brussels in Belgium (VUB) and Portland State University in the US. The title of the presentation is “Everything You Know (about Parallel Programming) Is Wrong! A Wild Screed about the Future“, and it has provoked some discussion among people I know about just how wrong is wrong.
I just read a quite interesting article by Christian Pinto et al, “GPGPU-Accelerated Parallel and Fast Simulation of Thousand-core Platforms“, published at the CCGRID 2011 conference. It discusses some work in using a GPGPU to run simulations of massively parallel computers, using the parallelism of the GPU to speed the simulation. Intriguing concept, but the execution is not without its flaws and it is unclear at least from the paper just how well this generalizes, scales, or compares to parallel simulation on a general-purpose multicore machine.
By chance, I got to attend a day at the UPMARC Summer School with a very enjoyable talk by Francesco Zappa Nardelli from INRIA. He described his work (along with others) on understanding and modeling multiprocessor memory models. It is a very complex subject, but he managed to explain it very well.
I just finished reading the October 2010 issue of Communications of the ACM. It contained some very good articles on performance and parallel computing. In particular, I found the ACM Case Study on the parallelism of Photoshop a fascinating read. There was also the second part of Cary Millsap’s articles about “Thinking Clearly about Performance”.
I listened to an interesting FLOSS Weekly interview with Adam Hall and Achim Hasenmuller of VirtualBox. For someone interested in virtual machines and hardware simulation, the interview was full of interested tidbits. I think the best part was the discussion on multiprocessing in Virtualbox.
I recently read a couple of articles on multicore that felt a bit like jumping back in time. In IEEE Spectrum, David Patterson at Berkeley’s parallel computing lab brings up the issue of just how hard it is to program in parallel and that this makes the wholesale move to multicore into something like a “hail Mary pass” for the computer industry. In Computer World, Chris Nicols at NICTA in Australia asks what you will do with a hundred cores – implying that there is not much you can do today. While both articles make some good points, I also think they should be taken with a grain of salt. Things are better than they make them seem. Read More →
I have another blog up at Wind River. This one is about multicore bugs that cannot happen on multithreaded systems, and is called True Concurrency is Truly Different (Again). It bounces from a recent interesting Windows security flaw into how Simics works with multicore systems.
Today here at the MCC 2009 workshop, I heard an interesting talk by David Black-Schaffer of Stanford university. His work is on stream programming for image processing (“2D streams”). Pretty simple basic idea, to use 2D blobs of pixels as kernel inputs rather than single values or vectors. Makes eminent sense for image processing.
Last Friday, I attended this year’s edition of the SiCS Multicore Day. It was smaller in scale than last year, being only a single day rather than two days. The program was very high quality nevertheless, with keynote talks from Hazim Shafi of Microsoft, Richard Kaufmann of HP, and Anders Landin of Sun. Additionally, there was a mid-day three-track session with research and industry talks from the Swedish multicore community. Read More →
Øredev is the premier software development conference in Sweden and Europe (they claim). I gave some presentations there in 2006 and 2007, but since then they have dropped the general embedded software development track and just focused on programming for mobile phones. Most of the material is “general IT”. If you are doing software development on the desktop or for servers, it is a good place to go to learn new things from the general world of computing.
Last year, FLOSS Weekly interviewed Jan Lehnard of the CouchDB project. I put up a blog post on this, noting that it was interesting with a scalable parallel program written in Erlang, a true concurrent language. The interview was interesting, but not very deeply technical. Now, almost a year later, the StackOverflow podcast, number 59, interviewed the founder of the project, Damien Katz. This interview goes a bit more into the technical details and what CouchDB is good for and what not, as well as some details on the use and performance of Erlang.
I have a short article on multicore systems development and virtual platforms in the May 2009 issue of ECN magazine, over at www.ecnmag.com.
Last year in a blog post on video encoding for the iPod Nano, I complained about the lack of performance on my old Athlon. A bit later, I noted that (obviously) video encoding is a good example of an application that can take advantage of parallelism. Yesterday I put these two topics together in a practical test. And it worked nicely enough.
Unknown to most, IBM has one of the world’s longest records of using virtual platforms for software and firmware development and verification. This project has been ongoing since at least the days of the zSeries 900 machines, through z990, z9, and now z10. An excellent article on this virtual platform and its uses is found in the IBM Journal of Research and Development, number 1, 2009, . It is called “IBM System z10 Firmware Simulation”, by Körner et al.
A common question from simulation users to us simulation providers is “can I simulate a machine with N cores”, where N is “large”. As if running lots of cores was a simulation system or even a hardware problem. In almost all cases, the problem is with software. Creating an arbitrary configuration in a virtual platform is easy. Creating a software stack for that arbitrary platform is a lot harder, since an SMP software stack needs to understand about the cores and how they communicate.
Essentially, what you need is a hardware design that has addressing room for lots of cores, and a software stack that is capable of using lots of cores — even if such configurations do not exist in hardware. Unfortunately, since software is normally written to run on real existing machines, there tends to be unexpected limitations even where scalability should be feasible “in principle”.
Here is the story of how I convinced Linux to handle more than two cores in a virtual MPC8641D machine.