On the evening of the last Wednesday in September, we had our first CaSA, Computer and System Architecture Unraveled, event. CaSA is a meetup in Kista (Sweden) for people interested in computer architecture, system architecture, and how software and hardware interact down towards the lower levels of the stack. The topic for the inaugural event was “Core Count Explosion: A Challenge for Hardware and Software”, and it was great in some many ways!Continue reading “The first Computer and System Architecture Unraveled Event in Kista – Great Speakers, Great Fun!”
While I was on vacation, Wind River published a blog post I wrote about the new multicore accelerator feature of Simics 5. The post has some details on what we did, and some of the things we learnt about simulation performance.
A few years ago, I built a demo on Simics that used a hacked Freescale MPC8641D target that was forced to scale from 1 to 8 cores. Some interesting experiements could be made using this target, and it was nicely scalable for its time. However, I always wanted to have something just a bit bigger. Say 20 cores, or 100. Just to see what would happen. Finally, I got it.
The Simics QSP target that we quietly launched earlier this Summer is such a scalable target. As discussed in a blog post describing the architecture, it is designed to scale to 128 cores currently. Using this ability, I repeated my old experiments, but trying very large threads counts and target core counts. The results show clearly that the way that I coded my parallel computation program was pretty bad, and I really would like to try to rewrite it using some more modern threading library. All I need is time and a way to cross-compile Wool…
Anyway, the new blog post is here.
I just found a recent paper on the topic of parallel simulation of computer systems. Christopher Schumacher et al., published an articles at CODES+ISSS in October of 2010 talking about “parSC: Synchronous Parallel SystemC Simulation on Multicore Architectures“. Essentially, parallel SystemC.
In my series (well, I have one previous post about checkpointing) about misunderstood simulation technology items, the turn has come to the most difficult of all it seems: determinism. Determinism is often misunderstood as meaning “unchanging” or “constant” behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due to nondeterministic behavior or races in the modeled system, which is a complete misunderstanding. Determinism is a necessary feature of any simulation system that wants to be really helpful to its users, not an evil that hides errors.
Traditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of SystemC, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look “natural” as a modeling paradigm for hardware simulations, it is really not a good choice for high-performance simulation.
In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of “natural” modeling.
It is a week ago now, and sometimes it is good to let impressions sink in and get processed a bit before writing about an event like the SiCS Multicore Days. Overall, the event was serious fun, and I found the speakers very insightful and the panel discussion and audience questions added even more information.
The two days of the SiCS Multicore Days is now over, and it was a really fun event this year too. I will be writing a few things inspired by the event, and here is the first.
Kunle Olukotun‘s presentation on the work of the Stanford Pervasive Parallelism lab included a diagram where they showed a range of domain-specific languages (DSL) being compiled to a universal implementation language. That language is currently Scala, and in the end all applications end up being compiled into Scala byte codes, which are then optimized and dynamically reoptimized and executed on a particular hardware system based on the properties of that system. Fundamentally, the problem of creating and compiling a DSL, and combining program segments written in different DSLs, is solved by interposing a layer of indirection.
But this idea got me thinking about what the best such intermediary might be for large-scale general deployment.
I listened to episode 35 of FLOSS Weekly that interviewed Brian Aker, creator of the Drizzle fork from MySQL. As most recent episodes of FLOSS Weekly, it is pretty good technical material. What I found interesting was the technical vision behind Drizzle, and how they are aggressively going for quite wide multicore hosts.
I have another short technical piece published about Multicore Debug at the EETimes (and their network of related publications, like Embedded.com). Pretty short piece, and they cut out some bits to make it fit their format. Nothing new to fans of virtual platforms for software development, basically we can use virtual platforms to reintroduce control over parallel and for all practical purposes chaotic hardware/software systems.
Yesterday, I had the honor of being the opponent at the PhD defense of Simon Kågström at Blekinge Tekniska Högskola (BTH, Blekinge University of Technology in English). His PhD thesis deals mainly with the multiprocessor port of an industrial in-house operating system, and a secondary theme was the design of the Cibyl C-programs-to-JVM translator. All of his papers are very well-written and a joy to read, and the engineering work behind it is very solid.
The most important data in the PhD thesis is really just how much work it is to do an SMP port of an OS kernel. And how hard it is to get performance up to good levels even with several years of work. Really emphasizes the point that hard work and perseverance and just lots of calendar time is what it takes to create a good SMP OS. That’s why Solaris and AIX are still years ahead of Linux in this respect — you just need to hit the snags, fix them, retest, and hit the next snag. It takes time to polish, basically.
So, if you have any interest in multiprocessor operating systems, Simon’s work is well-worth a read. Also check out his blog at http://simonkagstrom.livejournal.com/. And by the way, he did pass.
…the discussion, and the need to constantly define our terms (and redefine them, and discuss them when people disagree) makes me wish that the world of electronics, system and software design had some agreement on what the right terms are and what they mean…
I think this is a good idea, but we need to keep the core count out of it…
Sun just bought Montalvo whose hardware I blogged about some while ago. And just like the Apple acquisition of PA Semi, the question of “why” appears. Some analysts blame the simple fact that both Montalvo and PA Semi simply needed to be acquired, since their venture capitalists did not want to put in the next 100 million USD needed to go to silicon (Montalvo) or really expand on the opportunity already at hand (PA Semi). Here is my crazy guess.
The Multicore Expo US 2008 is taking place next week (April 1-3) in Santa Clara, CA. I was originally slated to talk there, but since I am going to the Embedded Systems Conference a few weeks later it was too much travel in too short a time frame to do. I happy that Ross Dickson, a senior technology specialist at Virtutech could take my place. He will do just as good a job as I would, and he also has his own session to present at the Expo.
Our talk will be on how approximate you can be in simulating multicore computers, and still get useful results out from the software running on the simulator. It is something that we at Virtutech have spent a lot of time working on, and we want to bring our results to a wider community. Really exciting to present, and it is a pity that I could not be there myself.
I attended a DATE 2008 open exhibition panel discussion on multicore programming, organized by Gary Smith EDA. The panel was a few people short, and ended up with just Simon Davidmann of Imperas, Grant Martin of Tensilica, and Rudy Lauwereins of IMEC. A user representative from Ericsson was supposed to have been there but he never arrived. Overall, the panel was geared towards data-plane processing-type thinking, and a bit short on internal dissonance.