Wind River Blog: Simics 5 Multicore Accelerator Explained

While I was on vacation, Wind River published a blog post I wrote about the new multicore accelerator feature of Simics 5. The post has some details on what we did, and some of the things we learnt about simulation performance.


Wind River Blog: Testing Multicore Scaling with a Simics QSP

A few years ago, I built a demo on Simics that used a hacked Freescale MPC8641D target that was forced to scale from 1 to 8 cores. Some interesting experiements could be made using this target, and it was nicely scalable for its time. However, I always wanted to have something just a bit bigger. Say 20 cores, or 100. Just to see what would happen. Finally, I got it.

The Simics QSP target that we quietly launched earlier this Summer is such a scalable target. As discussed in a blog post describing the architecture, it is designed to scale to 128 cores currently. Using this ability, I repeated my old experiments, but trying very large threads counts and target core counts. The results show clearly that the way that I coded my parallel computation program was pretty bad, and I really would like to try to rewrite it using some more modern threading library. All I need is time and a way to cross-compile Wool

Anyway, the new blog post is here.

Parallel SystemC Simulation

I just found a recent paper on the topic of parallel simulation of computer  systems. Christopher Schumacher et al., published an articles at CODES+ISSS in October of 2010 talking about “parSC: Synchronous Parallel SystemC Simulation on Multicore Architectures“. Essentially, parallel SystemC.

Continue reading “Parallel SystemC Simulation”

Simulation Determinism: Necessary or Evil?

gearsIn my series (well, I have one previous post about checkpointing) about misunderstood simulation technology items, the turn has come to the most difficult of all it seems: determinism. Determinism is often misunderstood as meaning “unchanging” or “constant” behavior of the simulation. People tend to assume that a deterministic simulation will not reveal errors due to nondeterministic behavior or races in the modeled system, which is a complete misunderstanding. Determinism is a necessary feature of any simulation system that wants to be really helpful to its users, not an evil that hides errors.

Continue reading “Simulation Determinism: Necessary or Evil?”

Threading or Not as a Hardware Modeling Paradigm

gears-modelingTraditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of SystemC, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look “natural” as a modeling paradigm for hardware simulations, it is really not a good choice for high-performance simulation.

In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of “natural” modeling.

Continue reading “Threading or Not as a Hardware Modeling Paradigm”

SiCS Multicore Days: The Debate Points

It is a week ago now, and sometimes it is good to let impressions sink in and get processed a bit before writing about an event like the SiCS Multicore Days. Overall, the event was serious fun, and I found the speakers very insightful and the panel discussion and audience questions added even more information.

Continue reading “SiCS Multicore Days: The Debate Points”

The JVM as Universal Parallel Glue?

The two days of the SiCS Multicore Days is now over, and it was a really fun event this year too. I will be writing a few things inspired by the event, and here is the first.

Kunle Olukotun‘s presentation on the work of the Stanford Pervasive Parallelism lab included a diagram where they showed a range of domain-specific languages (DSL) being compiled to a universal implementation language. That language is currently Scala, and in the end all applications end up being compiled into Scala byte codes, which are then optimized and dynamically reoptimized and executed on a particular hardware system based on the properties of that system. Fundamentally, the problem of creating and compiling a DSL, and combining program segments written in different DSLs, is solved by interposing a layer of indirection.

But this idea got me thinking about what the best such intermediary might be for large-scale general deployment.

Continue reading “The JVM as Universal Parallel Glue?”

FLOSS Weekly: Drizzle: Aggressive Push to Multicore

I listened to episode 35 of FLOSS Weekly that interviewed Brian Aker, creator of the Drizzle fork from MySQL. As most recent episodes of FLOSS Weekly, it is pretty good technical material. What I found interesting was the technical vision behind Drizzle, and how they are aggressively going for quite wide multicore hosts.

Continue reading “FLOSS Weekly: Drizzle: Aggressive Push to Multicore”

EETimes Article on Multicore Debug

I have another short technical piece published about Multicore Debug at the EETimes (and their network of related publications, like Pretty short piece, and they cut out some bits to make it fit their format. Nothing new to fans of virtual platforms for software development, basically we can use virtual platforms to reintroduce control over parallel and for all practical purposes chaotic hardware/software systems.

Simon Kågström, PhD

BTH logoYesterday, I had the honor of being the opponent at the PhD defense of Simon Kågström at Blekinge Tekniska Högskola (BTH, Blekinge University of Technology in English). His PhD thesis deals mainly with the multiprocessor port of an industrial in-house operating system, and a secondary theme was the design of the Cibyl C-programs-to-JVM translator. All of his papers are very well-written and a joy to read, and the engineering work behind it is very solid.

The most important data in the PhD thesis is really just how much work it is to do an SMP port of an OS kernel. And how hard it is to get performance up to good levels even with several years of work. Really emphasizes the point that hard work and perseverance and just lots of calendar time is what it takes to create a good SMP OS. That’s why Solaris and AIX are still years ahead of Linux in this respect — you just need to hit the snags, fix them, retest, and hit the next snag. It takes time to polish, basically.

So, if you have any interest in multiprocessor operating systems, Simon’s work is well-worth a read. Also check out his blog at  And by the way, he did pass.

Grant Martin on Manycore Multicore MPSoC AMP SMP Multi-X…

Grant Martin is a nice fellow from Tensilica who has a blog at ChipDesignMag. In a recent post, he raises the question of nomenclature and taxonomy for multicore processor designs:

…the discussion, and the need to constantly define our terms (and redefine them, and discuss them when people disagree) makes me wish that the world of electronics, system and software design had some agreement on what the right terms are and what they mean…

I think this is a good idea, but we need to keep the core count out of it…

Continue reading “Grant Martin on Manycore Multicore MPSoC AMP SMP Multi-X…”

Sun buys Montalvo

Sun just bought Montalvo whose hardware I blogged about some while ago. And just like the Apple acquisition of PA Semi, the question of “why” appears. Some analysts blame the simple fact that both Montalvo and PA Semi simply needed to be acquired, since their venture capitalists did not want to put in the next 100 million USD needed to go to silicon (Montalvo) or really expand on the opportunity already at hand (PA Semi). Here is my crazy guess.

Continue reading “Sun buys Montalvo”

Multicore Expo US 2008

The Multicore Expo US 2008 is taking place next week (April 1-3) in Santa Clara, CA. I was originally slated to talk there, but since I am going to the Embedded Systems Conference a few weeks later it was too much travel in too short a time frame to do. I happy that Ross Dickson, a senior technology specialist at Virtutech could take my place. He will do just as good a job as I would, and he also has his own session to present at the Expo.

Our talk will be on how approximate you can be in simulating multicore computers, and still get useful results out from the software running on the simulator. It is something that we at Virtutech have spent a lot of time working on, and we want to bring our results to a wider community. Really exciting to present, and it is a pity that I could not be there myself.

DATE 2008 Panel on Multicore Programming

date2008I attended a DATE 2008 open exhibition panel discussion on multicore programming, organized by Gary Smith EDA. The panel was a few people short, and ended up with just Simon Davidmann of Imperas, Grant Martin of Tensilica, and Rudy Lauwereins of IMEC. A user representative from Ericsson was supposed to have been there but he never arrived. Overall, the panel was geared towards data-plane processing-type thinking, and a bit short on internal dissonance.

Continue reading “DATE 2008 Panel on Multicore Programming”

Multicore Denial-of-Service Attack

In a paper from USENIX 2007 by Microsoft Researchers Onur Mutlu and Thomas Moscibroda present a working “denial of service” attack for multicore processors. The idea is simple: since there is no fairness or security designed into current DRAM controllers, it is quite feasible for one program in a multicore system to hog almost all memory bandwidth and thus reduce or deny service to the others. There is no direct attack on software programs, just stealing the resources that they all need to share for all to work.
Continue reading “Multicore Denial-of-Service Attack”

Something to make use of all those cores: Raytracing

In the INQUIRER article “Intel pushes Raytracing again“, they have an example of an application with almost insatiable appetite for processor cycles and processor cores. Real-time raytracing. With 16 full-size x86 cores, they can match the framerate of a regular mid-range GPU — but with picture quality of raytracing rather than the approximations of rasterizers. So, better quality, using something like 5 to 10 times as many transistors as the GPU would. This application can certainly use almost any amount of hardware, good for Intel 🙂

Montalvo: Heterogeneous x86 Multicore

montalvo-fg.gifCNET (of all places) have a short article on what Montalvo Systems are up to: Secret recipe inside Intel’s latest competitor | CNET The article is a bit short on details, but it sounds like it is finally an example of a same-ISA, different-powered-cores heterogeneous multicore device in the mainstream. The idea has a lot of merit, and it will be very interesting to see the final results once silicon ships. I really believe is heterogeneous designs.

To be critical, trying to compete with Intel might not be the best idea around… but it never hurts to try. Also, the name is not unique, there is already a that is not I think the old name “Memorylogix” was more interesting and less prone to website name collisions (yes, it seems to be the same company that briefly surfaced with some stripped-down x86 processor back in 2002 — I have an MPR article to prove it).

Multithreading Game AI

Over at an online publication called AI Game Dev, there is an elucidating post on how to do multithreading of game AI code (posted in June 2007). Basically, the conclusion is that most of the CPU time in an AI system is spent doing collision detection, path finding, and animation. This focus of time in a few domain-given hot spots turns the problem of parallelizing the AI into one of parallelizing some core supporting algorithms, rather than trying to parallelize the actual decision making itself. The key to achieving this is to make the decision-making part able to work asynchronously with the other algorithms, which is not trivial but still much easier than threading the decision making itself. The threading of the most time-consuming parts turns into classic algorithm parallelization, which is more familiar and easier to do than threading general-purpose large code bases. A good read, basically, that taught me some more about parallelization in the games world.

Wayne Wolf on “The Good News and the Bad News” of Embedded Multiprocessing

In a column called The Good News and the Bad News in IEEE Computer magazine (November 2007 issue), Prof. Wayne Wolf at Georgia Tech (and a regular columnist on embedded systems for Computer magazine) talks about the impact of multiprocessing systems (multicore, multichip) on embedded systems. In general, his tone is much more optimistic and upbeat than most pundits.

Continue reading “Wayne Wolf on “The Good News and the Bad News” of Embedded Multiprocessing”

Mark Nelson’s Multicore Non-Panic and Embedded Systems

Via I just found an interesting article from last Summer, about the actual non-imminence of the end of the computing world as we know it due to multicore. Written by Mark Nelson, the article makes some relevant and mostly correct claims, as long as we keep to the desktop land that he knows best. So here is a look at these claims in the context of embedded systems.
Continue reading “Mark Nelson’s Multicore Non-Panic and Embedded Systems”

The Register reporting from SC’07

The Register has a pretty good report from the Supercomputing (SC) 2007 conference.  Quite knowledgeable, and mostly about the thorny issue of programming massively parallel fairly homogeneous machines likes GPUs and floating-point accelerators. Of course, their commentary has to be commented on. Read on for more.

Continue reading “The Register reporting from SC’07”

Virtualization and Linux on a DSP Processor

A small tidbit that I found interesting due to the targeted platform. LinuxDevices reports that the VirtualLogix VLX-NI virtualization layer that used to run only on x86 platforms now also run on TI DSPs in the C64+ series. Basically, you put their virtualization layer on the DSP, and you can then on the same core run both a Linux kernel and a DSP/BIOS kernel. Thus supporting traditional DSP development and Linux-style development on the same core.

Continue reading “Virtualization and Linux on a DSP Processor”

Valve Source Engine Multicore Port (presentation comment)

Game engine development company Valve has a nice presentation up on their website about how they attacked multithreading. It is a nice example of how to solve multicore programming for a particular domain by the classic layered approach.

Continue reading “Valve Source Engine Multicore Port (presentation comment)”

ARM Cortex-A9, Trango, and Virtualization for Migration

The new version of Trango’s embedded “secure virtualizer” for the ARM Cortex-A9 MPCore is an interesting solution in that it directly applies virtualization technology to the issue of migrating solutions (complete software stacks) from single-core to multicore. The details are a bit sketchy in just how this is done, there is some hardware support in recent ARM architectures, but a little bit of adaptation of a guest OS using paravirtual techniques are likely not a blocker. It also touches on security, implemented using ARM’s trustzone technology. All in all, I think this is a typical example of something that we are going to see much more of.

Power.Org Dev Con: C Domination a Problem for Multicore

I just read a EETimes report from a panel at the Developers Conference (actually, it is more accurately called the Power Architecture Developers Conference, of PADC), about programming multicore processors for the embedded market. Note that I was not there in person, so I can only take the few quotes in the article and comment on them. The main conclusions are that:

  • C/C++ is going to be the dominant language for embedded for the near future. Nothing really surprising at that.
  • C/C++ being dominant means that parallelism in multicore processors, especially shared-memory systems, will be harder to exploit. That is certainly true.
  • Tool vendors have no good idea about what to do next.
  • You cannot expect to get traction with a new language.

In a sense, blaming the market for not having the good sense to adapt new tools to tackle multicore.

I don’t think things have to be that bleak.

Continue reading “Power.Org Dev Con: C Domination a Problem for Multicore”

Comment on Joel Spolsky and Programming to “Moores Law”

Joel Spolsky is always worth a read, and in his post Strategy Letter VI he has a lot of smart things to say about how to consider programming. His basic message is that if you optimize your code too much to work well and fit in the memory of a current machine, by the time that you are done, you find yourself run over by competitors that just assumed machines would be faster and used the same programming time to implement cooler products.

I just have to take issue with this.

Continue reading “Comment on Joel Spolsky and Programming to “Moores Law””