StackOverflow interviews CouchDB

Last year, FLOSS Weekly interviewed Jan Lehnardt of the CouchDB project. I put up a blog post on this, noting that it was interesting to see a scalable parallel program written in Erlang, a true concurrent language. The interview was interesting, but not very deeply technical. Now, almost a year later, the StackOverflow podcast, number 59, interviewed the founder of the project, Damien Katz. This interview goes a bit more into the technical details, what CouchDB is good for and what it is not, as well as some details on the use and performance of Erlang.

Continue reading “StackOverflow interviews CouchDB”

Parallelism in Action

Last year in a blog post on video encoding for the iPod Nano, I complained about the lack of performance on my old Athlon. A bit later, I noted that (obviously) video encoding is a good example of an application that can take advantage of parallelism. Yesterday I put these two topics together in a practical test. And it worked nicely enough.

Continue reading “Parallelism in Action”

EETimes.com – Multicore CPUs face slow road in comms

The EETimes article Multicore CPUs face slow road in comms piqued my interest. There is an interesting chart in there about just how slow more-than-one-core processors will be in penetrating a vaguely defined “comms” marketplace. I can believe that, but I think their comments on the PowerQUICC series require some commentary…

Continue reading “EETimes.com – Multicore CPUs face slow road in comms”

Enea and Freescale Article on SMP OS

Elektronik i Norden just published a technical insight article about the SMP kernels of Enea OSE and Linux, by Patrik Strömblad and Jonas Svennebring.

Continue reading “Enea and Freescale Article on SMP OS”

Adding to Schirrmeister’s Virtual Platform Myth Busting

Frank Schirrmeister of Synopsys recently published a blog post called “Busting Virtual Platform Myths – Part 1: ‘Virtual Platforms are for application software only’”. In it, he refutes a claim by Eve that virtual platforms are for application-level software development only, arguing that they are mostly used for driver and OS development and citing some Synopsys-Virtio Innovator examples of such uses. In his view, most application software is developed using host-compiled techniques. I want to add to this refutation by noting that application software is surely a very important, and large, use case for virtual platforms.

Continue reading “Adding to Schirrmeister’s Virtual Platform Myth Busting”

IBM z10 Heavy-Duty Virtual Platform

Unknown to most, IBM has one of the world’s longest records of using virtual platforms for software and firmware development and verification. This project has been ongoing since at least the days of the zSeries 900 machines, through z990, z9, and now z10. An excellent article on this virtual platform and its uses is found in the IBM Journal of Research and Development, number 1, 2009. It is called “IBM System z10 Firmware Simulation”, by Körner et al.

Continue reading “IBM z10 Heavy-Duty Virtual Platform”

Three Cores make a Crowd — or a Problem

A common question from simulation users to us simulation providers is “can I simulate a machine with N cores”, where N is “large”. As if running lots of cores was a simulation system or even a hardware problem. In almost all cases, the problem is with software. Creating an arbitrary configuration in a virtual platform is easy. Creating a software stack for that arbitrary platform is a lot harder, since an SMP software stack needs to know about the cores and how they communicate.

Essentially, what you need is a hardware design that has addressing room for lots of cores, and a software stack that is capable of using lots of cores, even if such configurations do not exist in hardware. Unfortunately, since software is normally written to run on real existing machines, there tend to be unexpected limitations even where scalability should be feasible “in principle”.

Here is the story of how I convinced Linux to handle more than two cores in a virtual MPC8641D machine.

Continue reading “Three Cores make a Crowd — or a Problem”

Tying a Thread to a Processor in Linux

This is a small Linux SMP programming tip, which I had a hard time finding documented clearly anywhere on the web. I guess people won’t find it here either, but with some luck some search engine will pick up on this.
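
As a rough illustration of the kind of tip this post is about (a minimal sketch, not necessarily the method described in the full post), the Python standard library exposes the Linux affinity calls as `os.sched_setaffinity` and `os.sched_getaffinity`. The helper name `run_pinned` is hypothetical, and the sketch assumes Linux, where passing 0 as the id makes the underlying system call act on the calling thread.

```python
import os
import threading

def run_pinned(cpu, work):
    """Run work() in a new thread tied to one CPU (hypothetical helper).

    Returns (affinity, result): the affinity set the thread observed
    after pinning itself, and whatever work() returned.
    """
    outcome = {}

    def target():
        # Linux-only assumption: id 0 means "the calling thread" for the
        # underlying sched_setaffinity system call, so only this worker
        # thread is restricted, not the whole process.
        os.sched_setaffinity(0, {cpu})
        outcome["affinity"] = os.sched_getaffinity(0)
        outcome["result"] = work()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return outcome["affinity"], outcome["result"]

if __name__ == "__main__":
    # Pick a CPU we are actually allowed to run on.
    first_cpu = min(os.sched_getaffinity(0))
    affinity, result = run_pinned(first_cpu, lambda: sum(range(1000)))
    print("worker affinity:", affinity, "result:", result)
```

The CPU number is taken from the currently allowed set rather than hard-coded, since a container or `taskset` may already have restricted which processors the process can use.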

Continue reading “Tying a Thread to a Processor in Linux”

Hardware-Software Race Condition in Interrupt Controller

The best way to learn something is to try, fail, and then try again. That is how I just learned the basics of multiprocessor interrupt management. For an educational setup, I have been creating a purely virtual virtual platform from scratch. This setup contains a large number of processors with local memory, and then a global shared memory, as well as a means for the processors to interrupt each other in order to notify about the presence of a message or synchronize in general. Getting this really right turned out to be not so easy.

Continue reading “Hardware-Software Race Condition in Interrupt Controller”

Floss Weekly on OpenMPI

FLOSS Weekly recently podcast an interview with Jeff Squyres of OpenMPI. OpenMPI is an open-source implementation of the MPI programming standard. Jeff makes some interesting points on how this has worked out and why it makes sense, and what MPI is all about.

Continue reading “Floss Weekly on OpenMPI”

“Multicore Debug” Made Top Ten Embedded.com for 2008

Embedded.com just listed the ten most visited articles on their website during 2008, and my contribution on debugging multiprocessor code was number ten. If you want some more meat around multiprocessor debug, please peruse the various papers and presentations found on my personal website.

Threading or Not as a Hardware Modeling Paradigm

Traditional hardware design languages like Verilog were designed to model naturally concurrent behavior, and they naturally leaned on a concept of threads to express this. This idea of independent threads was brought over into the design of SystemC, where it was manifested as cooperative multitasking using a user-level threading package. While threads might at first glance look “natural” as a modeling paradigm for hardware simulations, they are really not a good choice for high-performance simulation.

In practice, threading as a paradigm for software models of hardware circuits connected to a programmable processor brings more problems than it provides benefits in terms of “natural” modeling.

Continue reading “Threading or Not as a Hardware Modeling Paradigm”

“Nulticore Effect”

Jack Ganssle wrote a column about the failure of multicore to scale, based on an article in IEEE Spectrum. He makes the following claim:

Now a study in IEEE Spectrum shows that even for the classic embarrassingly parallel problems like weather simulations multicore offers little benefit. The curve in that article is priceless. As the number of cores grow from two to 64 performance plummets by a factor of five. Additional processors nullify each other.

Call it the Nulticore Effect.

Continue reading ““Nulticore Effect””

Article in Elektronik i Norden: Virtual Platforms

I have an article appearing in the latest issue of Elektronik i Norden, about using virtual platforms for multicore computer systems. It is framed in the context of the Freescale multicore push, in particular the QorIQ P4080, and addresses the common issues of debug, execution speed, and the need to zoom in on details every once in a while.

A Few Parallel EDA Tools

I keep looking out for interesting examples of parallel software, and there is a constant trickle of these. This past week I spotted a couple of new ones in the EDA field: SPICE simulation and chip timing analysis.

Continue reading “A Few Parallel EDA Tools”

SiCS Multicore Days: The Debate Points

It is a week ago now, and sometimes it is good to let impressions sink in and get processed a bit before writing about an event like the SiCS Multicore Days. Overall, the event was serious fun, and I found the speakers very insightful and the panel discussion and audience questions added even more information.

Continue reading “SiCS Multicore Days: The Debate Points”

What is Efficiency when Cores are Free?

More from the SiCS multicore days 2008.

There were some interesting comments on how to define efficiency in a world of plentiful cores. The theme from my previous blog post called “Real-Time Control when Cores Become Free” came up several times during the talks, panels, and discussions. It seems that this year, everybody agreed that we are heading to 100s or 1000s of “self-respecting” cores on a single chip, and that with that kind of core count, it is not too important to keep them all busy at all times at any cost. As I stated earlier, cores and instructions are now free, while other aspects are limiting, turning the classic optimization imperatives of computing on their head. Operating systems will become more about space-sharing than time-sharing, and it might make sense to dedicate processing cores to the sole job of impersonating peripheral units or doing polling work. Operating systems can also be simplified when the job of time-sharing is taken away, even if communications and resource management might well bring in some new interesting issues.

So, what is efficiency in this kind of environment?

Continue reading “What is Efficiency when Cores are Free?”

The JVM as Universal Parallel Glue?

The two days of the SiCS Multicore Days are now over, and it was a really fun event this year too. I will be writing a few things inspired by the event, and here is the first.

Kunle Olukotun’s presentation on the work of the Stanford Pervasive Parallelism lab included a diagram where they showed a range of domain-specific languages (DSL) being compiled to a universal implementation language. That language is currently Scala, and in the end all applications end up being compiled into Scala byte codes, which are then optimized and dynamically reoptimized and executed on a particular hardware system based on the properties of that system. Fundamentally, the problem of creating and compiling a DSL, and combining program segments written in different DSLs, is solved by interposing a layer of indirection.

But this idea got me thinking about what the best such intermediary might be for large-scale general deployment.

Continue reading “The JVM as Universal Parallel Glue?”

Google Chrome and Parallel Browsing

Everybody seems to think the launch of the Google Chrome browser is very important and cool. Probably because Google itself is considered important and cool. I am a bit more skeptical about the whole Google thing, since they seem to be building themselves into a pretty dangerous monopoly company… but there are some interesting architectural and parallel computing aspects to Chrome, and to Internet Explorer 8, it turns out.

Continue reading “Google Chrome and Parallel Browsing”

Lego Racers Boardgame — and why Old is Better in Software (mostly)

This might appear to be a stretched analogy, but it struck me as obvious when I tried playing the Lego Racers boardgame with my 3-year-old this weekend. The game is ranked pretty low on Boardgamegeek, and deservedly so. The promise and premise is great: use Lego cars to race around a track and pick up new pieces to modify the powers of your car… sounds like great fun. Right? But it is not, and that’s where my analogy with the age of software comes in.

Continue reading “Lego Racers Boardgame — and why Old is Better in Software (mostly)”

Parallel Programming is Not Needed? I don’t quite agree…

This was a refreshingly different post: Too Many Cores, not Enough Brains:

More importantly, I believe the whole movement is misguided. Remember that we already know how to exploit multicore processors: with now-standard multithreading techniques. Multithreaded programming is notoriously difficult and error-prone, so the challenge is to invent techniques that will make it easier. But I just don’t see vast hordes of programmers needing to do multithreaded programming, and I don’t see large application domains where it is needed. Internet server apps are architected to scale across a CPU farm far beyond the limits of multicore. Likewise CGI rendering farms. Desktop apps don’t really need more CPU cycles: they just absorb them in lieu of performance tuning. It is mostly specialized performance-intensive domains that are truly in need of multithreading: like OS kernels and database engines and video codecs. Such code will continue to be written in C no matter what.

The argument at core is that multicore is about performance, and performance optimization is generally something that we do prematurely rather than focussing on how to solve the core problem in the best way. You have to respect Jonathan Edwards, and often this is true: programmers optimize themselves into a horrible design that is also slow.

Continue reading “Parallel Programming is Not Needed? I don’t quite agree…”

SiCS Multicore Days 2008: Talk about Threading Simics (updated)

I will give a presentation on how Simics was threaded and how we created a parallel virtual platform system at the SiCS Multicore Days 2008, which takes place in Kista, Sweden, on September 11 and 12. The schedule is now up (so I edited the post and added “updated” to the title), at http://www.sics.se/node/3182, and my talk is on Friday, Sept 12, at 13.00 in “track 2”. Speaker bios and abstracts are also online.

Even apart from my own humble participation, I think the event itself will be well worth attending. Last year was really good and serious fun! See my writeups from last year: part 1 and part 2 (and a short note on the Rock processor and transactional memory).

CouchDB: A Parallel Program in a Parallel Language

I just listened to another FLOSS Weekly show, number 36, where they interviewed Jan Lehnardt of the CouchDB project. CouchDB is very interesting, in that it is a database designed for replication, redundancy, and thus massive parallelism. It was initially written by Damien Katz on his own, but now it is an Apache Foundation project sponsored by IBM. The most interesting thing is that Damien decided in 2006 to rewrite his C++ prototype in Erlang, and did so in just a few months if I understood my Erlang friends right. So here we have a really good parallel program written in a true parallel language.

Continue reading “CouchDB: A Parallel Program in a Parallel Language”

Swedish Workshop on Multicore 2008: Nov 27-28: CFP!

The first Swedish Workshop on Multicore Computing (MCC) will take place in Ronneby on November 27 and 28, 2008. The call for papers is now out, and it is open until September 26. If you have something cool to present or publish about multicore computing, and happen to be here in Sweden, please do submit an abstract!

Disclosure: I am on the program committee for this event.

DNS: Hardware Accelerator Time!

In Episode 157 of Security Now, Steve Gibson and Leo Laporte discuss the recently discovered security issues with DNS, and in particular the cost of making a good fix in terms of bandwidth and computation capacity. Fundamentally, according to Steve, today’s DNS servers are running at a fairly high load, and there is no room to improve the security of DNS updates by for example sending extra UDP packets or switching to TCP/IP. As this theoretically means a doubling or tripling of the number of packets per query, I can believe that. The “real solutions” to DNS problems should lie in the adoption of a truly secured protocol like DNSSEC. As this uses public key crypto (PKC), it would add a processing load to the servers that would kill the DNS servers on the CPU side instead…

Continue reading “DNS: Hardware Accelerator Time!”

GPU Programming: a Good Pattern to Follow?

In the March/April 2008 issue of ACM Queue, there is an article on GPU Programming by Kayvon Fatahalian and Mike Houston of Stanford that I found a very interesting read. It presents and analyzes the programming model of modern GPUs, in the most coherent and understandable way that I have seen so far. The PC GPU has a model for programming parallel hardware that might be a good pattern for other areas of processing. Programmers do not have to write explicitly parallel code; the machinery and hardware take care of ensuring parallel behavior, as long as the code follows the assumptions made in the model.

Continue reading “GPU Programming: a Good Pattern to Follow?”

What’s the Obsession with C in EDA?

In early July, Cadence announced their new “C2S” C-to-silicon compiler. This event was marked with some excitement and blogging in the EDA space (SCDSource, EDN-Wilson, CDM-Martin, to give some links for more reading). At core, I agree that what they are doing is fairly cool — taking an essentially hardware-unrelated sequential program in C and creating hardware from it. The kind of heavy technology that I have come to admire in the EDA space.

But I have to ask: why start with C?

Continue reading “What’s the Obsession with C in EDA?”

SCDSource Article: Combining Fast and Detailed Models

I have another opinion piece published over at SCDsource.com. The title, “Why virtual platforms need cycle-accurate models”, was their creation, not mine, and I think it is a little bit off the main message of the piece. The follow-up discussion is also fairly interesting.

The key thing that I want to get across is that we need virtual platforms where we can spend most of our time executing in a fast, not-very-detailed mode to get the software somewhere interesting. Once we get to the interesting spot, we can then switch to more detailed models to get detailed information about the software behavior and especially its low-level timing. Getting to that point in detailed mode is impossible since it would take too much time.

This is something that computer architecture researchers have been doing for a very long time; just look at how toolsets like SimpleScalar and Simics with the Wisconsin GEMS system use fast mode for “positioning” and more detailed execution for “measurement”. It is also what is now commercial with the Simics Freescale QorIQ P4080 Hybrid virtual platform. Tensilica also have the ability to switch mode in their toolchain.

See an upcoming post for more on how to get at the cycle-accurate models – this was just to point out that the article is there, for symmetry with previous posts about my articles popping up in places.

Power Architecture Conference slides online

The slides from the Power Architecture Conference in München and Paris are now online (and have been for a few weeks) at the Power.org site for the event. There are some interesting things there about Power Architecture in particular, but virtual platforms were also almost a main theme of the show.