SICS Multicore Day August 31 – Observations from Uppsala

The SICS Multicore Day August 31 was a really great event! We had some fantastic speakers presenting the latest industry research view on multicores and how to program them. Marc Tremblay did the first presentation in Europe of Sun’s upcoming Rock processor. Tim Mattson from Intel tried hard to provoke the crowd, and Vijay Saraswat of IBM presented their X10 language. Erik Hagersten from Uppsala University provided a short scene-setting talk about how multicore is becoming the norm.

The Rock is a very interesting piece of work. It tries to be both a throughput-oriented design like the Niagara/UltraSparc T machines and a single-thread high-performance design, even though on balance it is skewed more towards throughput computing. What is very cool is how they use additional threads to boost the performance of a main thread using “scout threads” (a concept I saw presented back at ISCA 2004). This makes it possible to use threads either to boost single-thread performance or to increase throughput, creating a more flexible design than is usually the case. It is also the first commercial implementation of transactional memory. And 16-way. And due for next year.

So far, Rock seems like a very successful and very visionary project that is trying in yet another way to gain momentum by pure hardware innovation. Just as with the UltraSparc T line, Sun is trying to out-invent IBM and Intel/AMD, who seem to be mostly progressing by just piling on more of the same old features. I really hope this play goes well: if we were down to just IBM/PPC & System Z and Intel-AMD/x86-64 on the server and desktop side, the world would just be too boring.

The Intel and IBM talks on programming were both grounded in the idea that to make people accept a new programming language or API, it has to be an evolution of what the programmers already know, which pretty much ties us down to C/C++/Java/C# with extensions and modified semantics.

X10 is basically Java with some nicely considered features to support local and global memories and programs that can scale to BlueGene-style massively clustered machines. Tim's message was basically that everyone should stop inventing new languages and focus on improving existing frameworks like MPI and OpenMP in collaboration with industry. He is a great presenter with a very funny style, and he tried hard to get the audience to react. In this crowd, most people agreed, except the Erlang people, who feel that they do have a better solution to multithreading and multicore than any patched-up language in the C-Java family. I must agree with them, and I do feel that Erlang today is mature enough to serve that purpose.

The panel session at the end was very entertaining, with some people (including myself and Joe Armstrong) trying to ask tough questions of the keynote speakers (and Ulf Wiger of Ericsson). It was a rare chance to engage directly with some industry heavyweights who otherwise tend to sit on the other side of the Atlantic.

I think the prize for coolest tech of the day goes to QuviQ, a spin-off from Chalmers doing automated testing tools that really work well for parallel and distributed systems. Their method of minimizing the trace of a failed test case is really interesting, and finds things that no human tester would ever find.
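To give a feel for what minimizing a failing trace means, here is a minimal sketch in Python. It is not QuviQ's actual algorithm, which I have not seen in detail; it is a simple greedy chunk-removal loop in the spirit of delta debugging. The names `shrink_trace` and `buggy`, and the toy open/close/write bug, are all made up for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shrink_trace(trace: Sequence[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily remove chunks from a failing trace while it keeps failing."""
    current = list(trace)
    assert fails(current), "the starting trace must reproduce the failure"
    chunk = max(len(current) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if fails(candidate):
                current = candidate  # the chunk was irrelevant, keep it removed
            else:
                i += chunk  # the chunk is needed, move past it
        chunk //= 2  # retry with smaller chunks; reaches 0 after chunk == 1
    return current


def buggy(trace: List[str]) -> bool:
    """Hypothetical system under test: a write after a close is the bug."""
    closed = False
    for op in trace:
        if op == "close":
            closed = True
        elif op == "open":
            closed = False
        elif op == "write" and closed:
            return True
    return False


long_trace = ["open", "write", "read", "close", "read",
              "read", "write", "open", "write"]
minimal = shrink_trace(long_trace, buggy)
print(minimal)  # ['close', 'write']
```

The nine-step trace shrinks to the two operations that actually matter. A real tool also has to generate the traces and cope with nondeterministic schedules, which this sketch ignores entirely.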

I also presented a talk on “Debugging Multicore Software using Virtual Hardware”, in the breakout sessions. I guess our Tools track was the least visited of the three tracks, but the audience asked some good questions. And there were some good discussions afterwards.

However, to summarize the day, I am a bit disappointed that more is not being done on the hardware side to help people debug their multicore and multiprocessor parallel programs. Transactional memory is all nice and dandy and can help simplify low-level locking primitives for threaded programs. But I would like to see much more in terms of smart tracing, hardware breakpoints and triggers, massive synchronized stops, and similar features. And instructions and features that make parallel expressions simpler. Here, the embedded folks doing things like ARM CoreSight seem to have been much more successful than the server-class designers at Sun, Intel, and IBM. But even ARM does not spend more than 10-15% of the chip area on debug support.

I think it would be interesting to see what would happen if you could spend 25-30% of the chip on some seriously powerful debug features. Full support for remote control of all cores at the same time, lots of bandwidth for debug data and commands, and fat traces of all traffic on and off the chip. Performance and event counters everywhere. That would likely make the peak performance of the chip lower than that of a competing chip that spends less area on debug support, but it would make achieving high utilization much easier, and that might actually make the debug-heavy chip more economical. Would be interesting to try. But I guess nobody would dare to buy such a design.

10 thoughts on “SICS Multicore Day August 31”


Jakob says:

2007 September 5 at 21:17

Also see my later follow-up post with more on programming parallel machines: http://jakob.engbloms.se/archives/20
JoachimS says:

2007 September 25 at 10:44

Aloha!

One thing you need to think *carefully* about is how you protect your debug functions so that they do not become a convenient door in for someone trying to extract information, control the system in ways the application never intended, etc.

There are several attacks on, for example, crypto implementations in chips where the key was extracted via scan chains over JTAG.

That good support for debug, and not least for profiling, is needed as SoCs grow in complexity is nevertheless entirely clear. How, for example, are you supposed to develop applications that use even part of the available performance of a multicore chip if there is no way to find out how application patterns give rise to cache thrashing, access conflicts, etc.?
Jakob says:

2007 September 25 at 10:48

Very good point. The “easy” solution is the variant where there is a regular chip that is used in products, and then another chip with either extra pins or even extra silicon that is used during development. That is what the ICE systems for Tricore and V850 look like, for example. But that defeats the point of debugging in the system you actually ship. Tricky.
Pingback: Observations from Uppsala » Blog Archive » FTF Paris: Debug connections threat to secure network devices
Jakob says:

2008 March 16 at 22:58

Note that in http://jakob.engbloms.se/archives/87 I have another panel summary, from DATE 2008. Quite different ideas on the same topic, from a very different viewpoint.
Pingback: Observations from Uppsala » Blog Archive » DATE 2008 Panel on Multicore Programming
Pingback: Observations from Uppsala » Blog Archive » SiCS Multicore Days 2008: Talk about Threading Simics
Pingback: Observations from Uppsala » Blog Archive » The JVM as Universal Parallel Glue?
Pingback: Observations from Uppsala » Blog Archive » What is Efficiency when Cores are Free?
Pingback: Observations from Uppsala » SiCS Multicore Day 2012
