The July 2020 edition of the Communications of the ACM (CACM) had a front-page theme of “Domains-Specific Hardware Accelerators”, or DSAs. It contained two articles about the subject, one about an academic genomics accelerator, and one about the Google TPU. Hardware accelerators dedicated to particular types of computation are basically everywhere today, and an accepted part of the evolution of computers. The CACM articles have some good tidbits and points about how accelerators are designed and used today. At the same time, I also found a youtube talk about the first hardware accelerator, the IBM Stretch HARVEST, showing both contrasts with today as well as a remarkable continuity in concept.Continue reading “CACM on DSAs”
Earlier in July 2019, I had the honor of presenting one of the keynote talks at the 19th SAMOS (International Conference on Embedded Computer Systems: Architectures, MOdeling, and Simulation) conference, held on the island of Samos in Greece. When I got the invite, I had no real idea what to expect. I asked around a bit and people said it was a good conference with a rather special vibe. I think that is a very good description of the conference: a special vibe. In addition to the usual papers and sessions, there is a strong focus on community and social events, fostering discussion across academic disciplines and between industry and academia. There were many really great discussions in addition to the paper and keynote presentations, and overall it was one of the most interesting conferences I have been to in recent years.Continue reading “SAMOS 2019 – Insights, Mechanisms, Heterogeneity, and more”
Integration is hard, that is well-known. For computer chip and system-on-chip design, integration has to be done pre-silicon in order to find integration issues early so that designs can be updated without expensive silicon re-spins. Such integration involves a lot of pieces and many cross-connections, and in order to do integration pre-silicon, we need a virtual platform.
Via the EETimes, I found a very interesting talk by Bristol professor David May, presented at the 4th Annual Bristol Multicore Challenge, in June of 2013. The talk can be found as a Youtube movie here, and the slides are available here. The EETimes focused on the idea to cut down ARM to be really RISC, but I think the more interesting part is Professor May’s observations on multicore computing in general, and the case for and against heterogeneity in (parallel) computers.
The 2012 edition of the SiCS Multicore Day was fun, like they have always been in the past. I missed it in 2010 and 2011, but could make it back this year. It was interesting to see that the points where keynote speakers disagreed was similar to previous years, albeit with some new twists. There was also a trend in architecture, moving crypto operations into the core processor ISA, that indicates another angle on the hardware accelerator space.
Nvidia recently announced that their already-known “Kal-El” quad-core ARM Cortex-A9 SoC actually contains five processor cores, not just four as a “normal” quad-core would. They call the architecture “Variable SMP”, and it is a pretty smart design. The one where you think, “I should have thought of that”, which is the best sign of something truly good.
Past Tuesday, I attended the Freescale Design With Freescale (DWF) one-day technology event in Kista, Stockholm. This is a small-scale version of the big Freescale Technology Forum, and featured four tracks of talks running from the morning into the afternoon. All very technical, aimed at designing engineers.
Last Friday, I attended this year’s edition of the SiCS Multicore Day. It was smaller in scale than last year, being only a single day rather than two days. The program was very high quality nevertheless, with keynote talks from Hazim Shafi of Microsoft, Richard Kaufmann of HP, and Anders Landin of Sun. Additionally, there was a mid-day three-track session with research and industry talks from the Swedish multicore community. Continue reading “SiCS Multicore Day 2009”
About two months ago, Cavium Networks launched their second generation of Octeon chips, the Octeon II. The most obvious difference to the previous generation (Octeon, Octeon Plus) is a new MIPS64 core with much better support for hypervisors and virtualization. There are some other interesting aspects to this chip, though.
Yes, when does hardware acceleration make sense in networking? Hardware acceleration in the common sense of “TCP offload”. This question was answered by a very nicely reasoned “no” in an article by Mike Odell in ACM Queue called “Network Front-End Processors, Yet Again“.
The EETimes article Multicore CPUs face slow road in comms piqued my interest. There is an interesting chart in there about just how slow more-than-one-core processors will be in penetrating a vaguely defined “comms” market place. I can believe that, but I think their comments on the PowerQUICC series require some commentary…
It is a week ago now, and sometimes it is good to let impressions sink in and get processed a bit before writing about an event like the SiCS Multicore Days. Overall, the event was serious fun, and I found the speakers very insightful and the panel discussion and audience questions added even more information.
The Radio Register has a nice interview with Kunle Olukotun, the man most known for the Afara/Sun Niagara/UltraSparc T1-2-etc. design. It is a long interview, lasting well over an hour, but it is worth a listen. A particular high point is the story on how Kunle worked on parallel processors in the mid-1990s when everyone else was still chasing single-thread performance. He really was a very early proponent of multicore, and saw it coming a bit before most other (general-purpose) computer architects did. Currently, he is working on how to program multiprocessors, at the Stanford Pervasive Parallelism Laboratory (PPL). In the interview, I see several themes that I have blogged about before being reinforced…
The Register has a few podcasts in addition to their website, and the one called “Semicoherent Computing” has turned into a very nice series of interviews with interesting people from the computer industry. I recently listened to their interview from September 2007 with David Ditzel of Transmeta fame. He had a lot to say about the history of computing, as well as interesting things on where computing is going. Well worth a listen! Particular interesting highlights…
I got another email from my friend with the thesis that processors will become ever more homogeneous as time goes on, while I believe in a relative heterogenezation (is that a word?) of computer architecture with many special-purpose accelerators and helper processors. This argument is put forward in a previous blog post. In this round, the arguments for homogenization are from the gaming world.
Most of the time when talking about the impact of multicore processing on software, we complain that it makes the software more complicated because it has to cope with the additional complexities of parallelism. There are some cases, however, when moving to multicore hardware allows a software structure to be simplified. The case of Integrated Modular Avionics (IMA) and the honestly idiotic design of the ARINC 653 standard is one such case.
Continue reading “When Multicore makes Things Simpler, like IMA”
An old colleague just sent me an email bringing up a discussion we had last year, where he was a strong proponent for the homogeneous model of a multiprocessor. The root of that discussion was the difference between the Xbox 360 and Playstation 3 processors. The Xbox 360 has a three-core, two-threads-per-core homogeneous PowerPC main processor called the Xenon (plus a graphics processor, obviously), while the PS3 has a Cell processor with a single two-threaded PowerPC core and seven SPEs, Synergistic Processing Elements (basically DSP-like SIMD machines).
In the game business, it is clear that the Xenon CPU is considered easier to code for. This means that even though the Cell processor clearly has higher theoretical raw performance, in practical the two machines are about equal in power since it is harder to make use of the Cell. Which seems to be a fact.
So here, homogeneous systems do appear to have it easier among programmers. However, I do not believe that that extends to all systems, all the time, everywhere.