A month ago, I participated in a seminar at Schloss Dagstuhl in Germany, about “Discrete Algorithms on Modern and Emerging Compute Infrastructure”. Not my usual cup of tea, but it was very interesting and insightful nevertheless. I had attended a Dagstuhl seminar only once before, back in 2003.
Continue reading “Schloss Dagstuhl (and a Seminar and Cerebras)”
Tag: Nvidia
Embedded World 2019
The Embedded World in Nürnberg is still going strong as the best tradeshow for “Embedded” in the world. This year, I spent time doing booth duty and gave a talk in the Conference part of the event. There was an unusually high number of old friends and business acquaintances around, and it was a great experience overall with many fruitful discussions and connections for the future. However, it seems that there is always something that goes slightly awry with my travel to the show…
Windows 10 Reboot Loop – CUDA & Alienware
Late last year I was trying to do some machine learning work on my brand new Alienware 15 R4 gaming laptop. I had bought the laptop in order to have something portable with sufficient performance to actually do convolutional neural network (CNN) training and inference “on the road”. The GTX 1060 in the laptop is just as powerful as my home desktop machine, and should run TensorFlow and Keras well. I had the setup working on the desktop already, and copied the code over to the laptop. When I tried to run the code for the first time, I got some rather strange errors that I finally figured out meant that I was missing the CUDA toolkit. I downloaded CUDA version 10, installed it, and the machine rebooted into the Windows 10 automatic repair mode.
Thin Phone, Fat Core
When mobile phones first appeared, they were powered by very simple cores like the venerable ARM7 and later the ARM9. Low clock frequencies, zero microarchitectural sophistication, sufficient for the job. In recent years, as smartphones have come into their own as the most important computing device for most people, the processor performance of mobile phones has increased tremendously. Today, cutting-edge phones and tablets contain four or eight cores, running at clock frequencies well above 2 gigahertz. The performance race for most of the market (more about that in a moment) was largely about pushing higher clock frequencies and more cores, even while the microarchitecture was left comparatively simple. Mobile meant “fairly simple”, and IPC was nowhere near what you would get with a typical Intel processor for a laptop or desktop.
Today, that seems to be changing, as the Nvidia Denver core and Apple’s Cyclone core both go the route of a few fat cores rather than many thin cores.
Nvidia “Kal-El” Variable SMP
Nvidia recently announced that their already-known “Kal-El” quad-core ARM Cortex-A9 SoC actually contains five processor cores, not just four as a “normal” quad-core would. They call the architecture “Variable SMP”, and it is a pretty smart design. It is the kind where you think, “I should have thought of that”, which is the best sign of something truly good.
Heterogeneous vs homogeneous systems, revisited
I got another email from my friend with the thesis that processors will become ever more homogeneous as time goes on, while I believe in a relative heterogenization (is that a word?) of computer architecture with many special-purpose accelerators and helper processors. This argument is put forward in a previous blog post. In this round, the arguments for homogenization are from the gaming world.