“Processor Performance Insights and Optimization” – Computer and System Architecture Unraveled Event Four

Finally, the fourth CaSA, Computer and System Architecture Unraveled, meetup happened on November 6. It took far too long to get it organized, but we finally did it. The theme was about processor performance analysis and efficient processor implementation, offering two talks from very different perspectives. The location was almost the same as before, on the 19th floor of the Kista Science Tower building. Once more thanks to the sponsorship from Vasakronan and Kista Science City.

Continue reading ““Processor Performance Insights and Optimization” – Computer and System Architecture Unraveled Event Four”

Hardware debug and measurement in the IBM POWER8

ibm power doodle

I have read some recent IBM articles about the POWER8 processor and its hardware debug and trace facilities. They are very impressive, and quite interesting to compare to what is usually found in the embedded world. Instead of being designed to help with software debug, it seems the hardware mechanisms in the Power8 are rather focused on silicon bringup and performance analysis and verification in IBM’s own labs. As well as supporting virtual machines and JIT-based systems!

Continue reading “Hardware debug and measurement in the IBM POWER8”

The JVM as Universal Parallel Glue?

The two days of the SiCS Multicore Days is now over, and it was a really fun event this year too. I will be writing a few things inspired by the event, and here is the first.

Kunle Olukotun‘s presentation on the work of the Stanford Pervasive Parallelism lab included a diagram where they showed a range of domain-specific languages (DSL) being compiled to a universal implementation language. That language is currently Scala, and in the end all applications end up being compiled into Scala byte codes, which are then optimized and dynamically reoptimized and executed on a particular hardware system based on the properties of that system. Fundamentally, the problem of creating and compiling a DSL, and combining program segments written in different DSLs, is solved by interposing a layer of indirection.

But this idea got me thinking about what the best such intermediary might be for large-scale general deployment.

Continue reading “The JVM as Universal Parallel Glue?”