I recently stumbled on a blog post called Building Faster AMD64 Memset Routines, written by Joe Bialek of the Microsoft Security Response Center (MSRC). The blog describes his efforts to improve the performance of the Windows kernel memset() function, across all sizes of memory to set. The reported optimizations are quite fascinating, and could be summed by avoiding branches even at the cost of doing redundant stores. Basically, stores are free while branches are expensive.Continue reading “Microsoft Windows memset Optimization – Stores are Free”
DRAMsys is a simulator for modern RAM systems, built by researchers at Fraunhofer IESE and the Technische Universität Kaiserslautern. Over the past few years, I have heard several talks about the tool and also had the luck to talk a bit to the team behind it. It is an interesting piece of simulation technology, in particular for how it manages to build a truly cycle-accurate model on top of the approximately-timed (AT) style defined SystemC TLM-2.0.Continue reading “DRAMsys – Cycle-Accurate Simulation using Transactions”
I have been working with computer simulation and computer architecture for more than 20 years, and one thing that has been remarkably stable over time is the simulation slowdown inherent in “cycle accurate” computer simulation. Regardless of who I talked to or what they were modeling, the simulators ran at around 100 thousand times slower than the machine being modeled. It even holds true going back to the 1960s! However, there is a variant of simulation that aims to make useful performance predictions while running around 10x faster (or more) – mechanistic models (in particular, the Sniper simulator).Continue reading “Simulating Computer Architecture with “Mechanistic” Models – No more 100k Slowdown?”
I once wrote a blog post about the use of computer architecture pipeline simulation in the IBM ”Stretch” project, which seems to be the first use of computer architecture simulation to design a processor. After the ”Stretch” machine, IBM released the S/360 family in 1964. Then, the Control Data Corporation showed up with their CDC 6600 supercomputer, and IBM started a number of projects to design a competitive high-end computer for the high-performance computing market. One of them, Project Y, became the IBM Advanced Computing Systems project (ACS). In the ACS project, simulation was used to document, evaluate, and validate the very aggressive design. There are some nuggets about the simulator strewn across historical articles about the ACS, as well as an actual technical report from 1966 that I found online describing the simulation technology! Thus, it is possible to take a bit of a deeper look at computer architecture simulation from the mid-1960s.
I had many interesting conversations at the HiPEAC 2017 conference in Stockholm back in January 2017. One topic that came up several times was the GEM5 research simulator, and some cool tricks implemented in it in order to speed up the execution of computer architecture experiments. Later, I located some research papers explaining the “full speed ahead” technology in more detail. The mix of fast simulation using virtualization and clever tricks with cache warming is worth a blog post.
IEEE Micro published an article called “Architectural Simulators Considered Harmful”, by Nowatski et al, in the November-December 2015 issue. It is a harsh critique of how computer architecture research is performed today, and its uninformed overreliance on architectural simulators. I have to say I mostly agree with what they say. The article follows in a good tradition of articles from the University of Wisconsin-Madison of critiquing how computer architecture research is performed, and I definitely applaud this type of critique.
…the discussion, and the need to constantly define our terms (and redefine them, and discuss them when people disagree) makes me wish that the world of electronics, system and software design had some agreement on what the right terms are and what they mean…
I think this is a good idea, but we need to keep the core count out of it…
Sun just bought Montalvo whose hardware I blogged about some while ago. And just like the Apple acquisition of PA Semi, the question of “why” appears. Some analysts blame the simple fact that both Montalvo and PA Semi simply needed to be acquired, since their venture capitalists did not want to put in the next 100 million USD needed to go to silicon (Montalvo) or really expand on the opportunity already at hand (PA Semi). Here is my crazy guess.
There was an interesting little note at the CodeMonkey blog… basically, the Linux kvm kernel hardware virtualization support system now works on IBM z series mainframes. Using the z architecture virtualization support in hardware. Nice to see some attention being put on non-x86 architectures. And a nice historical note that current x86 virtualization extensions were indeed inspired by the s/370 architecture from the mid-1970s. Cool.
The Multicore Expo US 2008 is taking place next week (April 1-3) in Santa Clara, CA. I was originally slated to talk there, but since I am going to the Embedded Systems Conference a few weeks later it was too much travel in too short a time frame to do. I happy that Ross Dickson, a senior technology specialist at Virtutech could take my place. He will do just as good a job as I would, and he also has his own session to present at the Expo.
Our talk will be on how approximate you can be in simulating multicore computers, and still get useful results out from the software running on the simulator. It is something that we at Virtutech have spent a lot of time working on, and we want to bring our results to a wider community. Really exciting to present, and it is a pity that I could not be there myself.
I attended a DATE 2008 open exhibition panel discussion on multicore programming, organized by Gary Smith EDA. The panel was a few people short, and ended up with just Simon Davidmann of Imperas, Grant Martin of Tensilica, and Rudy Lauwereins of IMEC. A user representative from Ericsson was supposed to have been there but he never arrived. Overall, the panel was geared towards data-plane processing-type thinking, and a bit short on internal dissonance.
In a paper from USENIX 2007 by Microsoft Researchers Onur Mutlu and Thomas Moscibroda present a working “denial of service” attack for multicore processors. The idea is simple: since there is no fairness or security designed into current DRAM controllers, it is quite feasible for one program in a multicore system to hog almost all memory bandwidth and thus reduce or deny service to the others. There is no direct attack on software programs, just stealing the resources that they all need to share for all to work.
Continue reading “Multicore Denial-of-Service Attack”
CNET (of all places) have a short article on what Montalvo Systems are up to: Secret recipe inside Intel’s latest competitor | CNET News.com. The article is a bit short on details, but it sounds like it is finally an example of a same-ISA, different-powered-cores heterogeneous multicore device in the mainstream. The idea has a lot of merit, and it will be very interesting to see the final results once silicon ships. I really believe is heterogeneous designs.
To be critical, trying to compete with Intel might not be the best idea around… but it never hurts to try. Also, the name is not unique, there is already a montalvo.com that is not montalvosystems.com. I think the old name “Memorylogix” was more interesting and less prone to website name collisions (yes, it seems to be the same company that briefly surfaced with some stripped-down x86 processor back in 2002 — I have an MPR article to prove it).
There is a nice blog post over at Neatorama with many pictures of early computers. The material is nothing new to someone familiar with computing history, but the pictures collected are very nice indeed.
In a column called The Good News and the Bad News in IEEE Computer magazine (November 2007 issue), Prof. Wayne Wolf at Georgia Tech (and a regular columnist on embedded systems for Computer magazine) talks about the impact of multiprocessing systems (multicore, multichip) on embedded systems. In general, his tone is much more optimistic and upbeat than most pundits.
The Register has a pretty good report from the Supercomputing (SC) 2007 conference. Quite knowledgeable, and mostly about the thorny issue of programming massively parallel fairly homogeneous machines likes GPUs and floating-point accelerators. Of course, their commentary has to be commented on. Read on for more.
The TimeSys Embedded Linux Podcast (also called LinuxLink Radio) is a nice listen about embedded computing using Linux. Sometimes they are a bit too open-source centric, though, and ignore very good tools that live in the classic commercial world. One such example is the recent episode 20 on debugging tools, where they totally ignore modern high-powered hardware-based debugging.
In a report from FTF Paris 2007, Info World makes some interesting comments on security and locking-down of mobile devices. Info World Â» Blog Archive Â» â€˜Flat IPâ€™ mobile networks face new security challenges:
The new version of Trango’s embedded “secure virtualizer” for the ARM Cortex-A9 MPCore is an interesting solution in that it directly applies virtualization technology to the issue of migrating solutions (complete software stacks) from single-core to multicore. The details are a bit sketchy in just how this is done, there is some hardware support in recent ARM architectures, but a little bit of adaptation of a guest OS using paravirtual techniques are likely not a blocker. It also touches on security, implemented using ARM’s trustzone technology. All in all, I think this is a typical example of something that we are going to see much more of.
I just read a EETimes report from a panel at the Power.org Developers Conference (actually, it is more accurately called the Power Architecture Developers Conference, of PADC), about programming multicore processors for the embedded market. Note that I was not there in person, so I can only take the few quotes in the article and comment on them. The main conclusions are that:
- C/C++ is going to be the dominant language for embedded for the near future. Nothing really surprising at that.
- C/C++ being dominant means that parallelism in multicore processors, especially shared-memory systems, will be harder to exploit. That is certainly true.
- Tool vendors have no good idea about what to do next.
- You cannot expect to get traction with a new language.
In a sense, blaming the market for not having the good sense to adapt new tools to tackle multicore.
I don’t think things have to be that bleak.
Joel Spolsky is always worth a read, and in his post Strategy Letter VI he has a lot of smart things to say about how to consider programming. His basic message is that if you optimize your code too much to work well and fit in the memory of a current machine, by the time that you are done, you find yourself run over by competitors that just assumed machines would be faster and used the same programming time to implement cooler products.
I just have to take issue with this.