IEEE Micro published an article called “Architectural Simulators Considered Harmful”, by Nowatzki et al., in the November-December 2015 issue. It is a harsh critique of how computer architecture research is performed today and of its uninformed overreliance on architectural simulators. I have to say I mostly agree with what they say. The article follows a good tradition of articles from the University of Wisconsin-Madison critiquing how computer architecture research is performed, and I definitely applaud this type of critique.
The main point of the article is that there is a culture in computer architecture research where you are expected to do “cycle-accurate”* simulation of a processor in order to have a publishable result. The reason is that the review process for publications typically ends up with a few reviewers requesting detailed simulations to support the results – even in cases where such detailed simulation is entirely beside the point and not very informative, or actively destructive, providing negative value.
Note for the *: “cycle accurate” is really a misnomer here, since there is nothing to be accurate against. Nowatzki et al. make this point very early in their article.
I absolutely recognize the problem they describe from my university days and from working with academic conferences (it has been a while since I was on a program committee, but it does not seem things have changed much). My friends doing computer architecture did really interesting work where processor pipeline simulation had nothing much to do with it – but they still ended up doing detailed simulations in order to get published.
Another problem the article brings up is researchers blindly using a simulator to get power, performance, and other numbers, without understanding just when and where the tool is really applicable. They need numbers, so they find a way to get numbers – with no calibration of the simulators, no validation against reality, and no reflection on whether the simulation actually produces meaningful numbers at all. I have seen this rather often in embedded-systems research, where tools from computer architecture are simply borrowed in order to evaluate software – typically to get performance and power-consumption numbers for software constructions of different types, or for software augmented with a hardware accelerator. In those cases, the numbers are almost surely wrong, since the simulator has very little relevance for the field in which it is used.
It is critical to understand the tools you use! People using a simulator must be able to answer:
- What is the use case it was designed for?
- Does it capture the phenomenon you want to evaluate?
- Is your use case similar? If not, the results are likely meaningless.
- Can you calibrate the results against something physical and existing? If not, you must be careful with interpreting the numbers.
- How is it configured? Do you understand the significance of the parameters?
- Do the results add anything useful to your evaluation?
A big problem that I have seen is that academic researchers in particular – but other users too – look for easy solutions that they can apply without thought, consideration, or too much work. Unfortunately, the real world is rarely “easy” in that respect. If you want to design improved computer hardware, you need a thorough understanding of how current hardware works. You cannot just take a simulator off the shelf, tweak it a bit to model some new idea, and call the result a result.
Another important issue is the validation of simulators. In industry, simulators are used extensively to architect new processors and hardware. That approach does work and it does give us steadily better computers, so aren’t simulators thus proven good by association?
Not quite. There is a big difference between academic research and industrial development in their use of simulators. In industry, simulators usually reflect some existing piece of hardware, which allows them to be calibrated. The predictions of a simulator can be compared to the final product once it comes out, providing a feedback loop that improves the quality of the simulator. In this way, each successive refinement has a firm footing in reality, and does indeed build on past results in the finest traditions of science.
In academia, there is less science to simulators, unfortunately. The popular academic computer architecture simulators like gem5 do not correspond to any real hardware, and have not been validated in any meaningful way. Ideally, we would have a “computer architecture processor” that was fabbed, run, and used, with an accompanying simulation model. In this case, the community would tweak the architecture and eventually fab a new generation… but that is an unachievable utopia.
Furthermore, simulators have bugs and things that they simply do not cover. When I was doing my PhD in the early 2000s, I spent significant time trying to build accurate models of processors and looking at what other people had done in the area. The results were not encouraging. Beyond very simple pipelines, it was very hard to get something really accurate – there are so many parallel and interdependent mechanisms in a processor that a small omission can blow up all the results if you happen on a piece of code that is “unbalanced”, or simply different from the code on which the simulator was tested and calibrated. Stepping outside the calibrated area of a simulation is fraught with danger, but people tend to forget this.
That’s a long problem statement. Is there a solution?
In the article, the authors push hard for what they call “first-order models” – i.e., simpler models that capture the main effect of an idea without bothering too much with secondary effects. With such a model, it is easier to explain the rationale and concept behind a new mechanism or tweak. By modeling the essence, you end up with a simple, understandable, and powerful model that can be used to reason about the idea. If instead everything is immediately turned into code inside a processor simulation framework, you lose the high-level view and get lost in details. By removing irrelevant details, an idea can be evaluated on its own – without being affected by the thousands of other design decisions embodied in the code of an existing simulator. That makes it easier for people to understand the value of the idea.
This makes a ton of sense.
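A first-order model can often be just a few lines of arithmetic. As a purely illustrative sketch (every number below is an invented assumption, not a measurement of any real processor or the article's own example), here is how one might estimate the speedup from a hypothetical cache improvement using a simple CPI model:

```python
# Hypothetical first-order model: estimate the speedup from a cache
# improvement. All parameter values are made-up illustrative assumptions.

def cpi(base_cpi, miss_rate, miss_penalty_cycles):
    """Average cycles per instruction: base pipeline CPI plus the
    average stall cycles contributed by cache misses."""
    return base_cpi + miss_rate * miss_penalty_cycles

# Assumed baseline: pipeline CPI of 1.0, 2% of instructions miss,
# 100-cycle miss penalty.
baseline = cpi(1.0, 0.02, 100)   # 1.0 + 2.0 = 3.0

# Assumed effect of the new mechanism: the miss rate is halved.
improved = cpi(1.0, 0.01, 100)   # 1.0 + 1.0 = 2.0

speedup = baseline / improved
print(f"first-order speedup estimate: {speedup:.2f}x")  # prints 1.50x
```

The point of such a sketch is that every assumption is visible and can be challenged directly, rather than being buried among thousands of configuration parameters inside a simulator.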
Simulation does have its place, but its use needs to be informed, and simulators have to be at a suitable level of abstraction. Sometimes it is enough to run at a functional level, just counting instructions. Sometimes it makes sense to count cycles in a detailed pipeline model. Sometimes you just want to simulate the on-chip network and communication delays with stubbed nodes. It depends on what you study and what you want to measure!
Finally, the one thing I did not like about the paper was the title. I personally think that “X considered harmful” should be considered harmful. Dijkstra did a great thing, but most articles borrowing the title since then have not been at the same level. Just leave it.