In a funny coincidence, I published an article at SCDSource.com about the need for cycle-accurate models for virtual platforms on the same day that ARM announced that they were selling their cycle-accurate simulators and associated tool chain to Carbon Technology. That makes one wonder where cycle-accuracy is going, or whether it is a valid idea at all… is ARM right or am I right, or are we both right since we are talking about different things?
Let’s look at this in more detail.
A clock-cycle (CC) model in this discussion is something that attempts to provide a cycle-by-cycle depiction of the behavior of a computer system. Usually, such models are driven by a cycle-by-cycle clock, as that is the easiest way to write and structure them.
A cycle-accurate (CA) model is a CC model where the depiction is “the same” as what would happen in the real system provided they both started from the same state.
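To make the distinction concrete, a clock-driven (CC-style) model can be sketched as a loop that clocks every unit once per cycle, so that system state is observable at each cycle boundary. This is a toy illustration only, not any particular vendor's API; all class and method names are invented:

```python
# Toy cycle-driven model: every unit is "clocked" once per cycle,
# and results become visible at the next cycle boundary.

class Adder:
    """A unit with one-cycle latency, modeled with an output register."""
    def __init__(self):
        self.a = 0
        self.b = 0        # inputs sampled during this cycle
        self.out = 0      # output, visible from the next cycle on

    def clock(self):
        # Latch the result at the clock edge.
        self.out = self.a + self.b

class System:
    def __init__(self):
        self.adder = Adder()
        self.cycle = 0

    def step(self):
        # Advance the whole system by exactly one clock cycle.
        self.adder.clock()
        self.cycle += 1

sys_model = System()
sys_model.adder.a, sys_model.adder.b = 2, 3
sys_model.step()   # after one cycle, the adder result is visible
print(sys_model.cycle, sys_model.adder.out)
```

The point of the structure is that a CA model is a CC model like this whose per-cycle state happens to match the real hardware; the loop shape is the same either way.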
What is ARM Doing?
ARM seems to be passing on the tools and technologies they acquired when they bought Axys back in 2004. These tools are CC-oriented, and are aimed at hardware architects (and some really-low-level software work). They make it possible to evolve a target design cycle by cycle in the simulator to get a very accurate picture of the target behavior. I think this fits Carbon very well, as they generate very accurate cycle-driven models by essentially compiling the actual RTL implementation of a piece of logic, processor, or device into something a bit faster than plain HDL simulation. Carbon models are a natural fit for the Axys tools.
Basically, it sounds as if ARM decided that manually creating CC-level CA models for their latest processors for use in the Axys tools (SoC Designer) was too much work and too hard to validate. Thus, they passed the whole thing on to Carbon and seem to expect Carbon to generate CA models for use with SoC Designer straight from the actual ARM implementation RTL. Carbon will have the old CC/CA models written by Axys (and later ARM), and then generate new models for new generations of ARM chips like the Cortex-A9. I quote:
“The model generation flow will be optimized and validated using the RTL code, ensuring speed and accuracy. The processor models will also leverage the Carbon model application programming interface (API) to offer a direct connection to the ARM RealView(R) Debugger. Carbon-generated models of ARM IP will offer our customers the fastest, most-accurate path for firmware development and architectural exploration.” (press release)
ARM made this decision, Cornish said, because it’s become increasingly difficult and time-consuming to develop cycle-accurate models. “We recognized it would make more sense to work with a specialist like Carbon that has technology for generating models directly from RTL,” he said. (SCDSource News Piece on the deal)
Feasibility of Construction
The core argument here is really how easy or feasible it is to build CA models of a processor core (or any other really complex piece of logic). There are several interesting views to consider.
- The ARM statement is basically saying that building CA models of a processor core is very hard. It is hard to get right, hard to validate, and hard to maintain. So why even try? Better to generate it from the RTL and let experts at doing that do the work.
- In my PhD thesis from 2002, I concluded that building an accurate model of a processor from public information and reverse engineering is very very difficult, and cited a number of computer architecture and real-time systems attempts to build models that all turned out to have accuracy issues. I did not know much about EDA then — and ESL did not really exist. But I think that still holds water: constructing a model of a processor is hard.
- In the SCDSource article, I make the statement that “Building cycle-accurate (CA) models is very difficult, as you need to understand and describe the implementation details of complicated hardware units. … It is quite easy to end up with something that is essentially an alternative implementation to the actual chip RTL. It is especially difficult for third parties, as it requires access to the device and processor core designers to explain the design.” Which is essentially saying that you need to get inside the processor design group to get the information.
- It is common knowledge that all great processor design teams, from the DEC Alpha to Intel x86s, AMD Opterons, IBM Power, Freescale Power, Infineon TriCore, and Sun Niagara, use internal cycle-detailed simulators as their main design tools to prototype and decide how to design pipelines, memory systems, and system platforms. In this case, the simulator comes before the processor, not the other way around.
- Tensilica has, as Grant Martin points out in comments at SCDSource, tools that generate both the processor and an accurate model at the same time from the same information base.
- CoWare’s LisaTek tools for describing and generating application-specific processors also claim to generate accurate models from the LISA source files in a way similar to Tensilica but based on a user describing a completely custom design in a third-party tool. In the case of Tensilica, the tool and the design come from the same company.
So where does this leave us? It makes it clear that in order to build a good cycle-accurate model you need access to internal information and the processor design team. The CA model can be built either:
- By synthesizing from the RTL, Carbon-style.
- By synthesizing from some more abstract design description, Tensilica or LisaTek-style.
- By the design team as part of the design process.
- By some poor guy working after the fact from specs and test cases.
I think the ARM-Carbon deal (and all practical experience as well) invalidates the fourth variant. Essentially, that is what Axys had to do: build models after the fact, separate from the CPU design flow. This is a property of how ARM designs processors and the fact that Axys began life outside of ARM (my guess, nota bene). It is what computer architecture researchers often want to do but fall down on over and over again. In fact, a common question from computer architecture newbies is whether Virtutech Simics has correct models of processors like the Intel Pentium 4 or Core 2 available to use as starting points in research. It would be nice, but sorry, we do not.
But the other three variants do make sense, and will all result in some kind of decent model. Which one you end up with depends on the style of your design, and quite likely on the complexity of the processor and system design. In the end, any truly revolutionary design (think Sun Rock, for example) will need a custom simulator, as existing tools will not have the concepts needed to model all of its ideas. It seems that simple "standard" designs that fit in the categories of "custom RISC" or "custom DSP" and that do not break new ground in computer architecture can probably be designed using tools that allow processor and simulator generation. I think that most heavy-duty general-purpose processor cores will have to take either the design-team-model or the RTL-generation path, while more accelerator-style cores can use the tools approach.
As a final note, there could really be two different problems being addressed here under the label of "cycle accuracy", which might explain the different levels of feasibility:
- Using the simulator to validate and optimize software performance can tolerate some errors in details as long as errors do not accumulate (see for example the “timing anomalies” or “unbounded long timing effects” found in WCET research). It is about understanding the software behavior versus the processor design (or complex accelerator design versus input data), in small focused spots of execution.
- Using the simulator to validate a chip design, including buses and other devices that can be bus masters. This ought to require a higher level of accuracy, as the penalty for errors would seem potentially greater. And this is also where ARM's SoC Designer fits in, rather than as a tool to understand software behavior. The scope here is larger, and there is usually no notion of zooming in on details at particular points in time.
So where does this land us?
I guess that CC/CA models can be built if you have a nice inside track to the design team, and that the only sensible way to use them is as a zoom device for the places in your code where you absolutely need the details. Most of the time (say 90-95-99%) software does not need CC models, but rather something that is functionally accurate and that runs really really fast so that all software can at least be executed. That is something a CC model will never be able to do, at least not for systems using non-trivial operating systems requiring a few billion instructions just to boot…
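The zoom idea can be sketched very roughly: run fast and functional by default, and pay the cycle-detailed cost only in the code regions where timing actually matters. This is a purely hypothetical sketch with invented names and a toy cost model, not how any real tool (Simics included) structures this:

```python
# Hypothetical "zoom" sketch: functional execution approximates every
# instruction as one cycle; cycle-detailed mode charges the true cost.

def run(program, detailed_regions):
    """program: list of (address, true_cycle_cost) pairs.
    detailed_regions: set of addresses simulated cycle-detailed."""
    cycles = 0
    for pc, cost in program:
        if pc in detailed_regions:
            cycles += cost   # detailed: charge the real cycle cost
        else:
            cycles += 1      # functional: fast, one cycle per instruction
    return cycles

# Toy trace: most instructions cost 1 cycle; a cache miss at 0x40 costs 20.
trace = [(0x10, 1), (0x20, 1), (0x40, 20), (0x50, 1)]
print(run(trace, detailed_regions=set()))     # all-functional estimate
print(run(trace, detailed_regions={0x40}))    # zoomed in on the hot spot
```

The gap between the two numbers is exactly the detail you buy by zooming in, and the reason the fast functional mode is good enough for the other 90-95-99% of the run.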