Over the past few weeks there was an interesting exchange of blog posts, opinions, and ideas between Frank Schirrmeister of Synopsys and Ran Avinun of Cadence. The topic: virtual platforms vs. hardware emulation, and how to do low-power design “properly”. Quite an interesting exchange, and I think that Frank is a bit more right in his thinking about virtual platforms and how to use them. Read on for some comments on the exchange.
The following appears to be the sequence of events:
- Cadence press release, in September, about their “Incisive Palladium Dynamic Power Analysis” and “Cadence InCyte Chip Estimator”. The release says:

“Cadence Incisive Palladium Dynamic Power Analysis enables SoC designers, architects and validation engineers to quickly estimate the power consumption of their system during the design phase, analyzing the effects of running various real software stacks and other real-world stimuli. The new offerings also include the Cadence InCyte Chip Estimator, which can now provide what-if power analysis through exploration of different low-power techniques. The InCyte Chip Estimator also automatically generates the Si2 Common Power Format (CPF), which helps drive architectural power specification and intent into implementation and verification.”

- Frank Schirrmeister blogged “On Chameleons, Low Power and the Marketing Power of Copy Editing“, basically saying that what Cadence was selling was something bound to the RTL level, and thus arriving with estimates pretty late in the design process, after the most important architecture decisions had been made. Instead, he proposed a flow using virtual prototypes that contains a sequence of successively better estimates, from the usual initial spreadsheet to estimates actually derived from RTL later in the process (or for IP blocks that already exist). Synopsys is not alone in this; Neosera and ChipVision are after similar ideas. I think this approach makes excellent sense, following the idea that getting some kind of approximate feedback from a complete system early in the process is better than getting lots of detail about a small part of a system late in the process.
- Ran Avinun then blogged a reply to Frank, at “The Power of Cadence System Power Flow vs. Viewing from the Top“. His contention there is that virtual prototypes have their uses, but that real designers will be using hardware accelerators, as these provide the accuracy needed to do real power work. He also sees the time and effort needed to create a virtual platform as a big obstacle, and cites a number of cases where running the actual semi-final RTL with power simulation was key to project success.
- Frank then replied to the reply, at “Hammers, Nails and the Spirits That I Called …“, where he points out that Ran has some misconceptions about virtual platforms, acknowledges that the Cadence flow works well, but argues that it misses the point of early power estimation, before the design is too frozen to be changed much. There is a pretty but hard-to-read diagram in the post, from a 2005 article he wrote while at ChipVision in Germany, pointing out the need to evaluate designs with actual test data from the real world.
What do I make of all of this?
I must admit that I think the Palladium hardware simulation accelerator boxes are very cool pieces of hardware, which at least used to be based on custom logic systems that run several emulator cycles of a fixed-size hardware per target clock cycle, in order to simulate designs that are multiples of the hardware’s base emulation capacity (so a 10M-gate-capacity system can use 10 cycles per target cycle to simulate a 100M-gate design, for example). However, I do agree with Frank that these are dependent on having actual RTL in place to be of much use.
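The capacity-vs-speed tradeoff from the example above can be sketched in a few lines. This is purely illustrative arithmetic under the stated assumption of ideal time-multiplexing, not a description of how Palladium actually schedules its logic:

```python
# Illustrative sketch (not Cadence's actual scheme): time-multiplexed
# emulation trades speed for capacity. With a fixed base capacity,
# simulating a larger design costs proportionally more emulator
# cycles per target clock cycle.

def emulation_slowdown(base_capacity_gates: int, design_gates: int) -> int:
    """Emulator cycles needed per target cycle, assuming ideal
    time-multiplexing with no scheduling overhead."""
    # Ceiling division: a design 10x the base capacity needs 10 passes.
    return -(-design_gates // base_capacity_gates)

# The example from the text: 10M-gate capacity, 100M-gate design
# costs 10 emulator cycles per target cycle.
print(emulation_slowdown(10_000_000, 100_000_000))  # → 10
```

The point of the sketch is simply that emulation speed degrades linearly as the design outgrows the base capacity, which is one more reason the approach only pays off once real RTL exists.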
Another issue with hardware emulators is their overall availability: compared to the number of PCs available in an organization, they are always going to be scarce. As discussed in many different forums, a key advantage of a pure virtual platform is that it can turn any programmer’s PC into a target system running the real target software, without having to book time on a limited pool of physical target machines; hardware accelerators are precisely such limited-in-supply machines. A virtual platform is thus much more available to people within, and especially outside, a design organization. Also, unless you are happy to release the RTL for your design to people outside your organization, hardware acceleration is going to do little to help your end users get the most out of your design pre-silicon.
My final gripe with hardware emulators is their limited scope. They tend to max out at the borders of a single chip, or less. A virtual platform, on the other hand, has much more room to scale: it can include multiple chips, multiple boards, or even complete racks and networks of networks. You cannot really do that in any hardware simulation, as it would involve too many billions of gates running too many billions of instructions. The general rule of simulation still applies with hardware acceleration: you need to raise the level of abstraction to handle larger systems.
As to the problem faced by Ran’s customers, having RTL but no virtual platform: what were they thinking? Seriously, if you start a design today, a virtual platform should be your starting point, not an afterthought. Time and again, we see examples where using virtual platforms gets chips to customers ahead of time and provides the ability to test ideas before committing to final RTL. It seems that Ran agrees with this need, but his means are different:
“As was stated above, a big reason our customers use RTL emulation platforms is accuracy, and while virtual platforms can offer certain performance, eventually the need for accuracy becomes critical and cannot be overlooked, even for initial performance and power estimation analysis. Frank seems to forget in his statement above that the average bring-up time of new virtual platforms takes 6-12 months while the average bring-up time of many emulated designs takes days.”
The time to create a virtual platform is actually pretty short, if you do it at a sufficiently abstract level of detail and don’t worry too much about cycle accuracy. Also, bringing up an emulation depends on having a detailed RTL-level description to start with… which is not necessarily available. I must say that the cited 6 to 12 months for a VP (for a single SoC, as discussed here) sounds reasonable to me, if you are building a cycle-level model that tries to emulate the final timing (which might not really be feasible at all). If you work at a higher level of abstraction, like loosely-timed TLM, that time shrinks by a factor of ten or so. I agree that in the end, accuracy is critical, but before you get there, the approximations used by the VP will have gotten you pretty far in terms of software development and architecture testing.
Ran is also worried about the lack of accuracy:
Now, even if you build this platform successfully 9-12 months in advance, how do you know that your virtual platform is representing your real design? How do you connect it to your verification and implementation environment and realistic power information? Frank seems to overlook these things. Looking at the analogy of the story described at the blog above, using a system-level platform that is not targeting the actual hardware for performance analysis and power trade-offs guarantees that the Chameleon will become a snake and you will get bitten.
As with all simulations, virtual platforms need to be used with care and understanding. It might also, once again, be a matter of system scale: with RTL simulation, you are looking inside a single chip, at the detailed design decisions that save power there. With a VP, you might be looking at whether a particular OS kernel even cares to try to turn off unused hardware at all… and that might be just as important in the end as being accurate about how functional units turn on and off inside an accelerator.
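The kind of coarse question a VP can answer here can be sketched very simply: does the simulated OS power down idle devices, and how much does that matter at the system level? The following is a hypothetical model with invented device names and power numbers, nothing like real VP tooling, just the shape of the accounting:

```python
# Hypothetical sketch of coarse, OS-level power accounting in a
# loosely-timed virtual platform: no cycle accuracy, just per-time-slice
# bookkeeping of which devices are on. All names and numbers invented.
from collections import namedtuple

Device = namedtuple("Device", "name active_mw idle_mw")

def platform_energy_mj(devices, busy_schedule, power_aware, slice_ms=10):
    """Total energy (mJ) over a schedule of time slices, where each
    slice names the one busy device (or None for a fully idle slice).
    A power-aware OS powers down every idle device; a naive OS
    leaves everything running at active power."""
    total = 0.0
    for busy in busy_schedule:
        for d in devices:
            if d.name == busy:
                total += d.active_mw * slice_ms / 1000.0
            elif power_aware:
                total += d.idle_mw * slice_ms / 1000.0
            else:
                total += d.active_mw * slice_ms / 1000.0
    return total

devices = [Device("dsp", 200, 5), Device("gpu", 400, 10)]
schedule = ["dsp", "dsp", None, "gpu"]
print(platform_energy_mj(devices, schedule, power_aware=False))  # naive OS
print(platform_energy_mj(devices, schedule, power_aware=True))   # power-aware OS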
In today’s software-driven systems, which mostly consist of existing off-the-shelf hardware rather than any particular SoC being designed right now, the large-scale behavior and smarts of the software, in a setting containing lots of chips and functions, matter far more than optimizations inside a single chip.
Since Frank is a virtual platform supporter just like me, I instinctively agree with his points about VPs being pretty fast to develop and available long in advance of actual silicon. I like the way he deals with power in the ARM DevCon presentation cited (do have a look at it), but still there are some lingering doubts and issues…
What I have a hard time understanding is just how detailed the virtual platforms need to be. The use of SystemC TLM-2.0 LT is sensible for speed, but it seems from the DevCon presentation that the main emphasis is on AT-level (and therefore pretty slow) timing-accurate simulations that look at power cycle by cycle in the target. If that is the case, I think we could almost just as well go get ourselves a hardware accelerator, as cycle-level models (even if transaction-driven) give up most of the speed advantage that makes virtual platforms attractive in the first place.
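The LT-vs-AT speed gap comes largely from how often the simulator has to synchronize. In LT-style temporal decoupling, each initiator runs ahead inside a time quantum and syncs rarely; AT-style modeling syncs at every transaction phase. A toy sketch of that bookkeeping, with invented numbers and nothing resembling actual TLM-2.0 library code:

```python
# Rough sketch of why loosely-timed (LT) modeling is faster than
# approximately-timed (AT): counting simulator synchronization events.
# Structure and numbers are illustrative only, not TLM-2.0 API code.

def sync_points(n_transactions, ns_per_txn, quantum_ns=None):
    """Count synchronization events for a run. With no quantum
    (AT-style), every transaction synchronizes with the simulation
    kernel. With a quantum (LT-style), we synchronize roughly once
    per elapsed quantum of simulated time."""
    if quantum_ns is None:
        return n_transactions
    total_ns = n_transactions * ns_per_txn
    return max(1, total_ns // quantum_ns)

print(sync_points(1_000_000, 10))                      # AT-style: 1000000
print(sync_points(1_000_000, 10, quantum_ns=100_000))  # LT-style: 100
```

Four orders of magnitude fewer kernel interactions is roughly where the speed of LT virtual platforms comes from, and it is exactly what is given up when moving to cycle-by-cycle power observation.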
However, Frank also says something that I cannot but agree with: you should not always run around with a hammer and treat everything like a nail. Any reasonable chip design process needs both virtual platforms and hardware accelerators; one cannot really replace the other:
When discussing this matter with a friend, he pointed out rightfully so that both Ran’s and my post suffer from “Hammer and Nail-itis”. In fact, he pointed out, the combination of Cadence’s estimators (InCyte), C based synthesis, Palladium, and Synopsys virtual platforms would be pretty powerful! It’s a good thing then that we acquired Synplicity, which brought Synplify high-level synthesis and Confirma FPGA Prototyping to Synopsys, and of course, that we have existing interfaces between our Virtual Platforms and Eve’s solutions.
To me, the lesson from this discussion is clear: a virtual platform should be the starting point of a new design, but once you get down to RTL, hardware acceleration is really pretty useful. You need both, and the VP should come first, not second. It is not an either-or issue; rather, I expect system and chip designers to use both tools, and the only question is which should come first. I think that is naturally the simulation, in the form of a virtual platform. That also allows the chip to be set into a system context, which is otherwise pretty hard before silicon arrives, and something that large system integrators are screaming for.