EETimes: James Aldis on Performance Modeling

James Aldis of TI has published an article in the EEtimes about how Texas Instruments uses SystemC in the modeling of their OMAP2 platform. SystemC is used for early architecture modeling and performance analysis, but not really for a virtual platform that can actually run software. The article offers a good insight into the virtual platform use of hardware designers, which is significantly different from the virtual platform use of software designers.
For a software person like myself, this article offers a well-written insight into the world of hardware design and bus optimization for SoCs.

TI deploys two totally different platforms for hardware and software development, which makes perfect sense.  The goals are so different between a high-speed software development platform and performance-accurate hardware design platform that trying to force them together would likely just create a bad compromise that is bad for everybody.

Additionally, FPGAs are used to create timing-dependent low-level code, where you need both timing accuracy and decent speed.  It is worth noting that the performance model is mostly “dataless” – it models the timing of actions and their dependencies, but not their values and computations.

The different models serve different purposes, require different levels of effort to use, and become available at different times during the project. The SystemC performance model is always available first and is always the simplest to create and use. The virtual platform is the next to become available. It is used for software development and has very little timing accuracy.  TI uses Virtio technology to create this model rather than SystemC.

Given the number of ultimately failed attempts I have seen at making timing and function available in the same model but as orthogonal concerns, this observation in the article is very insightful:

It would appear the choice of two different technologies for the virtual platform and the performance model is inefficient, wasting potential code reuse. However, the two have completely different (almost fully orthogonal) requirements, and at module level almost no code reuse is possible.

Maybe this is an impossible dream in the general case.

One somewhat surprising statement in the article is that there is no real software available to use in the SoC design phase. Often, virtual platforms are sold as being able to use “the real software” when designing hardware. But in the case of TI, the software is mostly written by their customers, with little available for TI to use. Thus, they are forced to design their own test cases to drive the hardware design process.

The requirements on the simulation technology are first and foremost ease in creating test cases and models and credibility of results. The emphasis on test-case creation is a consequence of the complexity of the devices and of the way in which an SoC platform such as OMAP-2 is used: because the whole motivation is to be able to move from marketing requirements to RTL freeze and tape-out in a very short time; and because in many cases large parts of the software will be written by the end customer and not by the SoC provider (Texas Instruments, in this article), the performance-area-power tradeoff of a proposed new SoC must be achieved without the aid of “the software.”

The platform they built is all based on clock-cycle-level interfaces (CC), which is very natural when the primary use case is hardware design.

The primary component optimized in the TI design process is the on-chip interconnect structure, called the “NoC” in the article. Each SoC variant is built from a set of (usually already existing) devices and processor cores. The main work of the integration is designing an appropriate NoC for the SoC. The NoC design is crucial to the actual performance level the final SoC product will have.

By playing with the topology, the level of concurrency, and the level of pipelining in the NOC, it’s possible to create SoCs from the same basic modules with quite different capabilities.

The only real instruction-set simulators used are CC-level models of DSPs, used for software optimization taking but contention into account. No models of the ARM control cores are used. Mostly, processors are represented by stochastic or trace-driven traffic generators that put transactions on buses but do not actually run any real code.

The stochastic processor models are very powerful and provide traffic that is very similar to a real processor.  A very elegant property of such models is that it is very easy to change the parameters of the model to model quite different software/processor scenarios. Compared to writing real test programs for a full ISS, this is much faster and allows for the exploration of more alternatives.

The stochastic models are used along side function-graph breakdowns of software, essentially models that say that an application does A, then B, then C, and that maybe D can happen in parallel. This model of an application is connected to the hardware simulation and can control when things happen and what goes on in parallel. It amounts to a simple model of what an RTOS would do, to some extent.

Configurability is a key theme throughout the OMAP architecture exploration platform. SystemC being what it is, it is limited to configuration at start-up time, but that is perfectly sensible for an architecture exploration use case where you want to setup and platform and test its performance. Dynamic reconfiguration during a run is not that important.  TI has spent a great deal of effort in making the system easy to configure using parameter files.

The article goes into many more fascinating details on the models used.  I can only say one thing: read it, if you have any interest in these kinds of issues.

Good work, James!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.