DVCon Europe 2023 – 10th Anniversary Edition

The 2023 DVCon (Design and Verification) Europe conference took place on November 14 and 15, in the traditional location of the Holiday Inn Munich City Center. This was the 10th time the conference took place, serving as an excuse for a great anniversary dinner. Also new was the addition of a research track to provide academics publishing at the conference with the academic credit their work deserves. This year had a large number of papers related to virtual platforms, so writing this report has taken me longer than usual. There was just so much to cover.

The traditional gingerbread cookie, this time with a golden anniversary theme

Conference Overview

As has been the pattern since 2016, there were two days of DVCon, followed by the SystemC Evolution Day (SCED). Attendance was good, with 343 registered participants in total – slightly down from last year, probably because companies are restricting travel in tough times for the industry. DVCon is an industrial conference, sponsored by Accellera, and participation was skewed towards people from industry. The new research track definitely drew in more academics, as expected. The papers presented were a clear step up from last year. Reviewing my photos from the conference, I realized that I failed to take pictures during most of the second day – since I was in sessions listening to interesting papers!

Martin Barnasconi, General Chair for 2023, opens the conference

In full disclosure, I was part of the steering committee this year, with the wonderful title of “specialities chair”. Basically, the free resource who could go run after anything that needed to be done. I also had two papers selected in the engineering track – one on doing software fuzzing with Intel Simics Virtual Platforms, and one, written with my colleagues at Intel Programmable Solutions Group (PSG, aka Intel FPGA), on how to use VP models to validate RTL.

10th Year Anniversary

This was the tenth time that DVCon Europe was arranged. The first event was held in 2014, with planning starting already in 2013. Several of the steering committee members for 2023 have been there from the very start. To show off such dedication to the conference, we had ribbons that attendees could use to show that they had attended all ten years. I believe I have attended only six or seven times, so I did not merit such a ribbon.

Here are some ten-year ribbon holders:

We also provided a ribbon for first-time attendees, but I saw very few people pick up on that.

The most notable part of the celebration was the party we had on Tuesday evening. The next morning we started the conference at eight sharp, which might not have been the smartest idea in hindsight. The crowd was pretty thin at first, but it did pick up by the time the keynote started.

Reading about other people’s parties is pretty boring, but I still wanted to share a few highlights.

People seated

Martin Barnasconi and Mark Burton arranged a pub quiz with nerdy questions about DVCon and Accellera standards. Amazingly, a group from Linz managed to answer all questions correctly!

Pub Quiz form filled in a rather unorthodox way

It was my idea to add a little program to the dinner and to do songs the way we do at academic parties in Sweden. Despite the misgivings of some members of the committee, this worked out brilliantly. I had to be on stage shouting into a microphone to lead the crowd, but it worked. I realized that the custom “Dee Vee Con will Rock You” lyrics had some bugs while I was singing them, but it was still fun. I hope we gave everyone an evening to remember, something a bit different from your average industry conference.

Table setting with give-away coaster/bottle-opener sponsored by AMIQ and the songbook

More photos have been posted on the DVCon Europe website! See https://dvcon-europe.org/dvcon-europe-2023-photos/

Trends and Topics

Here is my take on the main themes and topics that I found interesting this year. Note that I am a virtual platforms and software guy, so I cannot do the deeper RTL-related topics on verification and validation justice.

Virtual platforms – more papers on VP compared to the previous year (counted at least a dozen). The conference was particularly rich in virtual platforms for automotive applications. SystemC was used in most VP papers and tutorials. Another VP technology hot topic was using ARM-on-ARM virtualization to speed up virtual platforms for ARM targets, enabled by ARM-based Macs and Amazon Graviton processors.

Simulator integration or federated simulation – the integration of digital system simulation (from RTL up to TLM VP) with other types of simulators. This came up in many papers and is a key direction of travel for simulation.

As always, running your design, verification, and testing tools in cloud-based solutions was common. Cloud is popular both as a way to provide access to packaged solutions and as a way to scale execution resources up and down.

Cadence booth, showing some AI-based solutions
Synopsys booth, also with AI at the front

Artificial Intelligence and Machine Learning – there was a keynote from AMD about accelerating AI workloads on FPGAs, and a panel on AI in design verification. There are EDA tools shipping with features based on ML – such as spotting issues in trace files by automatic pattern recognition. Another use for “AI” is to make regression runs and tests run more efficiently by predicting the best tests. Still, given the hoopla over ChatGPT and LLMs in the past year, there was arguably less talk about AI than expected.

Something interesting might be happening to RISC-V. While it was used as a motivating architecture in some papers and talks, there seems to be less hype around it this year than last year. Instead, it felt like ARM was back in the spotlight. With clear signs that some RISC-V startups are in trouble, companies have to confront the potential business impact of supplier problems. In particular, what happens if you license a processor core (or other IP) from a vendor that goes bankrupt? Who will support your integration into your SoC and fix issues in the IP?

Complexity – complexity keeps coming up as a key issue. Complexity from designs simply being big, or complexity from massive repetition of the same block over and over again.

The Exhibition

The exhibition hall was very busy this year, with many more exhibitors than last year. I did not have as much time as I would have liked to go around talking to the vendors.

The big three EDA vendors were represented, as expected, complemented by a large number of smaller companies. It felt like consulting companies were the most common, but we also had smaller product companies like Jade and S2C. Obviously, the Mathworks were there as well, showing their solutions for chip and system design.

Standing out in an exhibition where all booths are the same size and there is no place for enormous builds like at the DAC can be hard, but vtool showed how to do it. Their booth was very colorful, really popping out in a mostly bland grey-white scheme. Veriest had a strong blue background in their booth, but it did not pull in the eye in the same way as vtool's. El Camino had a nice desert theme and showed off some solutions based on Intel FPGAs – including some mechanical samples of giant high-end FPGAs. They also had a setup with a machine solving a Rubik’s cube (which I did not get a good photo of).

Having coffee breaks and lunch in the big hall gave the attendees a good chance to look at the exhibition and engage with the vendors.

Research Track

This year, DVCon Europe added a research track to the conference. While it is still an industry conference, having a research track broadens and deepens both paper submissions and conference attendance. It is a somewhat sad fact that if a PhD student or other researcher wants academic credit for publishing at DVCon, the conference has to publish the papers in the “proper” way. This is what is addressed by having a dedicated academic track with the requisite rigor of review and suitable publishing (in the end, the papers will end up under the IEEE umbrella and be available from IEEE Xplore).

Matthias Jung introducing the research track

We had six full-length and six short paper presentations in the research track, intermingled with the engineering (industry) papers. Each paper session collected papers on the same theme, regardless of which track they belonged to. This worked well and encouraged interaction between industry and academia, which is really what we want to see. For next year, we hope for even more contributions to the research track.

 The best research track paper award went to the paper “Clock Tree Design Considerations in The Presence of Asymmetric Transistor Aging”, by Freddy Gabay, Firas Ramadan, and Majd Ganaiem.

Keynote: Energy-efficient High Performance Compute, at the heart of Europe

The first keynote for DVCon Europe 2023 was presented by Philippe Notton, the CEO of SiPearl. SiPearl is a relatively small and new company that aims at building a high-performance supercomputer-class microprocessor in Europe, for European applications.

Philippe Notton answering questions

It has grown out of the European Processor Initiative (EPI) and is part of the ongoing discussion on making Europe less dependent on processors from outside the continent. Philippe shared some interesting stories about how EPI and SiPearl came to be, and the many different EU initiatives that have led up to the current state.

SiPearl is about to ship their first processor, the Rhea1. This is a high-performance part based on ARM Neoverse V1 cores (the only European licensee of this core). It uses ARM SVE vector instructions and HBM memory to achieve very high compute performance in a reasonable power envelope.

It has been designed into the JUPITER supercomputer at the Jülich supercomputer center in Germany. As an aside, note that the GPU side of that computer is using Nvidia Grace Hopper chips, which means each such node also features a set of ARM cores. Neoverse V2 to be precise. So the JUPITER will have two sets of ARM cores.

SiPearl is a small company, some 70 employees, and as such they have focused on the integration of existing parts rather than building their own microprocessor core from scratch (which would be way more expensive and time consuming). ARM wants to see multiple server vendors use their cores to cover different parts of the market, and SiPearl fits nicely with their direction. That being said, it is not easy to build a well-balanced processor with good power management. Their goal is for a 500W processor that can be combined into multiple-socket systems.

They have validated that the Rhea chips will work with most accelerators on the market: Nvidia, AMD, Intel (Ponte Vecchio GPUs), and Graphcore AI chips. They are looking to also add on quantum computing accelerators.

Philippe spent some time talking about their design flow (and pointed out that a good design flow represents real value in a company). They use virtual platforms (based on Qemu and SystemC) to build enablement software like APIs, for which you do need the real platform. If you just need the ARM architecture, you can rent ARM instances from Amazon. They have done firmware and software integration up to the level of a Linux boot on the VP.

When it comes to hardware design validation, they basically use a lot of RTL emulation. They have a Siemens Veloce-based emulation setup with 128 cards, enough to handle 4 billion gates or a dual-socket system (according to Philippe). They connected the emulators to outside hardware like SD Cards, and sometimes used the virtual platform to replace processor cores for higher speed and lower emulation usage. Building this flow took two years, out of the four years of the total project.

Using ARM let SiPearl leverage the existing software infrastructure, saving them a lot of time. Indeed, just one third of their company is doing software enablement.

There were some interesting notes on staying European. Today they had to fab at TSMC. In the future, I would assume Intel and TSMC fabs in Europe would be used. On the tooling side, it is really hard. All the big EDA vendors are from the US, and all their cloud-based EDA tool offerings run on US servers. Which means SiPearl could not use them. They had to build their own set of servers (cloud), so that the only thing that leaves the company is the final GDSII to the factory.

If the Rhea1 is successful, I would expect SiPearl to start doing their own cores at some point in the future, to raise the amount of unique value add compared to other ARM-based server chips. But that is just my personal speculation.

Keynote: Pervasive and Sustainable AI with Adaptive Computing

The second keynote was more technical than the first. Michaela Blott is a Senior Fellow at AMD Research in Dublin, Ireland, part of the FPGA side of AMD (i.e., Xilinx). The lab works with AI and communications. Her talk was about how AI algorithms can be implemented more efficiently and flexibly using FPGAs, and she made a number of good points on computer architecture for compute-intense tasks.

Michaela Blott, talking about FPGAs

She made some observations about energy and AI. Meta’s AI cluster consumes somewhere between 50 and 500 TWh per year. For comparison, Ireland uses 26 TWh of electricity per year, and Germany around 537 TWh. Thus, the AI compute of a single large corporation can consume as much energy as a whole industrialized country. Current Deep Neural Network (DNN) technology is actually quite primitive and amounts to a brute-force approach, especially when compared to the energy efficiency of a human brain taking on similar tasks. It should be possible to find orders of magnitude of efficiency improvements in how we execute DNNs.

Michaela made the point that AI algorithms and techniques are currently in flux. There are “epic discussions” over aspects like which is the best data type to use. FP32, or Int8, or FP8, or BF16, or something else? Dedicated AI architectures like those from Google (TPU), Tesla, Alibaba, Amazon, Cerebras, Graphcore, Grok, and others offer very high efficiency. But it takes time to get them to market, say from 18 to 36 months. What is needed to run AI algorithms might well change in the meantime. An FPGA does not have that problem, as the hardware is reprogrammable (in principle, you lose efficiency compared to the same algorithm implemented in an ASIC on the same production process, but it is much more efficient compared to doing the work in software).

Most of the talk was about how FPGAs can be used for AI inference, providing lower latencies and lower energy consumption per inference thanks to the generation of custom hardware setups.

FPGAs base their compute power on a few basic building blocks: LUT (Lookup Tables), RAMs, DSPs, and in next-gen FPGAs, dedicated AIE (AI Engines, basically matrix-multiply units). The AIEs break with FPGA tradition in that they are software programmable instead of being part of the synthesis flow that produces the configuration for the rest of the FPGA.

FPGA inference, as presented, implements each layer of a trained neural network as a custom accelerator. This is optimized for streaming data, unlike GPUs that need big batches to work efficiently, and it provides the opportunity for lower latency. By generating custom hardware for each layer, you only employ the hardware that is needed, unlike fixed accelerators that always have to employ their full data widths. Each layer can also use its own data representation to drive efficiency – for example, an INT1 multiplier is about 70x smaller than an INT8 multiplier. Data movement is minimized (which is great, as DRAM access is really expensive), and the shorter execution time contributes to energy efficiency. A particularly interesting point was about sparsity – deep neural networks are naturally sparse, and the human brain is likely also very sparse. FPGAs can do sparsity well – a LUT is approximately a neuron.
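
To make the INT1 point concrete, here is a tiny host-code illustration (my own, not AMD FINN code) of binarized arithmetic: when weights and activations are packed one value per bit, a 64-element dot product collapses into an XOR and a popcount, which is why the per-multiplier hardware cost is so small.

```cpp
// Minimal host-side illustration of INT1 (binarized) arithmetic, where
// {-1,+1} values are packed one per bit (1 = +1, 0 = -1). The dot product of
// two 64-element binary vectors reduces to an XOR and a popcount -- the
// hardware equivalent is essentially an XNOR gate per "multiply".
#include <bit>
#include <cstdint>
#include <iostream>

// Each pair of equal bits contributes +1, each differing pair -1, so the
// result is 64 - 2 * popcount(a XOR b).
int binary_dot64(uint64_t a, uint64_t b) {
    return 64 - 2 * std::popcount(a ^ b);
}

int main() {
    uint64_t a = 0xFFFF0000FFFF0000ULL;
    uint64_t b = 0xFFFFFFFF00000000ULL;
    // 32 matching bit positions, 32 mismatching -> dot product 0
    std::cout << "dot(a, b) = " << binary_dot64(a, b) << "\n";
}
```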

She talked about the Vitis and FINN tools from AMD, and how they were used to implement neural networks on FPGAs. The FINN graph compiler can apply optimizations when converting an ONNX model to something that can run on an FPGA. Quantization can provide a 49x increase in efficiency, and “pruning” the net another 70x. Another tool, LogicNets, can squeeze out even more redundancy from the networks.

Note that there is a fundamental limit to what an FPGA can handle. Some networks simply do not fit, in particular due to the limits of on-chip memory. For example, LLMs are currently out of scope. But many generally useful inference tasks are in scope, from recognizing trigger points in CERN experiments to real-time Malware detection to software-defined radio tasks.

Panel on AI

The first panel of DVCon Europe 2023 was about AI and verification. The title was “‘All AI All the Time’ Poses New Challenges for Traditional Verification”. It was led by Paul Dempsey, with Michaela Blott (AMD), Jean-Marie Brunet (Siemens EDA), Daniel Schostak (ARM), and Lu Dai (Qualcomm and Accellera) on the panel.

Uses of AI seen in EDA tooling and flows:

  • Selecting the tests to run from a large set, for better efficiency, or selecting which formal algorithm is best suited for a particular design.
  • AI is great for pattern recognition in log files, part of the debug process. This is already in production and present in many commercial tools.
  • AI can drive automation in EDA flows.
  • In principle, AI should be trained to help in debug, to guide a programmer towards problem areas.

Verifying AI. If AI is used in safety-critical applications it needs to be verified. And we need to be very skeptical about the results. AI and LLMs in particular are good at looking good, with no idea of what lies behind the surface. You cannot use concepts like “coverage” to verify an AI algorithm.

The data is key. The training data is key to good AI in EDA tools. Vendors can build good tools, but the data basically has to come from their customers and users. Getting a tool pre-populated with trained algorithms is very unlikely. Design companies will not share the data produced during their flows with the EDA vendors. Too high a risk that valuable information leaks out. Imagine one tool user company sharing data back to the tool vendor, and then other tool users get that data as part of an AI package. How do you know that they cannot attack the data and reverse-engineer the design it was derived from?

Conversely, if we do share data, you need to check that it is good. A player might poison the data it shares as a way to hinder competitors.

The reliance on data can also skew the playing field and make the big bigger and the rich richer. A small startup will not have the same volume or quality of training data as an established player. In an interview after the conference, Lu Dai, the Accellera chairman, noted that China might have an advantage in bringing up AI-based EDA flows, since the government is able to force companies to share data.

Panel on Chiplets

The second panel was on “The Great Verification Chiplet Challenge”. Nick Flaherty moderated, with Axel Jahnke (Nokia), Bodo Hoppe (IBM), David Kelf (Breker), and Moshe Zalcberg (Veriest) in the panel. The question was how chiplet use will impact verification.

The current marketing around chiplets is a lot like what we heard around IP blocks a few decades ago. Take existing pieces and combine them into a system, Lego-style (I am very skeptical about the usefulness of the Lego analogy, see this blog post). This time the pieces are physical, so maybe this time will be different. Standards like UCIe and the sheer physicality of chiplets should make interfaces more standardized and less tweakable than what we see with IP blocks.

Some panelists hoped that we would see a marketplace of ready-to-use chiplets that you could just buy and combine. But right now, actual chiplet use is mostly in-house. Big companies build big designs by breaking down what used to be a single SoC into pieces. But it is all their own pieces. It is a manufacturing aspect that can be employed where the economy makes sense. Not a marketplace with standard pieces.

It is hard to see how you can run a plugfest in the same way that is done in the network space. Actually putting physical chiplets together into a package is an expensive process, and it is not possible to simply pull one out and insert a different chiplet.

All panel members pointed out that you still get a system verification problem. Software will run across all the chiplets. In many ways, it looks like an SoC does today. It is not clear how chiplets will take away any of the complexity, and they will add new complexity of their own. You need to analyze power, thermal, and mechanical aspects of the integration with pieces from different vendors. What happens if a physical component breaks? Can you request more liability from the vendor compared to an IP block that you integrate into your own SoC?

Overall, the sense I got was that the impact of chiplets is still unclear. They obviously add some complexity, but the question is if they can take any away.

Verifying Systems using Software, with Breker

David Kelf from Breker presented a sponsored tutorial, “New Methods in Core and SoC Verification based on RISC-V”, about how you can use (generated) software to validate hardware. The tutorial was not really specific to RISC-V, but RISC-V is a good hook. With RISC-V, he claims that more companies are getting into processor and SoC design, which requires verification environments for processor-based workloads. Existing processor design companies already have this in place, but newcomers in the business need help.

David Kelf, a great fun guy!

The Breker approach to verification is to generate software that runs on top of the target system processor cores. They use Portable Stimulus Standard (PSS) to describe and generate tests for hardware devices. This tutorial was really more about system validation, especially for processor-to-processor communication.

The main focus of this tutorial was system verification. This means verifying the functionality of core-local aspects like the instruction set, memory-management unit, and microarchitecture, as well as system-level aspects involving multiple cores, such as cache coherency, interrupts, memory ordering, atomic instructions, device accesses, DMA, and all the other fun you get in SoC integration.

The key idea is that you should test all of these mechanisms at the same time, in the same test. It is quite likely that a system works if you do one thing at a time, but what happens when you have atomic instructions, cache invalidations, TLB misses, DMA, device accesses, and timer interrupts being exercised at once? You want intense traffic on the coherent interconnect. The interaction between features is much more likely to expose issues.
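
To illustrate the concept, here is my own host-level analogue of the “everything at once” pattern – several stressors running concurrently instead of one at a time. This is not Breker-generated bare-metal code, just a sketch of the concurrency idea using ordinary threads.

```cpp
// Host-level analogue of interleaved stress testing: contended atomics,
// false sharing, and private counters are all exercised at the same time on
// separate threads, instead of one mechanism at a time.
#include <array>
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

struct alignas(64) PaddedCounter { std::atomic<uint64_t> value{0}; };

std::atomic<uint64_t> shared_counter{0};        // true sharing: all threads hit the same line
std::array<PaddedCounter, 8> per_thread;        // padded: no false sharing
std::array<std::atomic<uint64_t>, 8> packed{};  // adjacent counters: deliberate false sharing

void stressor(int id, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        shared_counter.fetch_add(1, std::memory_order_relaxed);        // contended atomic
        per_thread[id].value.fetch_add(1, std::memory_order_relaxed);  // private, padded
        packed[id].fetch_add(1, std::memory_order_relaxed);            // false sharing with neighbors
    }
}

int main() {
    const int kThreads = 8, kIters = 100000;
    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t)
        threads.emplace_back(stressor, t, kIters);
    for (auto& th : threads) th.join();
    std::cout << "shared counter = " << shared_counter.load()
              << " (expected " << uint64_t(kThreads) * kIters << ")\n";
}
```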

Breker provides a range of ready-to-use tests and test generators that do such testing. The tool is interesting in that it generates a C program for each core, and that program interleaves various tests. Once all the cores are released, they will run in parallel and the programmed interactions will take place. The tests can also include aspects like driving the SoC hardware. Like in this example:

There were many different specific tests discussed during the tutorial. To exercise the caches, make the tests do both true and false sharing between cores. Another example was a test that accessed memory in 1MB strides, which found some corner case in a memory controller.

Unfortunately, I had to jump in and out of the session as I was trying to get help from the hotel to fix the fact that the lights in the room had locked into a binary state. Either all on or all off, and all off was not permissible as it would be a safety hazard. In the end, the whole light control system for the entire conference center had to be rebooted to fix the issue. All since someone leaned against the wrong switch on the wall.

Virtual Platforms – Automotive Use Cases

The use of virtual platforms for automotive applications has become a hot topic in recent years, as the automotive industry becomes more and more driven by software and custom hardware. There were two distinct ideas about how to implement virtual platforms for automotive presented at the conference. There were some good discussions around the use cases.

One key difference between automotive and typical silicon company use cases for VPs is that the bulk of the use is likely in post-silicon. The main benefit seen by automotive OEMs is really in automated and extended testing in a way that is not possible using hardware. This is a classic case for VPs that we sold Simics on even back in the early 2000s, so there is really nothing new here. Just that more people have picked up on it.

VPs can be used to provide better testing than hardware. In particular, scripting and models allow fault-injection testing and testing of security properties.

Part of post-silicon usage is the assumption that users will want to use their standard existing tools with the VP. The VP is not a tool that should be exposed to users, necessarily, but rather used as an execution platform in lieu of hardware, hidden in the background.

There is a general wish from OEMs to use VPs to get involved with early chip architecture, i.e., iterating on the design before it goes to production and making sure it fits the OEM use cases. This is an interesting ask, which requires ways to communicate rather sensitive models, or at least their results, between silicon vendors and their customers. It also depends more on performance modeling than on VPs that run concrete applications, as at such an early stage you often would not have complete code or applications.

Automotive applications in general imply that the VP will be integrated with simulators for the physical world around the chip. To test control systems and autonomous driving, a world model is absolutely needed.

Automotive VPs – Technology Levels

When it comes to building the virtual platforms for automotive applications, there are three main technology tracks:

  • The “level 1 to 3” virtual platforms that run code using host compilation (exemplified by the Synopsys Silver tool, but often something companies cook up internally, as it is fairly easy to do most of the time).
  • “Level 4”, which is what I would call a standard virtual platform where you use instruction-set simulators and device models to run target code as if it was the real thing. Synopsys has started to talk about this in two sub-levels, “4a” and “4b”, where 4a basically means simplifying the setup a bit with some paravirtual or simulation-only devices.
  • “Virtualization-based”, which could be seen as a variation on 4a, but is very different in implementation. Here the idea is to focus on the processor cores, like ARM cores, and run code as quickly as possible using non-VP tools like Qemu or ARM-on-ARM virtualization. Device simulation is considered secondary and VP timing is not really a primary concern. Such solutions are often driven by user companies rather than silicon providers, and reflect a different set of priorities.

Synopsys provided this very handy map in a recent fact sheet:

The key to me for level 4a or virtualization is the use of features like Virtio GPU support to offload heavy processing of things like graphics and AI code directly to the host. The target system would run a different driver stack compared to the real hardware system, but the results can be provided much faster than if you try to actually simulate a GPU or AI accelerator faithfully on the host processor.

Infineon Aurix Synopsys Virtual Platform

Synopsys and Infineon had a tutorial and several presentations about use cases for, and technologies in, their joint virtual platform for the Infineon Aurix TC4 family. The setup is a classic VP (mostly level 4b), using instruction-set simulators to run both Tricore code (for the main cores) and code for their Parallel Processing Unit (PPU). The PPU appears to be a Synopsys ARC EV 7x core, simulated using a Synopsys ISS for that core.

ARM-on-ARM Hypervisors as Virtual Platforms

The paper “Reverse Hypervisor – Hypervisor as fast SoC simulator” was presented by François-Frédéric Ozog from Shokubai, and Mark Burton from Qualcomm. It talks about how ARM-based host machines can be used to run ARM-based targets at high speed, focusing on the main cores and their software with helper cores and subsystems being a secondary concern. The system is called Emul4.

The paper describes how this can be made to work, mainly using the Apple macOS HVF, the Hypervisor Framework. They would like to do the same on Amazon Graviton instances using KVM, but KVM is not currently in a state where this works. HVF is apparently a fairly nice base, providing a general mechanism to call out to user space any time a VM exit happens. For example, accesses to device memory just trap out neatly. KVM for ARM is currently not as friendly.

In any case, to really build a virtual platform, it seems some more work is needed. In particular, access to performance monitors is needed so that it is possible to schedule a break out of the hypervisor at the end of a time quantum (like the Intel Simics simulator does, using a custom driver rather than standard KVM).

One problem being addressed was how to run software stacks that themselves contain a hypervisor. Apparently, ARM hypervisor support is not completely up to the task of running a hypervisor on top of a hypervisor. To handle this, any code that is loaded to run inside the hypervisor-driven VP is scanned for instructions whose behavior differs between the exception level (EL) that the hypervisor is supposed to run at (usually EL3) and the EL that it actually runs at in the simulation case (EL2, it seems). Such instructions are then replaced with a breakpoint instruction and their behavior is simulated after a VM exit. A similar method can be used to patch over small differences between the host and the target.
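
As a sketch of what that scan-and-patch step might look like (my own illustration, not the Emul4 implementation): the classification predicate below is a hypothetical placeholder, and only the AArch64 BRK #0 and NOP encodings are concrete values.

```cpp
// Sketch: walk a loaded code image, replace EL-sensitive instructions with a
// breakpoint, and remember the originals so a VM-exit handler can emulate them.
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

constexpr uint32_t kBrk0 = 0xD4200000u;  // AArch64 "BRK #0" encoding

// Placeholder predicate: a real implementation would decode the instructions
// whose semantics depend on the exception level they execute at.
bool needs_el_emulation(uint32_t insn) {
    (void)insn;
    return false;
}

struct PatchTable {
    // Guest address of each patched instruction -> its original encoding, so
    // the VM-exit handler can emulate it and resume past the BRK.
    std::unordered_map<uint64_t, uint32_t> original;
};

void patch_image(std::vector<uint32_t>& code, uint64_t load_address, PatchTable& table) {
    for (size_t i = 0; i < code.size(); ++i) {
        if (needs_el_emulation(code[i])) {
            table.original[load_address + i * 4] = code[i];
            code[i] = kBrk0;  // trap into the host when this instruction is reached
        }
    }
}

int main() {
    std::vector<uint32_t> image(16, 0xD503201Fu);  // AArch64 NOPs as dummy content
    PatchTable table;
    patch_image(image, 0x80000000u, table);
    std::cout << "patched " << table.original.size() << " instructions\n";
}
```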

I was shown a demo of how the platform could use a Virtio-based GPU interface (I think) to simulate the use of hardware acceleration for machine learning tasks like camera processing. The setup used the Carla simulator to simulate a world, from which images were generated and passed over to the virtual platform over a socket interface. The virtual platform then processed the images using the paravirtual GPU instead of simulating the actual accelerator hardware – which is obviously much faster. This paravirtual approach means that you are not really running the same software as the real platform, but it does provide for much better performance in case all that you are interested in is that the job gets done.

Given the issues with the hypervisors, this is very much a work in progress, but does show impressively fast simulation. I suspect it would be slower if integrated into a full virtual platform with many devices and a lot of activity in subsystems. But for just running main processor code, something very close to a standard virtual machine will be fast.

Tutorial on Aurix Virtual Platforms with the Mathworks

Synopsys, Infineon, and the Mathworks presented a joint tutorial on how to use Synopsys Virtualizer-based Virtual Platforms for the Infineon Aurix platform I already talked about. Matlab/Simulink was used to prototype control software and co-simulate the control software with a physics model. The users would start with a physics model and controller model in Simulink, and then migrate the controller part over to actual code. The actual code would then be run on the virtual platform, with Simulink still providing the physics simulation to provide a counterpart to the controller.

They talked about how to generate code that would run on the Aurix PPU processor, using the Mathworks code generators together with the Synopsys Metaware compilers for the PPU (which is a Synopsys ARC processor after all).

The code could be run on real hardware using processor-in-the-loop (PIL) or hardware-in-the-loop (HIL) testing, or on the virtual platform, using virtual PIL (vPIL) and virtual HIL (vHIL). PIL only uses a processor to run the core controller code, while in HIL testing a complete platform is used, including real I/O and an operating system. Physical HIL requires quite a bit of special hardware, while virtual HIL only requires a connection between the simulators. There is a packaged solution to connect Synopsys virtual platforms to Simulink simulations.

Tutorial on SystemC Virtual Platform Modeling

There was a joint tutorial by Nils Bosbach from RWTH Aachen, Mark Burton from Qualcomm, and Lukas Jünger from Machineware that talked about open-source and virtual platform modeling.

The work presented was all part of the SystemC Common Practices project. This is either called SCP or CPS (Common Practices Subgroup, as it is technically a subgroup of the Language Working Group, LWG) depending on the context, which is all a bit confusing. The idea is to collect reusable functionality that you can use irrespective of what you are modeling, but that makes no sense to reinvent over and over again. Maybe things will be standardized at some point, but currently the idea is first of all to collect code with a nice license on the Accellera github so that people can use it and improve it. The SCP repo currently has some TLM extensions and a logging system that allows you to control logging via SystemC CCI parameters. There is also the separate PySysC library to control SystemC from Python.

The bulk of the tutorial was about the Machineware Virtual Components Modeling Library (VCML). The VCML might be on its way into the SCP repo. It provides a diverse set of building blocks and tools. There is infrastructure like register modeling support and SystemC TLM expressions of interfaces like SPI, CAN, Ethernet, I2C, PCI, and serial. It provides some models for commonly-seen hardware like Arm and RISC-V interrupt controllers, I2C controller, GPIO, and serial ports. It provides Virtio models to leverage that paravirtual interface.

Somewhat interestingly, VCML includes a Qemu integration, so that a user can take a VCML platform and integrate Qemu as the instruction-set simulator. Using this does mean that the whole platform becomes GPL v2. There was a paper, “Virtual ECUs with QEMU and SystemC TLM-2.0”, that talked a little bit more about this integration. But it did not show much except that Qemu can run code fairly quickly in case you do not interrupt it too often from the SystemC virtual-platform side.

For debug, it connects to the outside using gdb and the Lauterbach MCD interface. No TCF, at least not yet. For scripting, the language chosen was Lua, which is a bit surprising. Python seems like a better match for the domain to me.

One interesting idea was to model device registers using the SystemC Direct Memory Interface (DMI). The idea is that processors can access device register contents directly with minimal involvement from model code. In case there are side effects, they would be implemented using callbacks for pre/post read/write. For cases like devices that have to compute their return values, a pre-read callback could compute the value and deposit it in the storage area for reading from software. It is an interesting way to approach the problem; I am more used to having a function call that produces a result using whatever method it wants.
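
Here is a small plain-C++ sketch of that register pattern as I understood it: backing storage that a DMI pointer could expose, plus optional callbacks for side effects. This is an illustration of the idea, not the actual VCML API.

```cpp
// Register contents live in plain backing storage (what a DMI pointer into the
// model would expose); optional callbacks handle side effects on the non-DMI
// access path.
#include <cstdint>
#include <functional>
#include <iostream>

struct DmiRegister {
    uint32_t storage = 0;                        // directly readable/writable via DMI
    std::function<uint32_t(uint32_t)> pre_read;  // compute the value before a read
    std::function<void(uint32_t)> post_write;    // trigger side effects after a write

    uint32_t read() {
        if (pre_read) storage = pre_read(storage);  // e.g. compute a status bit on demand
        return storage;
    }
    void write(uint32_t value) {
        storage = value;
        if (post_write) post_write(value);          // e.g. kick off a DMA transfer
    }
};

int main() {
    DmiRegister status;
    status.pre_read = [](uint32_t) { return 0x1u; };  // "device ready", computed lazily
    std::cout << std::hex << "status = 0x" << status.read() << "\n";
}
```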

Parallel SystemC and FPGA with RAVEN

The paper “Accelerating Complex System Simulation using Parallel SystemC and FPGAs” was presented by Stanislaw Kaushanski, Eyck Jentzch, Johannes Wirth, and Andreas Koch – the first two from MINRES, the latter two from the University of Darmstadt. The system they presented is called RAVEN, and it was also being sold at the MINRES booth in the exhibition.

The RAVEN solution provides a way to run multiple SystemC simulations together, or to combine SystemC simulation with RTL in FPGAs. Fundamentally, both of these setups involve synchronization between multiple separate simulation engines and the exchange of data between them. Splitting a SystemC simulation into multiple processes or threads, each running their own SystemC kernel, is pretty standard.

The FPGA part is more interesting. The key there is to use a domain-specific language, RAVEN DSL, to describe the interfaces of the RTL. The system can then generate the necessary interface code for the RTL and for the simulator. They use the open-source academic TaPaSCo (Task Parallel System Composer) framework to generate the eventual bitstream to run on the FPGA. They point out that you can rent FPGA time from AWS, and they provide a GUI to control cloud usage of the simulations.

It was a bit funny when I talked to Stanislaw about the solution. He talked about RTL on FPGA as a fast solution – while I am used to any RTL simulation, even on FPGAs, being a drag on the speed of virtual platforms. The key difference is the use case. He has used it to run compute accelerators that do a lot of parallel work with quite limited communication to the rest of the system, while I am used to RTL being used in validation, where the VP is constantly talking to individual registers in the RTL and being hit with access latencies each time. It comes down to computation vs communication, as always when trying to run things in parallel.

Overtemperature Sensor with Automotive VP

The paper “Virtual testing of overtemperature protection algorithms in automotive smart fuses” by Thomas Markwirth, Gabriel Pachiana, Christoph Sohrmann from Fraunhofer, and Mehdi Meddeb, Gunnar Bublitz, Heinz Wagensonner from CARIAD, presented a combination of a functional ECU VP and models of the electrical/physical bits of an overtemperature fuse. Their own abstract describes it well:

 Our approach involves a co-simulation between the mixed-signal fuse and a thermal reduced-order model of the relevant part of the E/E architecture, using SystemC AMS. While the virtual Electronic Control Unit executes the target executable, the analog and thermal domains are simulated concurrently.

Most of the paper is about the modeling of the physical side of the system. They build a model defined in COMSOL Multiphysics, which is then exported to be run inside a SystemC AMS simulation. To make the model run faster, they apply some model reduction techniques. They use the COSEDA tool to build their SystemC AMS model.

To develop their system, they first build an abstract model of the smart fuse system (including the software component) in SystemC AMS to test different fusing algorithms. Once they have selected an algorithm, they convert it to real software.

The software is run on a Synopsys Virtualizer model of the microcontroller part of the system, connected to the SystemC AMS model of the “analog” parts of the system.  The two models are connected using the Synopsys VSI interface, but they point out that it could also be run inside Virtualizer.

Debugging User-Reported VP Issues with Traces

The paper “Efficient Debugging on Virtual Prototype using Reverse Engineering Method”, by Sandeep Puttappa, Dineshkumar Selvaraj, and Ankit Kumar from Infineon, talked about how traces of device accesses can be used to debug customer-reported problems in a VP.

They are working with external customers of their chips who run into issues running the virtual platform. When this happens, they need to reproduce the issue locally at Infineon. The standard way to do that is to ship the customer code back to Infineon or to send out an engineer to work with the customer on the customer site. This is cumbersome and sometimes the customer might not want to share their code with their supplier. There is also the potential problem of interactions with external simulators that the customer cannot or will not provide back to Infineon.

Instead, by recording all interactions between a component of the VP and the rest of the platform, the complete behavior of the component can be reproduced in the Infineon lab without access to the customer software or setup. The work was still in prototype form, and shown to work for a single IP block.
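
The core of the idea can be sketched in a few lines (my own illustration, not Infineon's implementation): log every inbound transaction with a timestamp, and the log alone is enough to re-drive the component later, without the customer's software or setup.

```cpp
// Record all inbound interactions with a component, then replay them against
// the same component model in the supplier's lab.
#include <cstdint>
#include <iostream>
#include <vector>

struct Transaction {
    uint64_t timestamp_ps;  // simulation time of the inbound event
    uint64_t address;       // register offset within the component
    uint32_t data;
    bool     is_write;
};

class InteractionRecorder {
public:
    void record(const Transaction& t) { log_.push_back(t); }

    // Re-drive any object exposing read/write at the component boundary.
    template <typename Component>
    void replay(Component& c) const {
        for (const auto& t : log_) {
            if (t.is_write) c.write(t.address, t.data, t.timestamp_ps);
            else            c.read(t.address, t.timestamp_ps);
        }
    }

private:
    std::vector<Transaction> log_;
};

// Stand-in for the model under debug: just prints what it is driven with.
struct DummyComponent {
    void write(uint64_t a, uint32_t d, uint64_t t) {
        std::cout << "t=" << t << " write [0x" << std::hex << a << "] = 0x" << d << std::dec << "\n";
    }
    void read(uint64_t a, uint64_t t) {
        std::cout << "t=" << t << " read  [0x" << std::hex << a << std::dec << "]\n";
    }
};

int main() {
    InteractionRecorder rec;
    rec.record({1000, 0x10, 0xCAFE, true});  // captured at the customer site...
    rec.record({2000, 0x14, 0, false});
    DummyComponent c;
    rec.replay(c);                           // ...replayed in the supplier's lab
}
```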

The audience did have some questions about the practicality of this type of solution. In particular, it might be tricky to get the customer to understand which VP component a certain problem belongs to (in order to record the right pieces). The performance impact of extensive recording remains to be seen, as it would appear likely to be significant. The recording must also be complete and contain all inbound events with time-stamps, requiring care in the model instrumentation.

To me, this idea as such is sound. We have the same facility in the Intel Simics simulator at a higher level, where we rather record and replay all asynchronous inputs to a platform. Bringing it down to an individual model level is much more work and implementing it efficiently is going to be interesting.

EPI Computer Architecture Simulation

A paper that I really cannot make up my mind about is “VPSim : Virtual Prototyping Simulator with best accuracy & execution time trade-off for High Performance Computing systems evaluation and benchmarking”, by Mohamed Benazouz, Ayoub Mouhagir, and Lilia Zauorar from CEA List in France. It ties into the European Processor Initiative and thus the processor presented in the keynote by Philippe Notton – it seems this infrastructure was used in the design of the Rhea1 processor.

The system is an ambitious virtual platform construction kit that is based on SystemC and collects SystemC TLM models from a variety of sources. Mostly it seems to be fairly simple home-made models for things like basic IO and Virtio-based disks and networks. A key part is using Qemu as the provider of fast instruction-set simulator models for ARM cores (and maybe also RISC-V).

Performance evaluation is a key goal for the setup, where they separate the functional simulation from the performance simulation by pulling out traces of memory operations from the functional side (i.e., code running on Qemu) and running these through a separate, decoupled, memory system model.

The memory system model covers the caches for each processor core, plus the network-on-chip interconnect, plus the last-level cache blocks and memory, including address interleaving to spread the load across the storage elements. It can also model non-uniform memory access (NUMA) effects, and the effects of using multiple processor cores on each NoC stop. They run experiments like determining the best shape of the NoC for a given core count, and where to place the DDR controllers in the NoC.
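
As a tiny illustration of what address interleaving means in practice (the bit positions and controller count below are made up for the example, not taken from VPSim): a few address bits just above the cache-line offset select which memory controller, and thus which NoC stop, services a given physical address.

```cpp
// Minimal address-interleaving example: consecutive cache lines are spread
// round-robin across the memory controllers.
#include <cstdint>
#include <cstdio>

constexpr unsigned kLineBits    = 6;  // 64-byte cache lines
constexpr unsigned kControllers = 4;  // must be a power of two for this scheme

unsigned controller_for(uint64_t paddr) {
    return static_cast<unsigned>((paddr >> kLineBits) & (kControllers - 1));
}

int main() {
    for (uint64_t addr = 0; addr < 8 * 64; addr += 64)
        std::printf("address 0x%05llx -> memory controller %u\n",
                    static_cast<unsigned long long>(addr), controller_for(addr));
}
```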

What strikes me as odd is that the reported methodology seems to ignore standard computer-architecture simulation practice.

For example, they allow the simulation to run in fast mode until it reaches a region of interest, and then switch on the memory system simulation. I asked them how far they then ran the simulator to warm the caches, but it seems they just collect results immediately, which is not how the comp.arch community does it. There are no references to tools like GEM5 or to commonly referenced papers on computer architecture performance evaluation. Very strange.

There is no mention of modeling of memory prefetchers, which should have a large impact on caching given that they are supposed to be using server-class cores from ARM. Neither do they seem to do timing push-back from the memory system into the functional simulation. It is not even clear to me that they bother to collect instruction accesses from the functional simulator – this could be a pure data-cache model.

In general, all these simplifications would seem to introduce significant questions about the results. It could be that this is a specific case where you can get away with a simple “just model the memory hierarchy” and still get reasonable results. Maybe it is due to small benchmark codes that mostly run out of the L1 instruction caches, so that data accesses absolutely dominate the memory system.

Hardware Development using Matlab

The paper “A Model-Based Reusable Framework to Parallelize Hardware and Software Development”, by Jouni Sillanpää from Nokia and Håkan Pettersson and Tom Richter from the Mathworks, presented another take on hardware and software shift-left. They use a Matlab/Simulink model to describe the hardware and software in the early phases of development, and then generate software and hardware models to run the software in a virtual platform.

Generating a hardware design from a model and generating software from a model are both well-known, fairly straightforward techniques. What was interesting here was how they managed the hardware/software interface. They introduced an additional layer that lists the parameters that need to be communicated from the software side to the hardware side. This is just a set of named values when running simulations, but when moving to actual hardware, it gets converted into a concrete set of control registers (a process that is apparently manual at the moment).
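
A minimal sketch of such an interface layer might look like the following (the parameter names and register offsets are hypothetical, not from the paper): named values during simulation, with a hand-maintained binding to control-register offsets when targeting real hardware.

```cpp
// Software talks in named parameters; the name-to-register mapping is
// maintained by hand and only used on real hardware.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

struct ParameterInterface {
    std::map<std::string, uint32_t> values;  // simulation view: just named values

    // Hardware binding: name -> control-register offset (manually maintained).
    std::map<std::string, uint32_t> register_offset = {
        {"filter_gain",   0x00},
        {"decimation",    0x04},
        {"output_enable", 0x08},
    };

    void set(const std::string& name, uint32_t value, volatile uint32_t* mmio_base = nullptr) {
        values[name] = value;                         // always update the simulation view
        if (mmio_base)                                // on hardware, also write the mapped register
            mmio_base[register_offset.at(name) / 4] = value;
    }
};

int main() {
    ParameterInterface params;
    params.set("decimation", 8);  // simulation-only: no MMIO base supplied
    std::cout << "decimation = " << params.values["decimation"] << "\n";
}
```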

They showed a setup where they inserted the Matlab/Simulink model of a hardware block as a functional block in a virtual platform, talking to the concrete software running on an instruction-set simulator.

The whole setup was based on an impressive number of Mathworks tools, including MATLAB coder, Embedded Coder, and Simulink Coder to generate various forms of code. The final result goes into a Digital Front End (DFE) ASIC produced by Nokia.

Intel Paper: Fuzzing on Virtual Platforms

“Fuzzing Firmware Running on Intel® Simics® Virtual Platforms”, by Jakob Engblom and Robert Guenzel from Intel is about how you can use a virtual platform to perform fuzz testing on firmware. Regular software fuzzing is done in user space, with the fuzzer and the target program running side-by-side in the same operating system instance. This does not work if you want to fuzz firmware, device drivers, boot code, or operating system code.

Assuming that you already have a virtual platform, it makes perfect sense to use the VP to do fuzz testing both before silicon appears and later. The VP makes it easy to fuzz code that would otherwise be difficult to access. It can talk directly to deeply embedded firmware, and it provides the right processor cores and devices to run any code.

The solution presented in the paper is to make a virtual platform present the same interface to the fuzzer software as a user-mode fuzzing target. Behind this interface, virtual platform features and tools are used to implement the actual fuzzing, including how to get inputs into the target software stack, how to detect failures, how to collect coverage information for guided fuzzing, and how to restore the state. Basically, the virtual platform hides the complexities of dealing with low-level code. The implementation is mostly independent of the precise code being fuzzed, but the configuration does have to change to reflect the precise circumstances.
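
As a conceptual sketch of “the VP looks like a user-mode fuzz target”: a libFuzzer-style entry point is one common shape for such an interface, and the vp_* helpers below are hypothetical stand-ins for the virtual-platform operations described above, not a real Simics API.

```cpp
// A fuzz-target entry point that forwards each input into the simulated
// firmware; the virtual platform hides the low-level details behind it.
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for the VP operations; a real setup would route
// these into the simulator.
void vp_restore_snapshot() {}                    // reset the target to a known state
void vp_inject_input(const uint8_t*, size_t) {}  // place data where the firmware will read it
void vp_run_until_done() {}                      // simulate until the firmware finishes
bool vp_failure_detected() { return false; }     // crash, hang, failed assertion, ...

// libFuzzer-style entry point: the same interface a user-mode target presents.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    vp_restore_snapshot();
    vp_inject_input(data, size);
    vp_run_until_done();
    if (vp_failure_detected())
        std::abort();  // tell the fuzzer this input triggered a bug
    return 0;
}
```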

Intel Paper: Model-First Development and RTL-VP Validation

“Closed-Loop Model-First SoC Development with the Intel® Simics® Simulator”, by Kalen Brunham, Anthony Moore, Tobias Rozario, Wei Jun Yeap, and Jakob Engblom from Intel. This paper was about how we do model-first development of new Intel FPGA designs.

Model-first means that we embed fast VP model developers into the hardware development team, so that the model is built together with the hardware. The model serves as an executable specification for both hardware and software development, and the model evolves as the specification evolves. This is an organizational change that is necessary in order to develop a model quickly enough to be relevant for the downstream software and especially firmware designers.

Closed-loop means that the hardware RTL is tested against the virtual platform model. This is a primarily technical problem, where we run Intel Simics simulator virtual platform models together with hardware RTL inside UVM test benches. The VP is used as a predictor to validate the RTL, ensuring that the RTL has the same semantics as the model and thus that the software will work on the actual hardware on first try. The VP is also used as a way to develop the UVM test benches ahead of RTL availability, providing a shift-left to RTL testing.

Best Engineering Paper – EDA Critique

Speaking of best papers, the best engineering paper award was won by a paper called “The Three Body Problem – There’s more to building Silicon than what EDA tools currently help with,” by Ben Marshall and Peter Birch.

This paper is a critique of the way EDA tools do and do not integrate well, and of the effect of non-disclosure agreements (NDAs) on community building and knowledge sharing around EDA tools. Like other engineering-track DVCon Europe papers, it will be available on https://dvcon-proceedings.org/ in about three months' time.

More Reading

AMIQ posted their summary of the conference, touching on some of the same and some different sessions. https://www.amiq.com/consulting/2023/11/20/highlights-of-dvcon-eu-2023/
