DVCon Europe 2024 – AI and More

The 2024 DVCon (Design and Verification) Europe conference took place on October 15 and 16, in its traditional location at the Holiday Inn Munich City Centre. This year there was even more talk of artificial intelligence than last year, and quite a few sessions related to virtual platforms. And lots of other interesting presentations and discussions.

This year’s traditional DVCon Europe cookie

The conference had as many attendees as last year, despite the challenging times and some other events that drew off potential attendees. To be precise, we had 351 registered attendees and 18 exhibitors. It felt just as busy as last year, and just like last year I sometimes missed sessions as there was just so much fun to be had talking to people in the exhibit hall. The research track that was introduced in 2023 was expanded to include more papers, and it feels like it draws in a new crowd of academics that adds to the broad variety of people you meet at the conference.

Even though I left Intel at the end of September 2024, I attended the conference. As a steering committee member, vice chair, and keynote chair, I had a job to do. And DVCon Europe is always fun!

DVCon Europe 2024 General Chair Mark Burton from Qualcomm opening the conference

Themes

As already mentioned, artificial intelligence, AI, was a very prominent topic in 2024. It came up in both keynotes and the panel, as well as in many exhibitor booths. There were also papers that applied AI to design and verification problems.

MathWorks booth with prominent AI – and for a good reason, considering the wide support their tools have for building and implementing AI systems.

Open-source was also mentioned in several contexts. Open-source software is getting more common in automotive, and there is a clear movement in EDA to make it possible to do design using only open-source tools. In particular, the cocotb framework showed up in many places. The open-source instruction-set simulator Qemu was also well-represented, with several papers and tutorials around it.

Speaking of Qemu, virtual platforms and digital twins were seen in many papers and the panel.

Software is being considered as part of what is validated, and software is used to drive hardware validation. It used to be that DVCon talked a lot about classic test benches connected to RTL, but software is getting more and more common. Talking about software, the concept of software first for automotive popped up in one keynote and the panel. It is a key challenge to the traditional automotive industry that used to be very mechanical-focused.

On the architecture side, RISC-V is still going strong. It is driving a new interest in instruction-set validation, and it is a perfect test object for anyone implementing a technology or technique related to processors or processor simulation.

Verification and Validation. Being a DVCon, most of the papers in the paper sessions were related to verification and validation. As always.

Keynote: Thomas Böhm

The keynote on the first day was provided by Thomas Böhm, from Infineon Technologies, where he is the Senior Vice President & General Manager for Automotive Microcontrollers. He has worked as a business and technical manager in semiconductors for more than twenty years, and actually once worked together with the second keynote speaker (Erik Norden, back when he was at Infineon)!

Thomas Böhm and the audience.

I loved that he started by showing the location of their office in München, at the Campeon site. I have visited the Intel office on that site many times over the years, and it is a beautiful place.

The biggest surprises in his talk, to me, were that there are use cases for AI (or rather, machine learning, ML) in control, and the concern about quantum computing breaking crypto! Neither of these is what I thought would be present in “microcontrollers”. But “microcontroller” does not mean what it used to…

Thomas posited the following four main drivers for automotive microcontroller architectures today:

  • Software-defined vehicle (SDV)
  • Electrification
  • Cyber security
  • Safe embedded AI

SDV basically comes down to car innovations driven by software. This means user-facing software (including AI), as well as control and real-time systems.

The software and physical architecture of the car is changing. The number of ECUs in cars is going down, despite cars gaining new complex functions – new domain and zonal architectures drive a consolidation of compute into fewer units. Complexity is moving around – from many separate small ECUs to a few beefier processors running multiple software stacks under virtualization. This is physically simpler, but from a software perspective much of the complexity remains.

Thomas noted that some people claim a car is just a computer, can run Linux, and can be treated like a server. That does not work; there is still a need for specialized microcontroller architectures. A general HPC box costs 500 to 700 USD, twice the current cost of the entire E/E setup in a car. And it will have real issues with the real-time nature of a vehicle. Performance means both latency and throughput, and control systems need predictable execution times and guaranteed timing. This means using microcontrollers, not general-purpose compute SoCs.

Electrification affects the control systems and overall system architecture. Not just the powertrain, but also the chassis – moving to x-by-wire and removing mechanical linkages.

Embedded cyber security has become critical. The car is a network on wheels, connected to the outside, and thus subject to threats just like any other computer network.

Safe embedded AI. AI in automotive started in autonomous driving, moving down to smaller problems like basic control laws for the powertrain and chassis. That was quite a surprise to me, but then I have been out of the real-time systems loop for a few years.  

Looking at the contents and architecture of microcontrollers, Thomas showed how the drivers of microcontroller innovation changed over the decades:

  • 2000s, it was about increasing performance by going from classic small 8-bit and 16-bit machines to 32-bit.
  • 2010s, security and safety aspects came into focus. Starting to see embedded security hardware. Better NVM access, application-specific accelerators.
  • 2020s, it is about software and AI. Introducing virtualization, security computation acceleration, software updates OTA, accelerating AI and data. 

It used to be that controllers had two components:

  • System management
  • Real-time control

Today, there are five components:

  • System management (still there)
  • Embedded accelerators (for system functions, like networking, protocols, signal processing, …)
  • Real-time control (still there)
  • Data processing (complex general compute tasks)
  • AI inference (needs special accelerators, and development tools to optimize and compress models)

Look at the core counts in that picture! To simplify software, it would be nice to have the same ISA across the stack! For Thomas, this makes RISC-V sound attractive, with optional extensions like hypervisor support, AI-optimized data types, and fully custom instructions.

Embedded security has become critical to the safe operation of vehicles. Security has to be architected in from the lowest levels, being part of the feature set of the chips. Features like software-over-the-air (SOTA) require security and authentication for updates.

Inside the vehicle, traffic on the networks is also being secured in a way not entirely unlike what you find in regular corporate networks! To still meet real-time requirements, hardware security acceleration is needed. Thomas said that >20% of network traffic is encrypted (how crazy is that?), and that >50% of network traffic is authenticated already today!

Infineon thus adds two specific hardware subsystems to their microcontrollers: the Cyber-Security Satellite (CSS) – a local security accelerator found at network endpoints to accelerate traffic handling – and the Cyber-Security Real-Time Security Module (CRSM). The CRSM is similar to the Hardware Security Modules, HSM, used in other industries.

With encryption thus being prevalent, quantum computing is a real threat that must be mitigated. Thus, automotive has to move to quantum-computing-proof (post-quantum) crypto algorithms. With automotive time horizons, this requires some serious forward-thinking, and the hardware has to provide computational headroom to handle future algorithms that are introduced during the fielded lifetime of the system.

As can be seen in this slide, the “microcontroller” that Thomas is talking about is currently represented by the Infineon Aurix family based on the TriCore core. The PPU is a vector engine that is used to accelerate AI algorithms.

Safe Embedded AI was the final topic. The underlying idea is that AI will be used for safety-critical real-time reliable systems! Leading to the question of how statistical algorithms like AI can be used in systems with requirements for “six nines” reliability.

AI in this context sounds more like “machine learning”. The idea is to replace traditional control, safety, and signal processing systems development with data-driven approaches. I.e., work from measured inputs and desired outputs, and then use machine learning to create the “algorithm” that controls the system. The resulting models are typically small enough that they can easily be executed inside embedded microcontrollers. This can reduce the cost of development – instead of humans making up new algorithms, let machine learning derive them from the desired effects.
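As a toy sketch of what “data-driven” means here (the data, features, and model below are entirely made up, and real deployments would of course use proper ML tooling and then compress the model for the target): record inputs and desired outputs, fit a small model, and evaluate it cheaply at run time.

```python
import numpy as np

# Toy sketch with made-up data: learn a control mapping from measured
# inputs (state) to a desired actuator output, instead of hand-deriving
# the control law.
rng = np.random.default_rng(0)
state = rng.uniform(-1, 1, size=(1000, 3))            # e.g. speed, temperature, load
desired = 0.8 * state[:, 0] - 0.3 * state[:, 1] ** 2  # "measured" desired output

# A small polynomial feature set keeps the fitted model tiny enough to
# evaluate in a handful of multiply-adds on an embedded core.
features = np.column_stack([state, state ** 2, np.ones(len(state))])
weights, *_ = np.linalg.lstsq(features, desired, rcond=None)

def controller(x):
    f = np.concatenate([x, x ** 2, [1.0]])
    return float(f @ weights)

print(controller(np.array([0.5, 0.2, -0.1])))
```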

Machine learning is very useful for creating adaptive systems. For example, battery control. The system in a vehicle can learn how the actual battery in an actual car works, and tune its behavior accordingly. The system can also be tuned by collecting data from the fleet and feeding that back to the development teams. In the end, such approaches can provide better system performance than traditional control!

Another aspect of “AI” is the use of “virtual sensors”, where the output from certain sensors is replaced by values computed from measurements from other physical sensors. This can make for more robust systems – either by replacing vulnerable physical sensors, or by providing a fall-back for degraded systems.
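A minimal sketch of the virtual-sensor idea (the signals, thresholds, and “fitted” coefficients below are invented for illustration): compute an estimate from other physical sensors, and fall back to it when the real reading looks implausible.

```python
# Hypothetical example: estimate a temperature from current and speed
# readings, and use the estimate when the physical sensor looks broken.
def estimate_temperature(current_a, speed_rpm, coeffs=(25.0, 0.9, 0.002)):
    c0, c1, c2 = coeffs          # made-up "fitted" model coefficients
    return c0 + c1 * current_a ** 2 + c2 * speed_rpm

def temperature(sensor_reading, current_a, speed_rpm):
    estimate = estimate_temperature(current_a, speed_rpm)
    plausible = -40.0 < sensor_reading < 175.0 and abs(sensor_reading - estimate) < 30.0
    return sensor_reading if plausible else estimate   # degraded-mode fallback

print(temperature(sensor_reading=250.0, current_a=12.0, speed_rpm=3000.0))
```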

Making it safe? The ISO 8800 standard is in development right now, trying to define how to demonstrate the safety of safety-critical AI systems. It will complement ISO 26262 and ISO 21448.

Some other notable comments on future technologies in automotive: RISC-V as an instruction set is already happening. The Rust programming language got a mention – it kind of makes sense to use a safer language in this setting, but is it acceptable to current C/C++ programmers, I wonder? It could also add some “programmer cool” to the space, which might make it easier to recruit.

There was also an acknowledgement that the competition from China and car companies there applying true software-first approaches is intense. This was also discussed in the panel on day 2.

Keynote: Erik Norden

The second conference keynote was given by Erik Norden, CTO of Zyphra. When the keynote was agreed on and announced, Zyphra was still in stealth mode, so the company was given as “stealth AI startup” on the conference web page. Just in time for the conference, Zyphra came out of stealth, and we were very happy to have Erik as a keynote speaker. Erik’s background in AI is very impressive. He worked on computer vision acceleration at Nvidia, the Apple Neural Engine, the Google TPU architecture, and AI training acceleration at Intel.

Erik Norden presenting

Erik provided an overview of the history of what we might call “high end” AI (to contrast with the embedded AI that Thomas Böhm talked about in the other keynote). A very good orientation. The core of his talk was really a discussion on how software techniques, data, and innovations in algorithms help bring the cost of AI down – and help us get to the next 10x in AI.

The current “AI revolution” has been driven by three interacting factors:

  • Models/software architecture – deep CNNs (DNNs) in their current form broke through around 2012. Transformers arrived in 2017, and new ideas and tweaks keep arriving.
  • Data – the creation of large datasets was key to the current wave of AI.
  • Hardware – both client-side and datacenter-side silicon performance and memory capacities have vastly improved. What started on general-purpose CPUs has moved to GPUs and dedicated accelerators, and these have in turn increased in performance.

AI compute requirements are increasing 10x in volume per year. Meeting this is hard.

  • New basic architectures provide a small increase per year, but nothing like what is required
  • Scale out (using more and more compute units) can provide increased performance, but at a high monetary cost
  • Algorithmic advancements are needed to make up the difference – AI does not scale well by brute force alone. The key to Erik’s company Zyphra is really their developments in software and algorithms.

 Erik discussed some of the ways that hardware has improved over the years.

  • AI moved from CPUs to GPUs in 2012 (with Alexnet) and later to dedicated accelerators like the Google TPU.
  • Interconnects have evolved in both structure (TPUs use a 3D torus) and nature. Silicon photonics, optical switches, etc.
  • Liquid cooling is becoming the norm as new datacenters are built.
  • Datatypes are going down in precision: FP32 was used at the start, then FP16 or Int8, then BF16, then FP8, and now various even smaller datatypes.
  • Memory: HBM is expensive but used in many large training setups (Google TPU, Nvidia GPU, and more). Cerebras goes for all SRAM. Others go for classic DDR.
  • The scaling is insane. Erik showed the example of a Google TPUv4 “Pod”, where each pod contains 4096 TPU chips. 

Note that what really matters is the performance per total cost of ownership. Google is building the TPU system to optimize the total cost, including the cost to design and build chips, package chips onto boards, build racks, and pile racks into data centers. Once built, the operating costs also have to be factored in and optimized.

This busy slide summarizes the elements that go into building an AI system today. Note that there are some aspects here that are truly system-level concerns, like the security and privacy and RAS (Reliability Availability Serviceability):

I would say that Erik broke current AI developments into a few categories:

  • Compound systems
  • Agent-based systems
  • Curated data
  • Smarter algorithms

Compound systems:

  • Combine different models with different functionality and sizes.
  • Add in information retrieval from sources like vector databases and the web.
  • Use multimodal models that can deal with text, images, audio, video, or other sensor data.
  • Not all queries have to go through the largest and most expensive models.

Software algorithms are evolving to make better use of the hardware, as both the problem space and hardware architecture become better understood. One example is “Flash attention”, a restructuring of the transformer/attention computation that makes it possible to stay in on-chip SRAM instead of going out to HBM – while producing the same result. It sounds very similar to the cache-blocking optimizations used in high-performance computing.
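A minimal NumPy sketch of the streaming-softmax trick that makes this kind of blocking possible (just the math, nothing like the real fused GPU kernel): process K/V in tiles, keep only running statistics per query row, and rescale partial results so the output matches the naive computation.

```python
import numpy as np

def naive_attention(Q, K, V):
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def blocked_attention(Q, K, V, block=64):
    # Process K/V in tiles, keeping only a running max, running sum, and a
    # partial output per query row -- the idea that lets the computation
    # stay in fast on-chip memory instead of materializing the full score matrix.
    out = np.zeros_like(Q)
    row_max = np.full(Q.shape[0], -np.inf)
    row_sum = np.zeros(Q.shape[0])
    for s in range(0, K.shape[0], block):
        Kb, Vb = K[s:s + block], V[s:s + block]
        S = Q @ Kb.T / np.sqrt(Q.shape[-1])
        new_max = np.maximum(row_max, S.max(axis=-1))
        scale = np.exp(row_max - new_max)        # rescale earlier partial results
        P = np.exp(S - new_max[:, None])
        row_sum = row_sum * scale + P.sum(axis=-1)
        out = out * scale[:, None] + P @ Vb
        row_max = new_max
    return out / row_sum[:, None]

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
print(np.allclose(naive_attention(Q, K, V), blocked_attention(Q, K, V)))  # True
```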

Zyphra has developed a “tree attention” structure that reduces the communication cost between GPUs when solving big problems, by restructuring how computations are distributed. Another development used by Zyphra is to insert additional “mamba2” blocks between the classic transformers/attention blocks.

In the next generation of systems, we can expect the hardware engineers to look at these software developments and adjust the hardware as a result.

Another current trend is towards more careful selection of datasets. Curated, vetted datasets instead of grabbing random things from the Internet, with higher quality and less redundancy. Zyphra is working on this angle, selecting data for the training of their LLMs based on their new software architecture. They have open-sourced a training set called “Zyda”, for training relatively small LLMs. Better datasets mean that a model makes better use of its parameter memory, providing an additional dimension for performance improvements.
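As a toy illustration of what curation can mean in practice (this is not Zyphra’s pipeline – real dataset work uses far more elaborate deduplication and quality scoring): normalize documents, drop near-duplicates, and apply simple quality heuristics before training.

```python
import hashlib

def normalize(text):
    return " ".join(text.lower().split())

def curate(documents, min_words=5):
    # Toy filtering: exact/near-duplicate removal via a hash of normalized
    # text, plus a trivial quality heuristic.
    seen, kept = set(), []
    for doc in documents:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue                      # too short to be useful
        digest = hashlib.sha256(norm.encode()).hexdigest()
        if digest in seen:
            continue                      # duplicate after normalization
        seen.add(digest)
        kept.append(doc)
    return kept

docs = ["The quick brown fox jumps over the lazy dog.",
        "the  quick brown fox jumps over the lazy dog.",
        "Buy now!!!"]
print(curate(docs))   # only the first document survives
```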

Finally, Erik touched on some other current LLM applications and techniques. He mentioned Retrieval-Augmented Generation (RAG), and agent-based systems where chains of models are used to perform tasks.

Given the audience, he also had some points about generative AI for chip design. There are many possible use cases:

  • Creating chat bots that answer questions from engineers, based on company data and code.
  • Chat bots that help with coding and tools usage, including command-line generation.
  • Processing bug reports.
  • Supporting design-space exploration, code generation, and documentation generation.

There is a bit of a lack of good datasets for hardware design (this was also noted by Sigasi in the exhibition hall, when we discussed the application of LLMs to RTL/HDL coding).

Panel: Digital Transformation in Automotive, Expectations vs Reality

As previously discussed, DVCon Europe had only one panel this year. The goal was to create a less busy program with more time for networking and visiting the exhibition. The topic of the panel was the digital transformation in automotive, with a particular eye to artificial intelligence.

The panel featured Andreas Riexinger from Bosch, Ralph Schleifer from CARIAD, Manfred Thanner from NXP, John Kourantis from ARM, and the keynote speaker Erik Norden from Zyphra (this list is in mirror order from the photo above). They were seated in easy chairs on the stage, creating a very different vibe from previous conferences where we put them behind grey tables. This was an idea I got from my experience at the Embedded Conference Scandinavia earlier this year. Maybe we should have had some whiskey or cognac on stage as well to really drive home the living room feel.

The panel roved across several topics.

Software first was first. This is a big topic for the “traditional” automotive companies. There is a sense that Tesla and some Chinese companies are already doing software-first design, and that the other companies have some catching up to do.

  • It has to be accepted practice to add features that were not designed in from the start, as part of software updates.
  • The industry cannot use the classic spec-RFQ-delivery model, as that is too slow and does not fit with continuous deliverables.
  • Software from suppliers cannot be binaries that get stitched together, but has to be developed and delivered with much more cooperation. OEMs have to work iteratively with the ecosystem – not file change requests for every detail, but find more cooperative and classically Agile ways of doing software together.
  • Change is hard, and the Chief Software Officer (that companies should have) might have a hard time fighting against Chief-other-things with a hardware mindset.
  • OEMs have to have both a silicon strategy and a software strategy!
  • Learn methodologies from data center software practice?
  • There is a need for more speed in development, and it must be fun to work in automotive! To be honest, I think this last point cannot be overstated. I absolutely hear that a lot from people who have been in automotive. Processes are too slow, and it is not at all like working in “true” software organizations.

Collaboration is a key aspect of the future. There must be a consideration of partnerships and ecosystem strategies for automotive software.

  • Standards have a role to play in making parts more interoperable.
  • With multiple operating systems in a car, robust ecosystems are necessary.
  • The open-source ecosystem model is attractive for what it has delivered in other markets, but it is quite different from classic automotive. I guess doing open-source well comes back to being software-first.

Digital twins. I.e., simulation-based engineering approaches or virtual platforms.

  • Shift-left/pre-silicon is a solved problem, technically. The question is business agreements and managing the lifetime. A model needs to be kept alive for 5-10 years after the first car with a certain piece of hardware ships, adding up to upwards of 20 years of support. That is not how things are typically done.
  • Digital twins mean different things to different people. A mechanical engineer might not care about running software on a VP – a model of the mechanical system is enough – while a software engineer might have the opposite view.
  • Cannot just have a single one-model-to-rule-them-all digital twin for a complete car! Plenty of examples from other fields where multiple simulation variants are used. Thus, we will see multiple digital twins depending on the use cases.

Digital twins are intertwined with the idea of running the models in the cloud.

  • One idea is to use cloud-provided models to get started early on software for new platforms. This could be “just a chip” virtual platform or a whole simulation setup. Or a collection of simulators somehow talking to each other across different cloud hosts (this sounds easy in theory, hard to do efficiently in practice).
  • If you need to protect IP, that might be easier to solve in a cloud setup. I think that makes sense.

AI will be pervasive in a car (AI defined broadly to include classic ML). The comments here echoed what Thomas Böhm said in his keynote.

  • AI for the user experience, better voice commands, etc.
  • AI for autonomous driving
  • AI for fundamental control
  • You need standards and guardrails to use it well, and ways to control/limit what it does. How can genAI be used in cars, given the requirements from certification? It is not clear how to do that.

AI can also be used in the development process.

  • For developers, use AI/LLM/RAG to access internal information instead of going to ask senior people.
  • Generative AI can be used to create training data for ADAS – creating realistic environments for training, probably running in the cloud. Not a replacement for physical tests. But a good way to increase the testing volume for ADAS. Generative AI instead of classic simulation.  

As the answer to an audience question, a very interesting point was made: Maybe the problem is not software-first but business-first if you want to take advantage of digital technologies. It can take more time to get contracts in place than to build the technology. Especially for shift-left and software acquisition, where time is critical to its usefulness and value. Industry processes must be made leaner.

The DVCon Challenge

The DVCon Challenge was a new idea for 2024. Organized by RPTU (Rheinland-Pfälzische Technische Universität) in Kaiserslautern, the idea was that developers and students should solve a small power modeling problem in SystemC before the conference, and then a winner would be announced at the conference. Unfortunately, participation was rather thin, and instead we had a short guessing game at the conference.

Somehow, this combination of Weizen and high-tech feels prototypically Bavarian.

The reception we had for the competition went ahead anyway, and the food and drink was good as always. The presenter from RPTU did a really good job with what he had to work with, kudos to him.

Artificial Intelligence in the Exhibition

There were 18 exhibitors this year and the exhibition was busy as always. It was the usual mix – the big EDA vendors, some smaller tool vendors, and quite a few services companies. AI was in almost every booth! As part of the organizing committee, we extend a huge thanks to the exhibitors.

Busy exhibition floor on the first day of the conference.

Here are some of the exhibits that caught my eye.

The El Camino Design House had a setup with an FPGA running two real-time image processing algorithms (i.e., AI). One to determine the age of the face seen by the camera, and one to determine the mood. The age determination appears to have been tuned to make visitors happy by showing a rather low-balled number… Still, a nice demo of real-time processing on an FPGA.

Sigasi was going all-in on the semicolon theme. Nice slogan. They had moved their entire user base from Eclipse to Visual Studio Code, which is a brave move that I would have hesitated to make. It appears to have worked. They also demonstrated that you can invoke generative AI (i.e., LLMs) from within the environment – but they did not have any LLMs trained on hardware design languages to offer. There is a known lack of training data in that space.

It should be noted that MathWorks has managed to train an LLM on Matlab code (maybe there is enough Matlab code out in the open that it just works) and is providing this as a tool to their users. If you have a tool with a specialized language, providing a trained LLM for it is probably going to be standard in a few years.

Tiny Tapeout

There is one last thing to mention from the exhibition floor. Matt Venn from Tiny Tapeout stood out in his yellow baseball cap and electronic chain.

Tiny Tapeout is a very cool concept. It takes the idea of shared shuttles/multi-project wafers and divides it one more time, making it very affordable to get your HDL design burnt into a chip. It uses a single chip from a multi-project wafer and splits it into “tiles”. With some multiplexing logic, each tile can be accessed and used on its own as a virtual chip. The Tiny Tapeout website contains full information on the chips that have been submitted so far. Designs can make use of multiple tiles, allowing quite complex things – on the order of processor cores – to be implemented!

It is cheap enough (as low as 150 USD, apparently) that universities can use it in courses and that dedicated hobbyists can get things manufactured for real! It takes at least half a year to get from submitted design to chips back, so some planning is needed.

The flow is completely based on open-source EDA tools and relies on the open-source SkyWater 130nm PDK.

Some Notes on the Papers

In the last part of this blog, I will quickly recap a few paper presentations that I happened to attend and found interesting. In total, DVCon Europe 2024 featured 55 papers presented in five parallel tracks, so there is just no way that a single person can look at even a fraction of all of them. Instead, I recommend everyone to check out the DVCon Proceedings archive for past papers. Papers from 2024 should show up early in 2025, three months after the conference.

Best * Awards

DVCon Europe has always featured a best paper award. Last year, with the introduction of the research track, it was split into research and engineering papers. This year, a best presentation award was added, voted on using an entirely new system where attendees would score presentations that they attended in the conference app.

Lucas Deutschmann from RPTU Kaiserslautern-Landau won the best presentation award for “Formal RTL Sign-off with Abstract Models”. Vishal Chovatiya et al. from Infineon received the Best Engineering Paper award for “Addressing Fixed-Point Format Issues in FPGA Prototyping with an Open-Source Framework”. Francesco Enrico Brambilla et al. from CERN and KU Leuven won the best research paper for “Virtual Prototyping Framework for Pixel Detector Electronics in High Energy Physics”.

Connecting UVM Objections to Software

“uvm_objection – challenges of synchronizing embedded code running on cores and using UVM” by Yassmina Eliouj, Vasundhara Gontia, Sefa Veske, Shripad Nagarkar, Tobias Thiel, and Joachim Geishauser from NXP is about how they use UVM objections from target embedded software.

“UVM objection” is a standard mechanism in UVM (from what I understand), used to signal when a UVM testbench can consider an execution phase completed. The novelty in this paper is connecting the use of this feature in the testbench to what is happening in the target software running on embedded processors. Essentially, the idea is to let the target software talk directly to the testbench. Like this:

The key question is obviously how the target code breaks out from running inside a processor core to the outside world. This is hidden behind the NXP “CAPI”, i.e., C API, used in their on-target-processor test code. The precise implementation varies depending on the target. They mentioned it could be based on writing magic values to regular memory, or the invocation of mailbox hardware in the simulated hardware. If the same code is run on the host, outside the RTL simulation, SystemVerilog DPI is used instead.

Nothing particularly complicated, but a good and useful idea and implementation.

Containerized Virtual Platforms

Tim Kraus, Axel Sauer, and Ingo Feldner from Bosch presented a paper about how they used containers to provide easier access to virtual platforms (“Deployment of containerized simulations in an API-driven distributed infrastructure”). The primary goal of the exercise was to make it easy to compare different processors, by executing benchmark code on models of said processors.

Their SUNRISE (Scalable Unified RESTful Infrastructure for System Evaluation) infrastructure looks like this:

Essentially, the system puts a working instance of each model into a container image for ease of deployment. It also provides a set of standard actions in the “Eval API” that are mapped to the commands needed to run the virtual platform model being used. By keeping the action set very small and basic, it is quite easy to map to any reasonable model. The setup can be used for CI/CD in addition to benchmarking, obviously, and it lends itself to parallelizing tests by running many instances in parallel (provided there are licenses available for the case of commercially licensed models).
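A hypothetical client for such an action set could look roughly like the sketch below – the endpoint names, payloads, and base URL are my invention, not the actual SUNRISE API; the point is only how small and model-agnostic the surface can be.

```python
import requests

BASE = "http://sunrise.example.local/api/v1"   # hypothetical endpoint

def run_benchmark(model_image, binary_path):
    # Hypothetical sequence of "Eval API" actions: create a simulation job
    # from a container image, upload a benchmark binary, run it, and fetch
    # the results. The real SUNRISE endpoints and payloads are not public.
    job = requests.post(f"{BASE}/jobs", json={"image": model_image}).json()
    job_id = job["id"]
    with open(binary_path, "rb") as f:
        requests.post(f"{BASE}/jobs/{job_id}/binary", data=f.read())
    requests.post(f"{BASE}/jobs/{job_id}/run")
    return requests.get(f"{BASE}/jobs/{job_id}/results").json()

if __name__ == "__main__":
    print(run_benchmark("registry.example/riscv-vp:latest", "coremark.elf"))
```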

This was a very narrowly targeted use case, compared to what I expected coming into the session (having written about this topic back in 2019). There was no attempt to support interactive model usage or debug of the code running on the models. It was all batch mode and collecting results.

Reverse-Engineering with Qemu Libtcg

The most unexpected paper at DVCon Europe 2024 must have been “Accurate lifting of executable code using QEMU” by Anton Johansson and Alessandro Di Federico from rev.ng (I guess it means Reverse Engineering?) in Milano. The paper was unexpected in the sense that it is quite far from many other DVCon papers.

Basically, doing reverse engineering requires analyzing binaries and reconstructing control flow, data flow, and operations. Existing tools tend to be fairly limited in architecture support, since they all have to implement support architecture-by-architecture. To get around this problem, they are using the Qemu Tiny Code Generator (TCG), wrapped behind a library they call “libtcg”, as a general-purpose portable disassembler for static program analysis. The Qemu TCG converts target binaries to its own internal “intermediate code”. And by performing analyses on this code, you get a portable analysis tool! Clever hack.

The paper contained this nice example, showing how they can do control-flow-graph reconstruction on code for the rather exotic Qualcomm Hexagon DSP (exotic in the sense that you cannot expect wide-spread tool support for it):

I think their code is on github.
