In early July, Cadence announced their new “C2S” C-to-silicon compiler. This event was marked with some excitement and blogging in the EDA space (SCDSource, EDN-Wilson, CDM-Martin, to give some links for more reading). At its core, I agree that what they are doing is fairly cool — taking an essentially hardware-unrelated sequential program in C and creating hardware from it. It is the kind of heavy technology that I have come to admire in the EDA space.
But I have to ask: why start with C?
Productivity by Abstraction
The motivation given in the marketing materials from Cadence is “productivity”, up to ten times more productivity or 90% of design time reduced. The key idea appears to be that C/C++/SystemC is more abstract than Verilog/VHDL, and therefore more design is produced in less time (see Cadence C2Silicon Datasheet).
Similar (in spirit) products have similar motivations, with varying claims about how much better things get. For example, Forte’s Cynthesizer claims “2X-4X faster implementation from spec to netlist over RTL”.
Materials for Mentor’s Catapult tool say it best:
Using industry standard pure ANSI C++ to describe functional intent, designers move up to a more productive abstraction level for designing complex ASIC or FPGA hardware typically found in next-generation, compute-intensive applications.
Single C++ source unites system designer and hardware designer
So the idea from a large part of the EDA community seems to be that using a C-family language offers benefits of abstraction as well as a way to communicate with software people.
Is C the Right Answer?
What is funny about this obsession with raising the level of abstraction is that you keep ending up in the C family of languages. C, which many people consider a “high-level assembler”, and even worse, C++, which is a hard-to-parse semantic nightmare that most CS people I know would rather do without.
I was reminded of an old joke from my early university CS days, about how to shoot yourself in the foot in various languages. The part on C++ is especially telling. From http://www-users.cs.york.ac.uk/susan/joke/foot.htm:
- You shoot yourself in the foot.
- You accidentally create a dozen instances of yourself and shoot them all in the foot. Providing emergency medical assistance is impossible since you can’t tell which are bitwise copies and which are just pointing at others and saying “That’s me, over there.”
So, putting on my Computer Science hat, I find that the idea of using C as a raised level of abstraction would have been considered a bad joke when I was an undergraduate. In essence, C codifies procedural programming as it was understood in the early 1970s, when compilers were very weak and a language that could do all that assembler could do was badly needed to write operating systems in a high-level language. In many ways, it was a step back compared to Pascal, Fortran, or Cobol. But it had the power to do anything, tended to result in faster code than other languages, and was available for more machines and operating systems than any other language. C++ then added objects (good), templates (good), and multiple inheritance (complex), while trying to maintain the close-to-the-machine style of C (bad). The result is something very complex but decently useful, and still mostly like old C (despite innumerable little differences in semantics compared to plain C).
So for a lot of good reasons, C/C++ is the de-facto standard language when you actually get down to the gritty job of getting a new language or operating system to run. It is the language to implement run-time systems, operating systems, new interpreted languages, etc. For the embedded space, C/C++ is often the only language available for a particular chip/OS combination. It is usually the best supported with the most compilers, the highest investment in compilers, and the most users of the compilers.
I have worked for a C compiler company called IAR Systems, and I appreciate the great engineering effort, skill, and pure intellectual fun that goes into creating C compilers that generate code that works well on resource-constrained embedded systems (doing C on an Intel 8051 is no mean feat). Compilers that are good enough to wean embedded people off of assembly language.
But today, in general, I think that C/C++ is not the language anybody chooses for a project if the goal is abstraction and greater productivity. Instead, you go for languages that are much more productive, languages that raise your productivity a few times over plain C/C++. Some typical examples:
- Languages that do away with memory management, like Java.
- Languages that use virtual machine technology to ease porting across platforms, like Java, C#.net, Prolog, Python, Perl.
- Languages that use dynamic typing or even duck typing, like Python and Ruby.
- Languages that feature concurrency as a primary design feature, like Erlang.
- Functional languages with type inference, like ML and Haskell.
- Graphical modeling tools that generate skeleton code, like UML.
- Graphical domain-specific modeling tools that generate final code, like Matlab, Labview, and VisualState.
- Constraint-resolution-based languages, like Oz.
- Narrowly focused domain-specific languages, like CoWare LISA and Virtutech DML.
- In-house very focused languages, usually not “Turing complete”.
When I was studying CS, the basic assumption was that languages are tools, not religion. Any computer science major worth her or his salt should be able to learn any language in a short time, and you should use the language most appropriate for the task at hand. Using the same language all over the place because it is a “standard” is absolutely inefficient, from a software programming perspective.
In practice, large software systems in the embedded and desktop space tend to be constructed from around ten or so different languages (typically, you find C, C++, Java, some macro expanders like M4, string processing and file generation in Perl, funky makefiles, some scripting in shell script, Python, VisualBasic, etc.). This is not anarchy, it is professionalism. If you asked a carpenter to use a hammer for all tasks, he would be pretty sad — different tools are good for different things, and it is the same with languages.
So Why C in EDA?
Today, EDA companies are moving from hardware design into partially software design, as SoC designs become more complex and software becomes a greater part of the overall system value. In this process, C/C++ and SystemC seem to be the language of choice.
I find this strange, considering the proud tradition of language invention that you find in EDA. VHDL and Verilog were uniquely new things when they appeared: languages to describe hardware on hardware terms, not software on instruction-set terms. Later, you have more abstract languages like Bluespec and HandelC. There is a tradition, it seems, of very large and sophisticated compilers that take complicated inputs and transform them into hardware.
Using C/C++ as the input language really makes no technical sense, as it is very hard to parse and understand well in general. The restrictions imposed by the requirements of synthesis limit the C you can input quite severely, if you ask me.
If one idea for starting with C/C++ was to take a piece of “generic” source code and then compile it either to software or hardware, depending on system partitioning, I cannot see that working too well. Software-C tends to use constructs that are not appropriate for hardware synthesis, like pointers and recursion. Hardware-C does not look likely to generate particularly elegant or simple constructs when compiled to software.
So since the code is still going to be special for doing synthesis, why not use an altogether more elegant language? That seems to be what SystemVerilog is about to some extent, and what Bluespec, for example, is doing. Or the tools to do hardware from UML.
I really do not understand how C/C++ came to be seen as the answer to what software is. Maybe because EDA companies tend to meet with the lowest-level software engineers? And these engineers are certainly only using C because they are running on very bare hardware and cannot assume the existence of rich run-time environments and virtual machines.
Note that I do like the core idea of C-based synthesis, which is using C with restrictions, discipline, and coding patterns to ensure quality final code and communicate programmer intent to the compiler.
When I was with IAR, I taught several courses on how to get good small code out of an embedded C compiler (see for example my ESC 2001 paper). Basically, it comes down to writing “boring” C code that looks a whole lot like Pascal, and which does not rely on fancy semantic intricacies or old idioms like using “while” rather than “for”. The IMEC CleanC guidelines are quite recent, but follow much the same ideas to enable automatic parallelization of code for MPSoC designs.
But doing this is really a work-around for a poor initial language, if you look at the problem without preconceptions.
The Sensible Starting Point
What I think would make more sense is to start with some higher level of abstraction and then generate software code or hardware design from it.
This starting point should really be a parallel language, in some way, shape, or form. Using a sequential language like C as the starting point is fundamentally broken in this age of pervasive multicore systems. Future software will be written to run on parallel machines as the common case, and programs should expose the natural parallelism present in the problem being tackled. Taking a naturally parallel problem, packing it into disciplined sequential C, and then having a compiler discover the parallelism again is really a huge waste of effort.
The starting language might not necessarily be explicitly parallel (no need to scream about Occam or Ada tasking), it could just be domain-natural like Labview (which has been proven to be compilable down to parallel code). I do think that local memory + message passing + built-in task handling like Erlang looks like a very good approach for hardware design and software design, as it makes it possible to use sequential descriptions where they make sense and expose natural parallelism where it makes sense.
The starting language should also be as simple as possible in terms of how many different ways you can express things. In C, you can write “i++;”, “++i;”, “i+=1;”, or “i=i+1;” to increment a variable. Why have more than one way to say this? The original point in C was to support various machine operations directly in the source language, as you could not trust the compiler to figure things out. Today, compilers can figure these things out, so there is no point in being able to express the same thing in more than one way; a single regular, easy-to-read form is enough.
Another important aspect of modern software engineering is to support quick iterations, quick changes, and agile and extreme programming practices. This comes down to languages and environments that work even when systems are incomplete, and where it is easy to stub things out and later fill in the details. C/C++ does not do this terribly well, due to its static typing, static checking, and lack of default implementations for things that have not yet been filled in.
The language should also be designed to run on some virtual machine, as that helps portability and understanding programs. It also makes it a whole lot easier to write a reference compiler to integrate the language into simulation environments.
So where does this put me? I think these are my main points:
- C-to-hardware synthesis is pretty impressive technology
- But why use C?
- C is the EDA high-level darling
- C is sequential and complicated
- C is very low-level from the perspective of a software professional
- A better input language should be concurrent/parallel and simple
- A better input language should be designed for modern agile and extreme programming styles
- EDA companies should really look at the leading edge of software engineering for inspiration, rather than what conservative embedded C programmers are doing.
- VM-based languages are good
I guess this goes into the “rant” bin…
6 thoughts on “What’s the Obsession with C in EDA?”
Just about everything you say is true and easy to agree with. (Other domain-oriented specification languages, such as the family of actor-oriented dataflow notations – Mathworks Simulink, Ptolemy, CoWare SPW – are also worth a mention.) However, the decision to use C is much easier to understand than the EDA companies’ marketing literature and its rationalisations about abstraction levels and productivity would suggest. Stated pure and simple, C (and C++) IS the market for ESL synthesis, and IS the market for a lot of embedded code writing. Most embedded software developers writing application code for media-type applications use C/C++. Many of the standards being implemented start with reference implementations in C/C++. Most of the legacy code for embedded systems is in C/C++. And all this legacy and current practice shows a reluctance to consider alternatives, and a persistence in continuing to use C/C++, that is both surprising and not surprising at the same time. I do know that if there were a concerted and visible movement by embedded systems developers to move to other languages or input notations in a serious way, it would garner a lot more attention. Periodically there is interest in trying to do something more directly with Matlab and the dataflow actor-oriented languages such as Simulink. However, since most of these toolsets generate C or C++ as an intermediate form, it is all too easy to rely on our old friends the C-based languages and keep building them into the flow.
Those who build flows and tools based on other languages share the optimism of “if we build it, they will come”. But too often in the past the new edifice is built and no-one has shown up! I fondly expect that at some time there will be a sea change and something new will become the new new thing and receive wide adoption. But I recommend that no-one hold their breath waiting for that to happen.
Thanks for confirming my suspicion that the idea behind using C and C++ was to make use of existing code bases…
That embedded programmers are by and large wedded to C/C++ is well-known, but I see a very strong trend towards Java and UML in the control plane, and model-driven architectures in control algorithms. Most such high-level tools generate C (and occasionally Java) to be able to compile to something executable, since that is the best way to do it.
But what I am uncertain about is whether the code provided by embedded developers, generated from domain-specific languages, or obtained as a reference implementation is actually a useful start for synthesis. In my experience, it would likely not be, as all that code is not designed to fit a restricted, easy-to-manipulate subset of C.
It looks to me like you would have to, at the very least, go through the code and do a lot of rewriting and fixing to make it fit for synthesis — negating most of the gain of reusing “existing” code.
Excellent point! A lot of implementation code and reference standard code is pointer-full, irregular, and indeed has to be cleaned up for high-level synthesis (HLS). Very little of it was written with HLS in mind. However, let’s also remember that even the new algorithms that an embedded architect or application expert might write for HLS directly — taking care with pointers, aliasing, and access regularity for array structures, etc. — are still likely to be written in C/C++, because that is what the developers are familiar with and trust. Thus shifting to new input mechanisms is a long-term project of education, experience, and trust.
So the gain in using C/C++ may be more the “mental reuse” by the developers of their worldview, than the actual reuse of the code (although even badly written code can serve as a reference model for rewritten HLS-friendly code, to confirm that it implements the same algorithm).
I also see moves to other languages for control, but control has also not been the forte (pun intended) of HLS, which has concentrated on dataflow oriented signal and media processing algorithms more. Bluespec seems to be an exception here, although its current most natural form is in an HDVL, SystemVerilog.
C to Hardware is also intended to target the FPGA Accelerated Computing space. For legacy HPC applications this is much less painful than hand conversion to Verilog.
I’ve argued on my blog that the spreadsheet should be the primitive model for parallel computing, replacing von Neumann instruction streams. If you look back to last September, you’ll find a similar rant on the use of C for parallel programming. Excel+VBA is by far the most widely understood combined data-flow and control-flow environment. I think that because Excel+VBA dominates this programming model by so much, it’s overlooked by most experts and rarely considered in the same breath as Matlab+Simulink or Labview.
We posted code for a Python-Excel interpreter that performs non-blocking assignments and synchronizes data with your visible range in Excel (see the blog for info). We can only do trivial Excel to Verilog conversions currently so nothing terribly cool on the FPGA side.
Excel is definitely something that I have overlooked as a “compute model”, even if to me Excel + VBA sounds pretty horrible to do any kind of static analysis on 🙂 Also interesting since Excel would seem to have a very opaque type system for its data. I will look it up!
Just found a pretty good online critique of C++, see http://yosefk.com/c++fqa/defective.html … thanks to http://x86vmm.blogspot.com/2008/08/why-i-am-not-c-programmer-ten-years-on.html for the tip.