ChatGPT and Simics

It is an understatement to say that ChatGPT has been a hot topic since it was launched a few months back. Everyone seems to be testing what it can do in their favorite domain, so I had to try it on what I work with, Simics and virtual platforms. The results did not live up to the hype some people think the technology deserves, but they were very impressive and a little scary nevertheless. This is the first post in what looks like it will be a series about ChatGPT. Update: Part 2, ChatGPT and Code, is now out. Update to the update: Part 3, a critique of ChatGPT, has been published.

The OpenAI FAQ states that ChatGPT is an application of their GPT3.5 language model, making it behave specifically as an interactive assistant (or dialogue system). It has been trained on a massive amount of data scraped off of the Internet, producing a model that can generate really good text in English (and a few more languages; it is unclear just which ones it has learnt enough about to be useful). The FAQ makes it clear that the model is trained to produce good text, not facts, and that it is not a knowledge model:

It is important to keep in mind that this is a direct result of the system’s design (i.e. maximizing the similarity between outputs and the dataset the models were trained on) and that such outputs may be inaccurate, untruthful, and otherwise misleading at times.

This statement is refreshingly honest and important to keep in mind when using ChatGPT.

That said, let’s ask it some questions and see what comes out.

ChatGPT, What is Simics?

The obvious first thing is to ask it what Simics is.

Q: What is Simics?

A: Simics is a full-system simulation platform developed by Wind River Systems (now part of Intel). It allows users to simulate the behavior of entire computer systems, including hardware and software, in a virtual environment. This can be useful for a variety of tasks, such as testing and debugging software, analyzing system performance, and studying the behavior of complex systems. Simics supports a wide range of architectures and operating systems and can be used for both research and commercial applications.

Nice marketing blurb. Sounds close to something I could have written – which is not really all that surprising since I would guesstimate that half the text out there about Simics comes from my hand. This shows the effect of the learning – it is trying to look like existing text about Simics, and that text has a tendency towards a certain style.

The answer also contains some factual errors. Simics was not developed by Wind River Systems, and Wind River is not part of Intel. That it says this is not all that surprising, since it used to be the case that Wind River was a part of Intel – there is quite a bit of old text out there that says this. ChatGPT does not care – it is not a fact digester or fact checker, and it does not go and check sources for the current status.

What is Simics, Variant 2

Rephrasing the question a little bit, the answer is similar but longer. And has some new faulty information.

Q: What kind of product is Simics?

A: Simics is a software simulation platform developed by the Swedish company Wind River Systems. It is used primarily for system-level simulation of complex hardware and software systems, including embedded systems, mobile devices, and data centers. Simics enables developers to test and debug their software in a virtual environment that accurately models the target hardware, allowing for early detection of bugs and other issues before the software is deployed in a real system. Simics is also used for training and education purposes, as it allows users to experiment with different configurations and scenarios in a safe, simulated environment.

The use case list sounds like a mashup of marketing pitches put out by Virtutech, Intel, Wind River, and other providers of virtual platforms over the years. This kind of information collection could be a good use case for this type of technology, providing a starting point for writing some generic copy. However, you never know if it actually captures the most relevant use cases, or if there is something important that it missed. The writer would have to check each one, and consider what is not there. Intention and knowledge are really important when writing good texts, and ChatGPT has neither.

The first sentence is slightly off the mark. As noted above, Simics is not developed by Wind River. More interestingly, Wind River is not a Swedish company, and I am fairly sure there is no text out in the world that says so. At a wild guess, ChatGPT mashed text fragments like “Simics was developed by” and “Simics is sold by” together into a grammatically correct but factually incorrect sentence. Simics was developed at the Swedish Institute of Computer Science (SiCS), so it does indeed come from Sweden. Since 2010, it has been owned and developed by Intel. Wind River sells the product commercially. All of these could be mixed together to produce that sentence.

It could also be noted that Simics has never really been applied to mobile devices.

A final note on the text produced is that the developer of the product is largely irrelevant to this particular question. If you asked a human, the answer would be just “a virtual platform” or “a simulator” or something like that. I think that ChatGPT exhibits a clear propensity for wordiness and extraneous facts. It is simply not a very good writer.

References for That?

One common (mis)understanding is that ChatGPT can be used as a research tool. The output does make it look as if there is a knowledge base or some concept of knowledge behind it. The rather “sure of itself” style reinforces this impression. But it is not a knowledge system, and it is not actually trying to provide knowledge.

Given the factual errors in the above output, I asked it to produce a set of references that it had based its answer on (knowing full well that that is not at all how it works). The results are interesting.

Q: Do you have references for that?

A: Sure, here are some references that provide more information on Simics:

Wind River Systems, “Simics: A Complete Simulator for Virtual System Development.” https://www.windriver.com/simics/

“Simics” Wikipedia page. https://en.wikipedia.org/wiki/Simics

“Introduction to Wind River Simics” whitepaper. https://www.windriver.com/products/simics/introduction-to-simics/

“Simics: System Simulation for Embedded Software Development” book by Daniel Aarno and Jakob Engblom. https://www.amazon.com/Simics-Simulation-Embedded-Software-Development/dp/0124045762

These resources provide detailed information on the features and capabilities of Simics, as well as examples of how it is used in industry and academia.

Well, thank you – you found the book we wrote about Simics back in 2014! However… if you actually go to these links and review the contents, you realize that they contradict the statement made above. In particular, the Wikipedia page for Simics makes it clear who actually developed the product.

If you look closely, the above output has a bit of weasel-wording in it. It does not actually say that the prior statement was based on these references. Rather, it says “here are some references about Simics”. This kind of verbal sleight-of-hand is quite common in its output, which I guess unfortunately reflects how people write and debate things on the Internet in general. ChatGPT reflects back some of the less ideal aspects of humanity.

Let’s ask the question to the bot:

Q: Is the above text taken from those references?

A: The text I provided is based on my own knowledge and understanding of Simics as an AI language model. However, the information is consistent with the information presented in the references I provided. If you are looking for more detailed or technical information on Simics, I would recommend consulting those sources directly.

Well, no, it is not consistent with the information in the sources. The model keeps insisting that it is right. And it talks about its knowledge, despite the fact that it has no idea what it is talking about. It is just producing text that looks like a dialogue with a human, based on its training data. It sounds good, but sounding good and being correct are two entirely different things. This is a key problem with ChatGPT.

ChatGPT would do well to be retrained using a more humble and self-critical corpus.

Write Some DML Code

Another popular use of ChatGPT is as a software code generator. Having been trained on a large set of programming languages and code, it can generate pretty decent skeletons for many types of code. Not to mention useful malware. My sense is that if there are enough examples of something around, it can mash it all together into something useful and mostly consistent.

This looks like a technically reasonable use of systems like ChatGPT, since the domain is quite constrained compared to general knowledge, and errors in the answer will be caught by the compiler and the programmer. You cannot get away with the code equivalent of claiming that Wind River is a Swedish company. It would not compile.

So, let’s ask ChatGPT for some Device Modeling Language (DML) code:

Q: Write a simple device model in DML

A: Here’s an example of a simple device model written in Device Modeling Language (DML):

model MyDevice {
    address 0x0 {
        register reg1 {
            size = 8;
            access = rw;
        }
        register reg2 {
            size = 16;
            access = rw;
        }
    }
}

This model defines a device called “MyDevice” with two registers, “reg1” and “reg2”, located at address 0x0. The “size” attribute specifies the number of bits in each register and the “access” attribute specifies that both registers can be read from and written to (rw).

DML is a language used to describe device models in a machine-readable format. It allows for the creation of accurate and efficient device models that can be used in various system-level simulation and verification tools such as virtual platforms, system emulators, and co-simulation environments.

The explanation part is mind-blowing. It is one thing to generate some code, but quite another to explain it in natural language. I have no idea if the explanation or the code comes first, but it seems that programming code generation in ChatGPT is somehow based on a knowledge model; I cannot explain it otherwise. It has clearly been instructed to explain the code that it generates.

The second paragraph of the output gets back to the wordiness observed above. ChatGPT adds on extraneous text that sounds good and is related to the topic at hand, but is not something that was actually asked for. It is irrelevant to the code.

To be honest, it sounds like the kind of text you might see from students who are not very good at writing and who do not know all that much about how to pull off a good argument. Adding random bits of information related to the topic at hand fills the page, and maybe, if you are lucky, the teacher grading it will think you are smart.

However, the code is not valid DML code, but it “sounds like it could be” – you can see how the language model produces text that looks code-ish and exhibits typical features of DML, like domain-specific keywords.
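For reference, a minimal real DML device looks roughly like the sketch below. This is written from memory as an illustration rather than tested code, and the details should be checked against the DML reference manual – but it shows how different the actual structure is from what ChatGPT invented, with registers living in banks and having a size in bytes and an offset:

dml 1.4;

device sample_device;

param desc = "minimal example device";

bank regs {
    // Each register has a size in bytes and an offset within the bank
    register reg1 size 4 @ 0x0 "First register";
    register reg2 size 4 @ 0x4 "Second register";
}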

When I point out to ChatGPT that the code was not proper DML, it gets defensive:

Q: That DML device model did not look correct

A: I apologize for any confusion caused by the DML device model that I provided earlier. DML is a powerful and flexible language for describing device models, but it is a proprietary language and its syntax and structure may change depending on the tool or the company that provides it.

The example I provided was a simplified version of a device model in DML and it may not be fully compliant with all the features and functionalities of DML. It was intended to give an idea of how DML can be used to describe a device model.

It’s important to consult the official documentation of the specific DML implementation you are using, and to verify the correctness of the model with the tool’s developers or technical support team.

Please let me know if there is anything else I can help you with.

This is a really wonderful way of saying “actually, I have no real idea what I am talking about”. Note the statement that “its syntax and structure may change depending on the tool or the company that provides it.” Well, no. The point of a language definition is that it is the same regardless of that… It is technically correct that DML was a proprietary language back in September 2021 when the model was supposedly trained. But that is no longer the case.

The final sentence is honestly a bit snarky. Like “you are so dumb, but anyway here is my explanation”. Sounds like an exhausted support person faced with a really dumb user. Not like an honest mea culpa.

Asking it again to explain itself, it digs in deeper:

I apologize for any confusion. To my knowledge, DML is not a widely used language and there is only one implementation of DML, which is proprietary to Wind River Systems (now part of Intel) and used in their Simics tool. The DML I provided earlier was based on my understanding of the language from the information available and it might not be entirely accurate.

That “Wind River Systems (now part of Intel)” keeps coming back in this dialogue. It looks like ChatGPT thinks that you have to say that each time you say Wind River.

That DML is not widely used I can certainly agree with, at least in comparison to languages like C or Python. It also shows the fundamental limits of a trained model like this – it can only be as good as its training set. I don’t think it can generalize between programming languages like a human programmer would. But it is hard to know.

Writing a Timer Device

The above example might not have been entirely fair to ChatGPT. You would typically be more specific when prompting it to write code.

Q: Can you write a Simics device model that posts events for a timer?

A: Yes, I can write a simple Simics device model that posts events for a timer in DML. Here’s an example of a timer device model:

device timer {
  attribute uint64_t interval;
  event tick;
  function void start() {
    post_event(tick, interval, 0);
  }
  function void stop() {
    cancel_all_events(this);
  }
  function void tick() {
    post_event(tick, interval, 0);
  }
}

The logic makes sense at a high level. You need to have an interval. You need to post events. The callback for an event posting needs to re-post the event. The syntax and lexical structure are not correct in any programming language, but read as pseudo-code it does capture the essence of a timer model. However, it does not capture the device model aspect. It is entirely missing how software interacts with it – i.e., there are no registers defined.
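For contrast, a hand-written DML 1.4 timer would look roughly like the sketch below, with a register bank as the software interface and an event that re-posts itself. Again, this is a sketch from memory, not tested code – the exact template and method names (simple_time_event, write_register, and so on) should be verified against the DML reference manual:

dml 1.4;

device sample_timer;

param desc = "simple timer device";

bank regs {
    // Interval in seconds, written by software
    register interval size 4 @ 0x0 "Timer interval";

    // Writing a non-zero value starts the timer, writing zero stops it
    register ctrl size 4 @ 0x4 {
        method write_register(uint64 value, uint64 enabled_bytes, void *aux) {
            default(value, enabled_bytes, aux);
            tick.remove();
            if (value != 0)
                tick.post(interval.val);
        }
    }
}

// The event callback re-posts the event to make the timer periodic
event tick is simple_time_event {
    method event() {
        log info: "timer expired";
        post(regs.interval.val);
    }
}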

General TLM Generalities

I have found that ChatGPT tends to produce answers that are a bit vague and general. It is always a bit soft around the edges and lacks the precision a human expert would provide. Sometimes, the answers border on hand-waving, trying to hide a lack of knowledge behind a curtain of words. Given that what we are looking at is text generated from a language model, and not from actual knowledge, this is expected.

For example, I asked it for some advice on how to build a transaction-level model, specifically a network model:

Q: What is included in a transaction-level model of an Ethernet network?

A: A transaction-level model (TLM) of an Ethernet network typically includes the following components:

Network nodes: The nodes in the network, such as switches, routers, and end devices like computers.

Network interfaces: The interfaces on the nodes for communicating with other nodes, such as Ethernet ports.

Communication channels: The physical medium over which the nodes communicate, such as coaxial cable, fiber optic cable, or wireless.

Protocols: The rules and standards that govern communication between nodes, such as the Ethernet protocol.

Message transactions: The transfer of data between nodes, typically modeled as a sequence of request and response transactions.

Timing: The timing behavior of the transactions, including delays, latencies, and clock frequencies.

System behavior: The behavior of the network as a whole, including network congestion, network contention, and other performance metrics.

In a TLM model, the focus is on modeling the behavior of the network at the transaction level, rather than at the signal or bit level. This allows for a higher-level and more abstract representation of the network, which can be more efficient for modeling and simulation purposes.

Good? Not really. While the English is good and coherent, the content is actually misleading. In fact, a TLM model should totally ignore the physical medium and most latencies as well as congestion. The talk about protocols and message transactions is kind of correct, but also totally useless if you want guidance for what to code. I.e., it is not a replacement for knowing your domain and how to solve problems in that domain.

These kinds of answers showcase a core issue with ChatGPT: form hiding deficient function – just having a computer generate that kind of language is so impressive you might forget to stop and question what it actually said. If a human wrote the above, you might be inclined to think that they knew what they were talking about. ChatGPT demonstrates the importance of being skeptical, even when statements sound authoritative.
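To make the point about abstraction concrete: in a Simics-style transaction-level model, an Ethernet device simply exchanges complete frames with a link object – there is no cable, no signalling, and no bit-level timing to model. A rough DML sketch, again written from memory (the ethernet_common interface and frags_t type are recalled from the Simics API and should be double-checked against the documentation):

dml 1.4;

device sample_nic;

param desc = "sketch of a frame-level Ethernet device";

// Connection to an Ethernet link object; to transmit, the model would
// hand a complete frame to the link via link.ethernet_common.frame(...)
connect link {
    interface ethernet_common;
}

// Frames from the link arrive as complete units; a real model would
// copy the frame into an RX buffer and raise an interrupt
implement ethernet_common {
    method frame(const frags_t *frame, eth_frame_crc_status_t crc_status) {
        log info: "received an Ethernet frame";
    }
}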

Monty Python

For some reason, I wondered what ChatGPT would say about the airspeed velocity of an unladen swallow. It told me this was a famous question from Monty Python and the Holy Grail, and that it obviously depends on a lot of factors just how fast a swallow is (basically recapitulating a large number of articles you can easily find online discussing the problem).

Then I asked it to recite the famous dialogue. And it did. But with an odd glitch:

Certainly, I can recite parts of the conversation from Monty Python and the Holy Grail. Here’s one of the most famous scenes from the movie, the “Swallow” conversation:

King Arthur: “What is the airspeed velocity of an unladen swallow?”

Guard: “What do you mean? An African or European swallow?”

King Arthur: “I don’t know that.”

[The bridgekeeper is thrown over the bridge by an unseen force.]

King Arthur: “How do know so much about swallows?”

Guard: “Well, you have to know these things when you’re a king, you know.”

That is the correct dialogue. But with the wrong people ascribed to each line!

Somehow, one part of the model remembers the dialogue it was trained on, while another part interferes and inserts the wrong character for each line. The model of “this is how you write a dialogue between two people” gets combined with “a set of lines of a movie”. Bizarre. Utterly bizarre.

A good demonstration of the unknown depths of large language models. There are things going on here that we do not yet understand.

A Reflection on Compute Costs

An additional question that comes to mind is just what it costs to generate the replies from a system like this. Running inference on a giant model is not free. TechCrunch provided this nugget:

According to OpenAI co-founder and CEO Sam Altman, it costs a few cents per chat to run ChatGPT, OpenAI’s viral chatbot — not an insignificant amount considering that ChatGPT had over a million users as of last December.

Hard to map that back to actual compute time, since the cost of doing things in a cloud service depends on many variables. ArsTechnica cites reports indicating that running a chat-style search bot instead of current searches costs about ten times as much.

I can understand why OpenAI wants to charge for access.

Final Words

ChatGPT is very impressive. What it does is just mind-blowing. That something this intelligent-seeming results from feeding a giant neural network with giant piles of text is nothing I would ever have expected. Sometimes it looks comfortingly similar to other trained systems that look for good matches in the training set, and sometimes it does things that are totally out there.

At a philosophical level, what I believe is going on here is an example of emergent behavior. The system as a whole is exhibiting behaviors that were not predicted by the creators, just from being sufficiently large, complex, and layered. While I don’t see Skynet here, it is clear that the GPT3.5 system has become something more than just a classic machine learning model.

Training the system on human language has somehow imbued it with an ability to simulate human thought processes.  The fact that it is set up by giving it initial prompts instead of writing explicit program code to run it in a particular way is totally fascinating and totally weird. It is science fiction.

That said, the current ChatGPT is not as useful as many people seem to think. Yes, it can produce some very good text given prompts. But fundamentally, it has no idea what it is talking about. You cannot take anything it says at face value. It lacks self-criticism and produces answers that carry the style of confidence and facts – while possibly being totally wrong.

Language models would seem to need to be somehow combined with knowledge models to be truly useful. Something is needed to keep the output factual, stop it when it veers off into territory with weak information, and prevent it from hallucinating. I have no idea what that would look like. But it is needed.

9 thoughts on “ChatGPT and Simics”

  1. I am using ChatGPT as a software consultant. Maybe I asked it 10 questions related to two pieces of code that now run on our web site. That’s one dollar of consulting, if every question is ten cents. That’s cheap. For both issues I tried other means first. ChatGPT immediately understood what I was after and supplied code.
    I would call the code “snippets that I could not write, but I can see they’re right”.

  2. I would be curious to see if the examples are going to be improved once ChatGPT has access to the full Simics documentation and the community.

  3. Jakob, shouldn’t we consider GPT behavior very much “human-like”? It makes convincing statements about the stuff it “heard from someone once”.
    It’s very close to what journalists do. If I’m not an expert in the topic they are talking about (say politics, Mars exploration, military strategy, or the meaning of ancient Egyptian characters), I have a tendency to believe them. But once they switch to, say, electronics, I have a red lamp flashing in my head. Obviously the other topics contain lies as well – I just can’t detect them.
    So GPT looks to me like it acts in pretty much the same way. It provides you with an “interpretation” based on “some facts” it has heard of. I guess the reason may be quite simple – it does what its data set covers. The data set is the Internet – thus the result.

  4. Agree with all of the above.

    Except that ChatGPT is not really providing an interpretation of some facts. It has no fact model; it is just generating text that looks like the text in the training set. It is worse than supposed experts in that way, in that it does not even believe what it says. It just says something the evaluation function said was a good-looking answer during training.

    So I would not think of it is as human-like. More like human-mimicking. It is an alien that just learnt how to sound human from the Internet. If we deleted that training and only trained on some more focused sets we could likely get very different behavior. If it was only public statements from politicians it would likely tend to evade anything too concrete and stick to high principles. Train it on military manuals only and it would be very concrete and aggressive.
