I attended a short Tech Tuesday morning session at IBM here in Kista, Sweden, where IBM presented their WatsonX-AI and related technologies. Tech Tuesdays is a monthly technical event organized by Kista Science City, where companies in Kista present some aspect of their technology in a 30-minute session. IBM managed to get an impressive amount of content into that time!
IBM’s AI play is a software and services play. I am quite fond of IBM big-iron hardware like the IBM-z, IBM-i, and Power servers, but they were entirely absent here. Instead, the idea is to run the AI software on top of Red Hat OpenShift, which allows it to run on IBM-z as well as in the public cloud, in your own data centers, or on edge computers.
IBM is packaging up generative AI/large language models/foundational models in a way that should make it easier for large organizations to adopt the technology. Just what this means was the focus for the session.
The governance part of the IBM AI strategy stands out as quite different from what you usually see from happy AI hackers like OpenAI. It is not just about using models or even indemnifying users, but about finding a way to “govern” AI in a way that makes it more likely to do the right thing and less likely to cause problems.
The presenter talked about two types of AI: “traditional AI” and “foundational models”. The two variants are equally important, but cover different use cases:
- In traditional AI, i.e., big-data machine learning, you start by selecting an algorithm, then train it on your data, and then you apply it. This is applicable when you have a narrow use case, large amounts of data, and low tolerance for risk. A classic AI model will not hallucinate, and it will not introduce Nazis into a conversation.
- With foundational models, you start with a pre-trained model and fine-tune it with custom training data. You can get useful results with quite small amounts of training data, thanks to the power of the foundational model. But you have to take your chances on the foundation model, as it was trained on far more than just your data. The risk is higher, but it also opens up new applications.
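The contrast between the two workflows can be sketched in code. This is a toy illustration, not anything IBM showed: the nearest-centroid classifier stands in for "select an algorithm and train it on your data", and the prompt-assembly function stands in for adapting a pre-trained foundation model with a handful of examples (the model call itself is left out).

```python
# Traditional AI: choose an algorithm, train it on your own data, apply it.
# A toy nearest-centroid classifier stands in for the chosen algorithm.

def train_centroids(samples):
    """samples: dict mapping label -> list of (x, y) points."""
    centroids = {}
    for label, points in samples.items():
        n = len(points)
        centroids[label] = (sum(p[0] for p in points) / n,
                            sum(p[1] for p in points) / n)
    return centroids

def classify(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(centroids,
               key=lambda lbl: (centroids[lbl][0] - point[0]) ** 2
                             + (centroids[lbl][1] - point[1]) ** 2)

# Foundation model: start from a pre-trained model and adapt it with a
# handful of examples. Here the "adaptation" is just assembling a
# few-shot prompt; sending it to an actual model is omitted.

def build_prompt(examples, query):
    """Turn (input, output) example pairs plus a new query into a prompt."""
    shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return f"{shots}\nInput: {query}\nOutput:"

# Traditional path: needs enough labeled data points per class.
training = {"low_risk": [(1, 1), (2, 1)], "high_risk": [(8, 9), (9, 8)]}
model = train_centroids(training)
print(classify(model, (1.5, 1.2)))  # → low_risk

# Foundation-model path: two examples are already enough to steer it.
few_shot = [("water damage in kitchen", "category: property"),
            ("rear-end collision", "category: vehicle")]
print(build_prompt(few_shot, "burst pipe in basement"))
```

The point of the sketch is the data requirement: the classifier is useless without a representative training set, while the few-shot prompt leans on knowledge already baked into the pre-trained model.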
IBM is working to make foundational models trustworthy and usable, by applying governance to AI. By curating and tracking the origins of the training data for a model, it is possible to reduce the risk of unforeseen results and inappropriate output. The governance extends to tracking the output of the models in production, allowing adjustments to be made in case the models do not perform as expected.
Both these aspects make sense given what we have seen from LLMs so far. The training data shows through when using them, and we know that output must be carefully scrutinized for correctness and bias. You can never trust the output of an LLM without an independent fact check of some kind.
Using Large Language Models
They also showed a demo of how an LLM within WatsonX-AI could be used. Their example was a claims processing application for an insurance company, and to me the key aspect being demonstrated was the role of the prompt in making an LLM into a good backend engine. It is well-known that prompts are important, but the IBM system really took it to a new level. Essentially, they have a development environment that structures the process of setting up prompts.
This extends the customization stack I presented in my blog post from the RI.SE AI day:
The two top levels are new:
- A custom application is more than just an LLM – it will contain connections to databases and other systems. The LLM is a subsystem used to perform certain tasks in the application, and it is actually not directly exposed to a user. It hides behind a button like “auto-summarize this application.”
- The prompt engineering is crucial to good results. A simple button in an application can hide a very long and sophisticated prompt. The prompt is not just a few sentences, but rather the equivalent of coding in natural language.
In the example IBM showed, they set up prompts that encapsulated the idea of showing the model examples of input and output, and how to reason. The model was shown an example free-text claims form (submitted by a customer) together with its translation into a structured document: essentially, a form of chain-of-thought prompting.
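A prompt of that kind might look something like the sketch below. This is my own hypothetical reconstruction, not IBM's actual prompt: the claim text, the reasoning steps, and the JSON field names are all invented for illustration. The shape is what matters: a worked example (claim, reasoning, structured output) precedes the real claim, so the model imitates both the reasoning and the output format.

```python
import json

def claims_prompt(new_claim: str) -> str:
    """Assemble a one-shot prompt with a worked example and reasoning steps."""
    example_claim = ("My kitchen flooded on 2024-03-03 when a pipe burst. "
                     "Repairs cost 2400 EUR.")
    example_reasoning = (
        "The customer reports water damage, so the category is 'property'. "
        "The incident date is 2024-03-03 and the claimed amount is 2400 EUR.")
    example_output = json.dumps({
        "category": "property",
        "incident_date": "2024-03-03",
        "amount": 2400,
        "currency": "EUR",
    })
    # The trailing "Reasoning:" invites the model to think step by step
    # before emitting the structured document, mirroring the example.
    return ("Convert the insurance claim into a structured JSON document.\n\n"
            f"Claim: {example_claim}\n"
            f"Reasoning: {example_reasoning}\n"
            f"Output: {example_output}\n\n"
            f"Claim: {new_claim}\n"
            "Reasoning:")

prompt = claims_prompt("Someone scratched my parked car last Tuesday.")
print(prompt)
```

Hidden behind an "auto-summarize this claim" button, a prompt like this is effectively a small program written in natural language, which is exactly why a structured development environment for prompts makes sense.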