Hi Local AI, Draw Me …

I recently built a new desktop computer, featuring an Intel Arc A770 graphics card (just to be different). The card is supported by the Intel AI Playground, a software package that makes it dead easy to run AI/large language models (LLMs) locally on my GPU. I was curious to see just what this could do compared to the big AI models that run on cloud servers.

Intel AI Playground

The Intel AI Playground software is not easy to find if you do not know it exists, but a search on the Intel site or a quick Google will turn it up. It is a bit disappointing that the latest version of the program only works on Core Ultra 200V processors (i.e., Lunar Lake). The program does, however, work on Meteor Lake, and therefore presumably also on the upcoming Arrow Lake laptop processors (H, HX).

Running AI models locally means that there are no limitations on their use, apart from how much compute my GPU can deliver. Basically, it is a one-time cost to get a GPU, plus the electricity used. This is refreshingly unlike cloud-based approaches, where you must set up accounts, often get very limited capabilities for free, and need a paid subscription for substantial use.

The downside to running models locally is obviously that the models are quite a bit smaller (i.e., have fewer parameters) due to the limited memory and compute capacity you get from a single GPU. The question is just how much that matters.

I decided to test local AI vs cloud by doing image generation.

The Models

The Intel AI Playground provides immediate access to a few free, open-source models for image generation, text processing, and image enhancement. It is possible to download and use other models by manually pointing the program at them, but I have not tried that.

For image generation, the AI Playground comes with setups for two models from Hugging Face (a rough sketch of running one of them outside the Playground follows the list):

- Dreamshaper-8
- Juggernaut-XL-v9
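Under the hood, this is essentially the kind of pipeline the Hugging Face diffusers library provides. As a minimal sketch of a run outside the Playground, assuming "Lykon/dreamshaper-8" is the right repository ID for Dreamshaper-8 and that the Intel GPU shows up as the "xpu" device in a recent PyTorch/IPEX build (both assumptions on my part):

    # Minimal text-to-image sketch using Hugging Face diffusers.
    import torch
    from diffusers import AutoPipelineForText2Image

    # Use the Intel GPU ("xpu") if the PyTorch build exposes it, else CPU.
    device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

    pipe = AutoPipelineForText2Image.from_pretrained(
        "Lykon/dreamshaper-8",  # assumed repo ID for Dreamshaper-8
        torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    )
    pipe = pipe.to(device)

    prompt = ("A Volvo car in a city at night, with neon lights on the "
              "buildings. Moody. Dark. Street lights.")
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save("volvo_night.png")

The Juggernaut-XL-v9 model could be swapped in the same way by pointing from_pretrained at its repository instead.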

For comparison to “Cloud” services, I used the image generation function available from the Bing search engine. This appears to be the DALL-E 3 model from OpenAI.

Draw me a Volvo Car

I started with what I would have expected to be a difficult prompt to get right, as it included a specific brand name, and just why would the models know about that? However, this turned out to be easy for all the models. I am not the best image prompt engineer, but it seems my local models have fairly short input windows anyway, so fancy prompts are out of the question.

The prompt I used was:

“[Draw] a Volvo car in a city at night, with neon lights on the buildings. Moody. Dark. Street lights.” (The initial “Draw” is only needed to get Bing to understand what I want).

The Dreamshaper-8 model produced this:

Not bad at all. The model has clearly learnt the distinctive design elements that Volvo designers are at pains to incorporate into all their cars. I guess this proves that such recurring design themes can get captured by machine learning models.

Switching to the Juggernaut-XL-v9 model, something else entirely came out:

The style of the image is quite different, and the learned Volvo design language is from the 1980s. Still, it is recognizably Volvo-esque, apart from the handle on top of it.

Rerunning the prompt generated a quite different image:

This is much more like a modern Volvo in design. The variation just goes to show that randomness is an integral part of LLM applications.  

Asking Bing/DALL-E for an image, the difference is stark. The generated image is far richer in texture and detail, it gets the modern “Thor’s hammer” headlights right, and it even prints the Volvo brand name on several buildings, in a font that is at least reminiscent of the one Volvo uses:

This is a clear illustration that bigger models are more capable, which is not news to anyone. I like the example because it makes the difference totally obvious, compared to doing the same with text, where “better” is much harder to see.

Reverse LLM

For fun, I passed the second Juggernaut Volvo image through a Meta-Llama-3.2-90B-Vision-Instruct model to generate a description. I used the SambaNova AI Playground to access this open-source model, as it is too big to run locally, at least on this machine.

In text:

The image depicts a Volvo car parked on the side of a street in what appears to be an urban setting, possibly at night. The car is facing away from the camera, showcasing its rear and right side. It has a sleek, modern design with a black body and silver rims. Notably, the car’s rear lights are illuminated, adding a pop of color to the scene.

In the background, several buildings line the street, their windows glowing with a soft, warm light. The overall atmosphere of the image is one of quiet sophistication, with the car being the main focus. The use of neon lights and the dark surroundings creates a sense of mystery and intrigue, drawing the viewer’s attention to the vehicle.

That is pretty good. The language is maybe a bit flowery, and there is a bit too much value judgement in it.

I think it demonstrates that there are some aspects of “Volvoness” that are apparently universally known by all the models. Guess it helps to be famous. It is also noteworthy that rerunning this query tends to produce slightly different descriptions of the car: sometimes it is an “SUV”, sometimes a “station wagon”, and sometimes, as in the variant quoted here, just a “car”.

The obvious next step is feeding the description back into the Juggernaut model. Most of the resulting images look close to the original, but sometimes strange things slip through.

Other generation runs resulted in the model ignoring the bit about “facing away from the camera”. It is a randomness-driven process; that is fine.
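For anyone who wants to script this reverse step instead of clicking around the web playground, here is a minimal sketch. It assumes SambaNova exposes an OpenAI-compatible chat-completions endpoint at the URL below and accepts this model identifier; both are assumptions on my part, so check their documentation.

    import base64
    from openai import OpenAI

    # Assumed OpenAI-compatible endpoint and model name; verify against
    # SambaNova's documentation before relying on either.
    client = OpenAI(
        base_url="https://api.sambanova.ai/v1",
        api_key="YOUR_SAMBANOVA_API_KEY",
    )

    # Send the locally generated image as a base64 data URL.
    with open("volvo_night.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="Llama-3.2-90B-Vision-Instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)

The returned text can then be pasted (or piped) straight back into the image model as a prompt, subject to its short input window.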

Draw me a Bzkrg Car

One AI behavior that I find fascinating across different techniques and applications is that a model will always give you something, no matter what the input is. In this case, given that the models knew about Volvo cars, I wondered what would happen when given a non-existent string of random characters for the car brand.

Thus:

“[Draw] a bzkrg car in a city at night, with neon lights on the buildings. Moody. Dark. Street lights”

Dreamshaper:

Generic-looking car, with maybe a bit of a Ford vibe?

Juggernaut:

Another generic car. Note that the model would like to draw some text on the building in the background, but a small model like this does not seem to have enough information encoded in it to produce real text. It is a good example of the perspective that LLM training is analogous to lossy compression.

Bing:

That is honestly a pretty cool image. Funny to see a random string of letters being treated as a synonym for “Audi”! The model also switched into Chinese mode for the background neon. Maybe there is some correspondence between my random string and transcriptions of some Chinese text in its training data set. It is a good demonstration of how an LLM can make a very fancy bird from basically no feathers at all.

Resource Consumption

I was interested in seeing just what it took to run these models locally. I used the Intel Arc Control application to see what the GPU was doing:

Activating the model is clearly visible in the GPU clock speed, load, and associated measurements like power consumption and voltage. In the screenshot above, the GPU is shown using about 124W to run the model.

In terms of memory, loading the Dreamshaper model takes about half of the 16GB of VRAM on my graphics card (I guess the creators of the AI Playground have selected models with this very property). Once a model is loaded, it stays in GPU memory until another model is activated with a query. Switching the model used in the AI Playground results in a “loading model” progress bar as the new model is loaded into GPU memory.

Increasing the size of the generated image increases memory usage as well as the time it takes to compute (as expected). Asking the Juggernaut model to generate an “HD” 1080 x 1080 picture gave a peak GPU memory usage of very close to 16GB. The base model size is about 12GB, significantly larger than the Dreamshaper model.
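If you prefer numbers over the Arc Control GUI, recent PyTorch builds with Intel GPU (XPU) support expose memory counters that can be polled from Python, mirroring the familiar CUDA memory API. A small sketch; the exact torch.xpu functions vary between versions, so treat this as illustrative rather than definitive:

    import torch

    def report_vram(label: str) -> None:
        # torch.xpu mirrors the torch.cuda memory API on Intel GPUs in
        # recent PyTorch releases; availability depends on your build.
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            allocated = torch.xpu.memory_allocated() / 2**30
            reserved = torch.xpu.memory_reserved() / 2**30
            print(f"{label}: {allocated:.1f} GiB allocated, "
                  f"{reserved:.1f} GiB reserved")
        else:
            print(f"{label}: no XPU device available")

    report_vram("before loading the model")
    # ... load the pipeline and generate an image here ...
    report_vram("after generating an image")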

4 thoughts on “Hi Local AI, Draw Me …”

  1. Excellent reporting on using ARC graphics for local AI models.
    I have the little brother to that card (A750) and I ran into a number of issues when running the AI Playground software. I will need to spend some time troubleshooting it. I think my Windows configuration was possibly to blame since I had a number of other Intel APIs and different versions of Python lying around.

    I was impressed with the speed of training a RESNET50 model using the ARC A750. I trained on the CIFAR10 dataset using both the CPU (an i7-14700) and the aforementioned GPU. The GPU performed the training in about 1/12th the time.

    See here : https://medium.com/@louisduran/training-resnet50-using-intel-pytorch-extensions-cpu-def7d412609f

  2. I also had a problem installing AI Playground after messing around with trying to get the ARC to run with ollama. For some reason, the installer failed to install the Python requirements.

    The solution I found was to dig into the installation folder of AI Playground after the installation and manually install the requirements using the Python hidden inside the env folder. The files are found here in the GitHub repo: https://github.com/intel/AI-Playground/tree/main/service
