(Local) AI, Please Write some Code

My previous blog post in this series tested the ability of a range of large language models to analyze a piece of C code and determine what a mystery function did. That was interesting and entertaining, but possibly not a particularly “fair” test of the models’ capabilities. Most of the time, I think people use “AI” to help write code, not to understand some tricky piece of algorithmic code. Thus, I turn the problem around and ask the models to write code for the algorithm I previously asked them to analyze.

The Prompt

The prompt I used was this:

Generate a C function to find the number of the highest set bit in a 32-bit word

I refer to the previous blog post for details on most models – I will only note details for models that were not previously used.
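
For reference, here is a minimal sketch of the kind of answer I am looking for (my own code, using zero-based indexing and -1 for a zero input; a 1-based variant is just as acceptable, as long as the comments match the behavior):

#include <stdint.h>

/* Reference: returns the zero-based position of the highest set bit,
 * or -1 if no bit is set. */
static int reference_highest_set_bit(uint32_t word) {
    int pos = -1;
    while (word != 0) {
        word >>= 1;
        pos++;
    }
    return pos;
}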

SambaNova: Llama-3.2-1B-Instruct

The tiny Llama 1B model produces this code:

/**
 * Finds the number of the highest set bit in a 32-bit word.
 *
 * @return The number of the highest set bit in the 32-bit word.
 */
int highest_set_bit_in_32bit_word() {
    // Initialize the counter for the highest set bit
    int highest_bit = 0;

    // Iterate over each bit in the 32-bit word
    for (int i = 0; i < 32; i++) {
        // Check if the current bit is set
        if ((1 << i) & 1) {
            // If the bit is set, increment the counter
            highest_bit++;
            // Break out of the loop since we've found the highest set bit
            break;
        }
    }

    // Return the number of the highest set bit
    return highest_bit;
}

This code has some issues. The function does not take an input argument, and the loop is hilarious. No matter what you send in, it will always return the value 1, since the condition (1 << i) & 1 tests bit zero of 1 << i and is therefore true only for i = 0.

The model helpfully (?) tacks on a complexity analysis:

The time complexity of this function is O(n), where n is the number of bits in the 32-bit word. This is because it needs to iterate over each bit in the word once. The space complexity is O(1), as it only uses a constant amount of space to store the counter and the loop variables.

I like the formulation better than what the Llama-3.3-70B-Instruct model produced (see below).

Verdict: Incorrect.

SambaNova: Llama-3.2-3B-Instruct

This model produces a much longer function than its smaller sibling:

/**
 * Finds the number of the highest set bit in a 32-bit word.
 *
 * @param word The 32-bit word to search.
 * @return The number of the highest set bit (1-indexed).
 */
uint32_t highest_set_bit(uint32_t word) {
    uint32_t highest_bit = 0;
    uint32_t bit_position = 0;

    // Loop through each bit in the word
    while (word) {
        // Check if the current bit is set
        if (word & 1) {
            // If it is, update the highest bit and its position
            if (highest_bit == 0) {
                highest_bit = bit_position + 1;
            }
            // Break out of the loop since we've found the highest bit
            break;
        }
        // Move to the next bit
        word >>= 1;
        bit_position++;
    }

    return highest_bit;
}

Note the definition of the return value in the comments. It claims to use a 1-based index to allow reporting “no bit set” as zero. Not a bad idea, but it would require some care when using the resulting value. This definition of the output is shared with most of the other Llama invocations from the SambaNova cloud. But not when I run the same models locally!
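
For example, a caller that wants a conventional zero-based bit index has to compensate for the offset, something like this (hypothetical usage, assuming a function that actually works as documented):

uint32_t pos = highest_set_bit(x);  /* 1-based; 0 means "no bit set" */
if (pos != 0) {
    uint32_t bit_index = pos - 1;   /* convert back to a 0-based index */
    /* ... use bit_index ... */
}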

The complexity of the code impresses me; with its convoluted logic and multiple local variables you could almost expect a compilation error. But no, it compiles and runs fine.

However, the code does not find the highest bit set. It finds the lowest bit set.

Verdict: Incorrect.

SambaNova: Llama-3.1-8B-Instruct

The Llama-3.1 8B model produces this code:

/**
 * Finds the number of the highest set bit in a 32-bit word.
 *
 * @param n The 32-bit word to search.
 * @return The position of the highest set bit (0-indexed).
 */
int highest_set_bit(int n) {
    int count = 0;
    while (n) {
        n &= (n - 1);
        count++;
    }
    return count;
}

The argument type is wrong – a signed type is not a good idea. This Llama selects zero-based indexing, unlike its siblings. But the code counts the number of set bits in the word (the classic n &= (n - 1) trick clears the lowest set bit on each iteration), not the position of the highest bit set.

Verdict: Incorrect.

SambaNova: Llama-3.1-70B-Instruct

The Llama-3.1 70B model produces this code:

/**
 * Returns the position of the highest set bit in a 32-bit word.
 *
 * @param word 32-bit unsigned integer
 * @return Position of the highest set bit (1-indexed)
 */
uint32_t highest_set_bit(uint32_t word) {
    uint32_t position = 0;
    while (word >>= 1) {
        position++;
    }
    return position;
}

This code claims that it provides a 1-based index. However, the result is actually zero-based; the model generated an incorrect comment. In some ways a very human mistake, but a human programmer would have run a few quick spontaneous tests with various values and spotted it.
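
The kind of quick test I have in mind would make the off-by-one obvious (my test code, not model output):

#include <stdio.h>

int main(void) {
    printf("%u\n", (unsigned)highest_set_bit(1u));           /* prints 0, not the documented 1 */
    printf("%u\n", (unsigned)highest_set_bit(0x80000000u));  /* prints 31, not the documented 32 */
    return 0;
}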

Verdict: Almost correct. The erroneous comment does not allow me to give it a full grade, even if the implementation is what I wanted.

SambaNova: Llama-3.3-70B-Instruct

SambaNova just added support for the very recent Llama-3.3-70B-Instruct model after the previous blog came out.

Its output:

/**
 * Find the position of the highest set bit in a 32-bit word.
 *
 * @param n The input 32-bit word.
 * @return The position of the highest set bit (1-indexed).
 */
int highest_set_bit(unsigned int n) {
    if (n == 0) {
        return 0; // No set bits
    }

    int position = 0;
    while (n > 1) {
        n >>= 1; // Right shift by 1 bit
        position++;
    }

    return position + 1; // 1-indexed position
}

Note that, just like most other Llama models, it wants a 1-based index.

The model also outputs a complexity analysis:

The time complexity of this function is O(log n), where n is the input number. This is because the while loop runs until n becomes 1, and the number of iterations is proportional to the number of bits in n. For a 32-bit word, the maximum number of iterations is 32.

Interesting way to define the complexity. I would have said that this function is O(N), where N is the size of the word in bits. And there are better ways, like the search-tree function I used as a test case in the previous blog.
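
For comparison, a binary-search-style function takes a logarithmic number of steps by halving the search space on each test. This is my own sketch of the approach, not necessarily identical to the function from the previous post:

#include <stdint.h>

/* Binary search for the highest set bit (zero-based, -1 for zero input). */
static int highest_set_bit_tree(uint32_t n) {
    int pos = 0;
    if (n == 0) return -1;
    if (n & 0xFFFF0000u) { pos += 16; n >>= 16; }  /* bit in the top half? */
    if (n & 0x0000FF00u) { pos += 8;  n >>= 8;  }
    if (n & 0x000000F0u) { pos += 4;  n >>= 4;  }
    if (n & 0x0000000Cu) { pos += 2;  n >>= 2;  }
    if (n & 0x00000002u) { pos += 1; }
    return pos;
}

Five tests, regardless of the input value, instead of up to 32 loop iterations.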

Verdict: Correct.

SambaNova: Llama-3.1-405B-Instruct

The big Llama produced this function:

/**
 * Find the position of the highest set bit in a 32-bit word.
 *
 * @param n The input 32-bit word.
 * @return The position of the highest set bit (1-indexed), or 0 if no bits are set.
 */
uint32_t highest_set_bit(uint32_t n) {
    uint32_t position = 0;
    while (n) {
        n >>= 1;
        position++;
    }
    return position;
}

One-based indexing again. Correct type for the input value. A correct function. Just like in previous experiments, the big model performs the best.

It gets better. At the end of the output, it provides a second implementation!

Here’s a more efficient version of the function using a technique called “bit scan reverse” or “bsr”. This function uses a compiler intrinsic to directly find the position of the highest set bit.

And the code:

/**
 * Find the position of the highest set bit in a 32-bit word.
 *
 * @param n The input 32-bit word.
 * @return The position of the highest set bit (0-indexed), or -1 if no bits are set.
 */
uint32_t highest_set_bit(uint32_t n) {
#ifdef _MSC_VER
    unsigned long index;
    _BitScanReverse(&index, n);
    return index;
#elif defined(__GNUC__)
    return 31 - __builtin_clz(n);
#else
    uint32_t position = 0;
    while (n) {
        n >>= 1;
        position++;
    }
    return position - 1; // subtract 1 to make it 0-indexed
#endif
}

That is the truly right way to implement the functionality on a modern machine. Compiled with mingw on my machine, it does compile down to a count-leading-zeros instruction on my Intel host.
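
One caveat if you use this pattern yourself: __builtin_clz is undefined for a zero input (and _BitScanReverse similarly leaves the index undefined when no bit is set), so a zero guard is needed. A minimal sketch of a safe wrapper for the GCC/Clang case (my addition, not part of the model output):

#include <stdint.h>

/* Zero-safe wrapper: returns -1 when no bit is set. */
static inline int highest_set_bit_safe(uint32_t n) {
    return (n != 0) ? (31 - __builtin_clz(n)) : -1;
}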

Verdict: Correct, and with a bonus optimization!

Intel AI Playground: Microsoft-Phi-3-Mini-4k-Instruct

After writing the previous blog post, I installed version 1.22 of the Intel AI Playground. It turns out there was a distribution available for discrete Arc GPUs (which I erroneously stated was not the case). Thus, this test is run using a new version of the program, but what should be the same model as before, the Microsoft-Phi-3-Mini.

Phi-3 generates this code:

#include  

// Function to count the number of 1's in the binary representation of a number
int popCount32(unsigned int num) {
   int count = 0;
   unsigned int n = num;

   // Iterate through each bit to count the number of 1's
   while (n) {
       count++;
       n &= (n-1); // clear the least significant 1-bit
   #ifndef NDEBUG
       printf("Current count: %u\n", count);
       printf("Remaining bit (binary): %u (0b%u)\n", n, n); 
   #endif
   }

   return count;
}

First of all, note that the empty include statement on the first line is exactly what I got from the model. Second, that is not the function I requested. The explanation it provides is also worth sharing:

To find the number of the highest set bit (also known as the “population count”, “histogram”, or “unbiasing”) in a 32-bit word using C, we can use a function that iterates over each bit of the input number from the highest to the lowest. This method involves checking each bit from the most significant bit (MSB) towards the least significant bit (LSB) and counting how many bits are set to 1 before encountering the MSB. This approach guarantees correct results and is efficient for general-purferency.

Interesting verbal sleight of hand – starting from the correct endpoint to then redefine it incorrectly. The final word is also funny. Purferency? Some kind of portmanteau of “purpose” and “efficiency”?

Verdict: Incorrect.

Intel AI Playground: Qwen2-1.5B-Instruct

The new version of the Intel AI Playground provides some additional text models out of the box. The first addition is Qwen2-1.5B-Instruct, as hosted on Huggingface.

This tiny model produces something extraordinary:

#include 

unsigned int highestSetBit(unsigned long long int value) {
    unsigned int i = 0; // Start from the least significant bit
    unsigned int result = value; // Keep the original value

    // Loop until the least significant bit has been processed
    while ((value >> i) != 0) {
        // Perform the AND operation
        result &= value >> i;

        // If the current bit is not set, break out of the loop
        if ((result ^ value >> i) == 0)
            break;
        
        // Shift left to check the next higher bit
        i++;
    }
    
    return i;
}

Same empty initial include statement; could that somehow be an artifact of the viewer in the AI Playground? Hard to tell. But what happens next is fascinating. Following the logic through, the code is just a complicated way to always return zero, just like the smallest Llama managed to always return one.

Note that the argument type is incorrect, as “long long” should be at least 64 bits in size regardless of the platform, and the prompt specifically asked for a 32-bit word.

The compiler does not see the constant behavior initially, due to the odd typing of the input “value” and the local “result” variable. However, if the types are corrected, it realizes that the function returns a constant.
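
For the curious, this is roughly what the corrected-type version looks like (my reconstruction):

#include <stdint.h>

/* Same logic with 32-bit types (my reconstruction). On the first
 * iteration, result &= value leaves result equal to value, so the
 * xor test is zero and the loop breaks with i still at zero. */
static unsigned int highestSetBit_fixed(uint32_t value) {
    unsigned int i = 0;
    uint32_t result = value;

    while ((value >> i) != 0) {
        result &= value >> i;
        if ((result ^ (value >> i)) == 0)
            break;
        i++;
    }

    return i;
}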

Verdict: Incorrect.

Ollama: Mistral-7B-Instruct

Mistral generates this function:

unsigned find_highest_set_bit(unsigned num) {
    unsigned result = 32; // Initially assuming no set bit is found

    // Perform a right shift and AND operation with 1 to check each bit from right to left
    for (int i = 0; i < 32; ++i) {
        if ((num & (1 << i))) {
            result = i;
            break;
        }
    }

    return result;
}

There is a problem with this code. It finds the lowest set bit, not the highest set bit! The loop should go from highest to lowest, not lowest to highest.

Verdict: Close, but not quite there.

Ollama: Llama-3.2-3B-Instruct

The code from the small Llama when run locally:

unsigned int find_highest_set_bit(unsigned int n) {
    unsigned int position = 0;
    while (n != 0) {
        // Clear the lowest set bit
        n &= (n - 1);
        // Increment the position counter
        position++;
    }
    return position;
}

This is a nice piece of code… but it is not correct. This function computes the number of bits set in the word, not the number of the highest set bit. Interestingly, that is exactly what the model claimed the highest-bit function did in my previous test. It is also essentially the same function as the Phi-3 model produced above.

The output is also different from running the same model in the SambaNova cloud, but that is to be expected.

Try 2.

I gave the model another chance. This one produced a slightly more verbose implementation that also aims for 1-based indexing.

/**
 * Returns the position of the most significant set bit in a 32-bit integer.
 *
 * @param n A 32-bit integer.
 * @return The position of the most significant set bit (1-indexed).
 */
int highest_set_bit(uint32_t n) {
    int position = -1;
    uint32_t mask = ~n;

    // Iterate over all bits in the 32-bit word
    while (mask != 0) {
        if ((mask & 1) != 0) {
            position++;
        }

        // Shift the mask one bit to the right
        mask >>= 1;
    }

    return position + 1; // +1 because positions are 1-indexed
}

This code counts the number of zeroes in the word! The inversion of the “mask” variable is the cause of this.

Verdict: Incorrect, on both tries.

Ollama: Llama-3.1-8B-Instruct

The locally executed Llama-3.1-8B-Instruct model provides two implementations of the function, from a single prompt:

int find_highest_set_bit(uint32_t value) {
    int i;

    for (i = 31; i >= 0; --i) {
        if (value & (1 << i)) {
            return i;
        }
    }

    // No bits were set
    return -1;
}

That implementation is correct, if a bit clumsy. It is similar to the code that Mistral generated, but with the loop running in the correct direction.

The model then provides a second implementation as part of the same reply:

This function uses a simple loop to iterate through the bits of the input word from right to left. When it finds the first bit that is set, it immediately returns its position. However, you can also use bitwise operations in combination with some knowledge about how shifting works to make this more efficient:

And the code:

int find_highest_set_bit(uint32_t value) {
    int i = 0;
    while (value) {
        if (value & 1) {
            return i;
        }
        value >>= 1;
        i++;
    }

    // No bits were set
    return -1;
}

This function finds the lowest bit set, just like the function from Mistral. It is also not clear why this would be more efficient than the previous code – maybe due to the accumulated shifting of the “value” variable instead of successive variable-length shifts of a mask?

With those two functions and the function from the same model in the SambaNova cloud, the Llama-3.1-8B-Instruct model generated three functions, all different, and only one of them correct. It is hard to call a model reliable with this kind of variation.

Verdict: This invocation of the model must be said to be 50% correct.

Ollama: Qwen2.5-coder-7B

Qwen was the best small model in the analysis test. Using it for code generation produces this code:

// Function to find the position of the highest set bit in a 32-bit unsigned integer
int find_highest_set_bit(unsigned int num) {
    if (num == 0) {
        return -1; // No set bits
    }

    int pos = 0;
    while (num > 1) {
        num >>= 1; // Right shift by 1 bit
        pos++;
    }
    return pos;
}

That is a correct function, even if the loop termination condition can be debated. It does not matter for the result, but it complicates reading the code, in my opinion. Given the results for the other small models, once again the “coder” model proves that it is pretty good at code.

Try 2.

Given the lack of stability shown by other models, I tested this model by prompting it a second time with the same problem. The code it returned is almost the same:

// Function to find the number of the highest set bit in a 32-bit word
int highestSetBitPosition(unsigned int num) {
    if (num == 0) return -1; // Return -1 if the input number is zero

    unsigned int pos = 0;
    while (num > 0) {
        num >>= 1; // Right shift the number by 1 bit
        pos++;     // Increment the position counter
    }

    return pos - 1; // Subtract 1 to get the zero-based index of the highest set bit
}

That is the same algorithm, with a subtly different handling of the looping and “pos” variable. It still returns the right result.

Verdict: Correct, on both tries, but with variations in details.

OpenAI: ChatGPT-4o-mini

The free ChatGPT-4o-mini model provides this function:

int highest_set_bit_index(unsigned int num) {
    if (num == 0) {
        return -1;  // No bits are set
    }
    
    int index = 31;  // Start with the highest possible bit position in a 32-bit word

    // Loop through the bits from 31 down to 0
    while ((num & (1 << index)) == 0) {
        index--;  // Move to the next lower bit position
    }

    return index;
}

That implementation is correct, if a bit clumsily coded in my opinion.

The “mini” model is guesstimated by the Internet to be around 8B parameters. If that is true, its performance is very impressive. It would seem that the Qwen model and this mini ChatGPT are at a similar level of coding ability at similar model sizes. However, OpenAI does not disclose the size of the model or the overall workflow it is part of.

Try 2.

Just to check, I reran the same query in the same session with ChatGPT, and got this code:

unsigned int highest_set_bit_value(unsigned int num) {
    if (num == 0) {
        return 0;  // No bits are set, so return 0
    }

    // Use bit manipulation to find the highest set bit
    num |= num >> 1;
    num |= num >> 2;
    num |= num >> 4;
    num |= num >> 8;
    num |= num >> 16;

    // Now, the highest set bit is at the highest position, so we return it
    return num - (num >> 1);
}

This code implements something different – it produces a value that contains only the highest set bit of the initial input. For example, an input of 255 results in a value of 128. This is close, but not what the prompt asked for. It is as if the model skipped the part of the prompt that asked for the “number of the highest set bit” and just read “highest bit set”.
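
To actually answer the prompt, the isolated bit would still have to be converted to a position, for example like this (my sketch, not part of the ChatGPT output):

#include <stdint.h>

/* Convert an isolated power-of-two bit to its zero-based position.
 * Example: 128 (from an input of 255) maps to position 7. */
static int isolated_bit_to_position(uint32_t isolated) {
    int pos = -1;  /* -1 when no bit was set */
    while (isolated != 0) {
        isolated >>= 1;
        pos++;
    }
    return pos;
}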

Verdict: Correct on one attempt, but failed on a second attempt.

Edge Copilot

The Copilot in Edge produces this code:

#include <stdint.h>

int highestSetBitPosition(uint32_t val) {
    if (val == 0) return -1; // No bits are set

    int position = 0;
    while (val >>= 1) {
        position++;
    }
    return position;
}

I like that it uses the “uint” family of types. It is essentially the same code as Qwen generated, but more compactly written and with a different termination condition for the loop.

Verdict: Correct.

Zyphra: Zamba2-7B

Zyphra is a startup in the AI field that is aiming to build more efficient LLMs by tweaking all aspects of the software, model design, model components, and training data. The goal is to get better performance from smaller models. They offer access to two models in their MAIA cloud service, and this test was carried out using their larger Zamba2-7B model.

The model produces this function:

int find_highest_set_bit(unsigned int number) {
   int count = 0;
   while (number > 0) {
       number &= (number - 1);
       count++;
   }
   return count;
}

That is a bit count function, not a highest-set-bit function. The explanation generated by the model is worth sharing:

This function uses a while loop to repeatedly clear the least significant set bit in the number until all bits are cleared. The count variable keeps track of the number of iterations, which corresponds to the position of the highest set bit.

Note that the position of the highest set bit is 0-based, so the result will be 0 if all bits are cleared.

The function of the loop is correctly described, but not what it counts. For an input of 8 (binary 1000), for example, it returns 1, while the highest set bit is bit 3.

Verdict: Incorrect.

Google: Gemini 1.5 Flash

Google makes the Gemini 1.5 Flash model available for free for interactive experiments. The size of the model is not documented, but some speculation on the Internet guesses at 32-ish billion parameters.

In any case, Gemini generates this code:

#include <stdint.h>

int highest_set_bit(uint32_t x) {
    if (x == 0) {
        return -1; // No bits set
    }

    int bit_position = 0;
    while (x >>= 1) {
        bit_position++;
    }

    return bit_position;
}

That code works. I like that it uses proper stdint types.

Verdict: Correct.

Summary

The table below summarizes the results:

| Model | Score (0 to 5) | Comment |
| --- | --- | --- |
| SambaNova: Llama-3.2-1B-Instruct | 0 | Returns constant 1, forgot the argument. |
| SambaNova: Llama-3.2-3B-Instruct | 2 | Returns lowest bit set, 1-based indexing. |
| SambaNova: Llama-3.1-8B-Instruct | 1 | Counts number of set bits. |
| SambaNova: Llama-3.1-70B-Instruct | 4 | Correct code, incorrect comment. |
| SambaNova: Llama-3.3-70B-Instruct | 5 | Correct code, 1-based indexing. |
| SambaNova: Llama-3.1-405B-Instruct | 5 | Correct code, 1-based indexing. Bonus compiler-intrinsic variant. |
| Local: Microsoft-Phi-3-Mini-4k-Instruct | 1 | Counts number of set bits. |
| Local: Qwen2-1.5B-Instruct | 1 | Returns constant 0. |
| Local: Mistral-7B-Instruct | 2 | Returns lowest bit set. |
| Local: Llama-3.2-3B-Instruct | 1 | Counts number of set bits. Try 2 counted the number of zero bits. |
| Local: Llama-3.1-8B-Instruct | 4 | One proposed function is correct, one returns the lowest bit set. |
| Local: Qwen2.5-coder-7B | 5 | Correct. Try 2 also correct. |
| OpenAI: ChatGPT-4o-mini | 4 | Correct. Try 2 returned the value of the highest bit. |
| Edge Copilot | 5 | Correct. |
| Zyphra: Zamba2-7B | 1 | Counts number of set bits. |
| Google: Gemini 1.5 Flash | 5 | Correct. |

Conclusions

This test encompassed sixteen different models. Of those, eight generated a correct function in at least one of their tries.

As a general rule, larger models produce better results. However, the results from the coding-tuned Qwen model and ChatGPT-4o-mini indicate that it is possible for small models to do well. Using a model tuned for coding to write code makes sense, duh.

Repeated prompting of the same model tended to produce varied code, including a mix of correct and incorrect implementations. You cannot really trust that a model that produces correct code once will do so repeatedly. It might – but it also might not. Nothing in the output will tell you; you have to read the code.

The conclusion is that one cannot rely on “AI” in general to generate correct code for a given problem. Each output has to be tested and validated by an independent agent (i.e., a programmer), and it makes sense to test and measure the success of models before taking them into production.
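
As a concrete example of what such validation could look like for this particular problem, a small harness that compares a candidate against a trusted reference is cheap to write. A sketch, where candidate_hsb is a hypothetical stand-in for whichever generated function is under test:

#include <stdint.h>
#include <stdio.h>

/* Trusted reference: zero-based position of the highest set bit, -1 for 0. */
static int reference_hsb(uint32_t n) {
    int pos = -1;
    while (n != 0) { n >>= 1; pos++; }
    return pos;
}

/* Hypothetical stand-in for the generated function under test. */
extern int candidate_hsb(uint32_t n);

int main(void) {
    uint32_t extras[] = { 0u, 1u, 3u, 255u, 256u, 0xDEADBEEFu, 0xFFFFFFFFu };
    int fails = 0;

    /* Check every single-bit value... */
    for (int i = 0; i < 32; i++) {
        uint32_t v = 1u << i;
        if (candidate_hsb(v) != reference_hsb(v)) {
            printf("FAIL at 0x%08x\n", (unsigned)v);
            fails++;
        }
    }
    /* ...plus zero and a few composite values. */
    for (int j = 0; j < 7; j++) {
        if (candidate_hsb(extras[j]) != reference_hsb(extras[j])) {
            printf("FAIL at 0x%08x\n", (unsigned)extras[j]);
            fails++;
        }
    }
    if (fails == 0)
        printf("All tests passed\n");
    return fails != 0;
}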

Study Limitations

I did not take the time to subject each model to multiple tries – taking the first attempt at face value seems to be a common way to use LLMs. I also did not try to produce a super-specific, carefully engineered prompt, but instead relied on a slapdash approach that I believe is fairly typical.

The result is still entertaining and elucidating.
