Testing Mistral Le Chat (Coding and Understanding Code)

I am coming back to my project of testing AI models, local and in the cloud, on a few coding problems that seem surprisingly difficult. The models I used in my previous posts (analyze code, write code, reason about code) were mostly from the US or China, even though I did try the French Mistral-7B model. In this post, I test the full-power cloud-based Mistral Le Chat, as well as the midsize local Codestral model. Time to try the European AI!

Mistral deserves credit for the punny name. Chat is French for Cat, and the pixelated logo is a capital M that also looks like a cat face.

I have found no authoritative data on just how big the Le Chat model is. I asked it, and it claimed to be the same as the 7B Mistral model, which, judging by how good the results are, it most definitely is not.

Explaining my Favorite C Function

I started with the standard prompt I have used in previous experiments, using the cloud-based Le Chat:

What does this code do?

uint8_t mystery (uint32_t val) {
    uint8_t  k = 0;
    if (val > 0x0000FFFFu) { val >>= 16; k  = 16; }
    if (val > 0x000000FFu) { val >>= 8;  k |= 8;  }
    if (val > 0x0000000Fu) { val >>= 4;  k |= 4;  }
    if (val > 0x00000003u) { val >>= 2;  k |= 2;  }
    k |= (val & 2) >> 1;
    return k;
}

The reply from Le Chat was very fast (it felt as fast as SambaNova). The reply is also one of the better ones I have seen:

The function mystery determines the position of the highest set bit in the input value val and encodes this position in the variable k. The encoding is done by setting specific bits in k based on the position of the highest set bit in val. The result is an 8-bit value that represents the position of the highest set bit in a compact form.

In particular, it was way better than the small local Mistral model I tried previously. That model produced a wrong answer, plain and simple.

Explaining Broken Code

This is an experiment I first tried with the QwQ reasoning model. The idea is to feed the LLM a bad function and see what it says about it, in particular a function whose name is at odds with what it actually does.

What does this code do?

unsigned int find_highest_set_bit(unsigned int n) {
    unsigned int position = 0;
    while (n != 0) {
        // Clear the lowest set bit
        n &= (n - 1);
        // Increment the position counter
        position++;
    }
    return position;
}

Mistral Le Chat does a really good job of explaining the function and the discrepancy between name and functionality. It also offers a fixed version of the function, which is not something I asked for. QwQ did the same; I guess LLMs are trained to be helpful and not just produce a succinct answer:

This C function, find_highest_set_bit, is intended to find the position of the highest set bit (i.e., the bit with value 1) in an unsigned integer n. However, there is a logical error in the implementation. Let’s break down what the code does and then discuss the issue: […]

To correctly find the position of the highest set bit, you can use a different approach. Here is a corrected version of the function:

[…]

There is an interesting choice being made by the model here. Which interpretation of the function should you trust? The function name or the implementation?

The LLMs lean towards following the function name and adapting the implementation, where you could just as well rename the function to match the implementation. If I were asked to help fix this, I would ask whoever is responsible for the code just what the expectation is. An implementation carries more weight than a name, but on the other hand the name of the function is the published interface. However, the comments point in the direction of bit counting… tough call.

Here is the complete output of the proposed fix, showing the nice formatting of the Le Chat system:

The LLM did rename the function when asked to take that approach:

Note that if you look closely, there is a detail in the generated code that indicates a typical problem I have seen previously with LLMs. They do not care much about types. The original code took a “uint32_t” and returned a “uint8_t”. But this new function does not. The “unsigned int” used for the argument may or may not be the same as “uint32_t”, but it is most definitely not the same return type.

Writing Code

To test the code-creation capabilities, I used the same prompt as before:

Generate a C function to find the number of the highest set bit in a 32-bit word

Mistral Le Chat generated the same function that it proposed as the corrected version of the misnamed function above. It uses 1-based indexing (i.e., the lowest bit is numbered 1, and 0 is returned when no bit is set), just like the Llama variants I tested:

int highestSetBit(unsigned int n) {
    int position = 0;

    // Check each bit from the most significant to the least significant
    while (n > 0) {
        n = n >> 1; // Right shift the number by 1
        position++;
    }

    return position;
}

The position is 1-based, meaning the least significant bit is considered position 1.

If the input number is 0, the function will return 0, indicating no set bits.

Rerunning the prompt gave a version that returned -1 for no bits set and a zero-based index.

Codestral: Explaining the Mystery Function

Codestral is a model from Mistral that is advertised as optimized for coding. It is a 22-billion-parameter model that I can run locally using my regular processor (it is too big for my GPU) – as before, I used ollama to run the model.

The answer to “what does this function do” starts out correct – but the end of the sentence is strange:

This function named `mystery` takes an unsigned 32-bit integer as input. The goal of the function is to determine the position of the most significant set bit in the given value and return that position along with other bits which are turned on before this position.

“Along with other bits”???

The rest of the output, for completeness:

Here's how it works:
1. `val >>= 16` : If the most significant byte (bits 24-31) is not zero, then the number is greater than 0x0000FFFFu. In that case, the input value is right shifted by 16 bits and the bit position counter `k` is set to 16.

2. `val >>= 8` : If after possible right shift in previous step, the most significant byte (bits 16-23) is not zero, then the number is greater than 0x000000FFu. In that case, the input value is right shifted by 8 bits and the bit position counter `k` is updated with bitwise OR operation with 8.

3. `val >>= 4` : If after possible right shift in previous step, the most significant nibble (bits 12-15) is not zero, then the number is greater than 0x0000000Fu. In that case, the input value is right shifted by 4 bits and the bit position counter `k` is updated with bitwise OR operation with 4.

4. `val >>= 2` : If after possible right shift in previous step, the most significant two bits (bits 10-11) are not zero, then the number is greater than 0x00000003u. In that case, the input value is right shifted by 2 bits and the bit position counter `k` is updated with bitwise OR operation with 2.

5. `k |= (val & 2) >> 1;` : This statement checks for the second most significant bit in the current val after all possible right shifts, if it's set then it'll be reflected in the `k`. It does this by performing a bitwise AND with 2 to isolate that bit, and then right shifting the result by 1.

6. Finally, function returns value of `k` which now represents position of the most significant set bit and other bits which are turned on before this position.

The cloud-based Le Chat must be a bigger model than Codestral, based on the quality of the result.

Codestral: Explaining Broken Code

Asking Codestral to explain the broken code – i.e., the code with the misleading name – results in a really low-quality reply. The model just says that the function does what the name says – and then continues with an explanation that matches the code. And then claims the result is what it is not.

This function named `find_highest_set_bit` is used to find and return the position of the highest order set bit in a given
unsigned integer `n`. The position is counted from right to left, starting at 1.

Here's how it works:

The while loop runs as long as `n` is not equal to zero. In each iteration of the loop, it clears the lowest set bit in `n`
using a bitwise AND operation with `(n - 1)`. This operation removes the rightmost set bit from `n`.

The next paragraph basically says that the function counts the number of bits set (exits after all bits have been cleared):

After clearing the lowest set bit, the position counter `position` is incremented. When `n` becomes zero (meaning all bits have
been cleared), the loop exits and the function returns the value of `position`.

But that insight is promptly discarded:

So, this code effectively finds and returns the position of the most significant set bit in the binary representation of the
given number `n`. If `n` is 0, the function will return 0 as there are no set bits.

Fascinating, as always. This is so typical LLM – logical incoherence hiding behind coherent-sounding language.

I tried this twice in separate sessions, resulting in similar (wrong) results.

Codestral: Writing Code

Moving on to writing code. This should be the sweet spot for Codestral, since a “coding” model is really more about generating code from a prompt than generating explanations of tricky code.

This is the code it generated:

int msbPosition(unsigned int n) {
    int pos = 0;
    while (n != 0) {
        n = n >> 1;
        pos++;
    }
    return pos;
}

That was on my second try. I first tried to generate the code in the same session where I had just asked it to analyze the broken function (mismatched name and implementation). In that context, it gave me the same function I had just fed it to analyze! This demonstrates how the contents of the context window can really affect the output of an LLM (surprising no one). Quitting the session and starting a new session gave a correct result.

Codestral: Writing Unit Tests

Unit test creation is a popular application for LLMs. I have not had occasion to try it myself in the past, but this seemed like a good test case.

This is the prompt I used:

Can you write a set of unit tests for this code in C?

uint8_t highest_bit_set (uint32_t val) {
    uint8_t  k = 0;
    if (val > 0x0000FFFFu) { val >>= 16; k  = 16; }
    if (val > 0x000000FFu) { val >>= 8;  k |= 8;  }
    if (val > 0x0000000Fu) { val >>= 4;  k |= 4;  }
    if (val > 0x00000003u) { val >>= 2;  k |= 2;  }
    k |= (val & 2) >> 1;
    return k;
}

The resulting generated code is impressive at first glance. The LLM generates an implementation based on CUnit, including a set of test cases. The code is complete enough to compile on first try!

I was most curious about the quality of the test cases, to be honest, as that is more difficult than just building a test harness. Here is what the model produced:

// Function to test highest_bit_set
void test_highest_bit_set(void) {
    CU_ASSERT(0 == highest_bit_set(0));
    CU_ASSERT(31 == highest_bit_set(0xFFFFFFFFu));
    CU_ASSERT(4 == highest_bit_set(0x00000010u));
    CU_ASSERT(8 == highest_bit_set(0x00000100u));
    CU_ASSERT(16 == highest_bit_set(0x00010000u));
    CU_ASSERT(3 == highest_bit_set(0x00000007u)); 
    CU_ASSERT(28 == highest_bit_set(0xF0000000u));
}

This looks good on the surface and is more useful than I expected. However, the function under test passes only 5 of the 7 tests (in case you cannot tell, the last two test cases are the broken ones).

Looking at the failing test cases in detail, there might be something interesting going on.

The value “7” has three bits set. The value “0xF000_0000” has 28 zero bits in it. This could be filed under “LLMs are bad at math”… but it could also be that the LLM is a bit fuzzy on what the function is supposed to do and is providing some other “measures related to bits” that are reasonable but not directly applicable to this function.

Le Chat: Writing Unit Tests

Going back to the cloud Le Chat, I gave it the same prompt. It came up with something quite different in form, but similar in content.

// Unit tests
void test_highest_bit_set() {
    // Test case 1: Highest bit set is at position 0
    assert(highest_bit_set(0x00000001) == 0);

    // Test case 2: Highest bit set is at position 1
    assert(highest_bit_set(0x00000002) == 1);

    // Test case 3: Highest bit set is at position 31
    assert(highest_bit_set(0x80000000) == 31);

    // Test case 4: Highest bit set is at position 15
    assert(highest_bit_set(0x00008000) == 15);

    // Test case 5: Highest bit set is at position 7
    assert(highest_bit_set(0x00000080) == 7);

    // Test case 6: Highest bit set is at position 3
    assert(highest_bit_set(0x00000008) == 3);

    // Test case 7: All bits set
    assert(highest_bit_set(0xFFFFFFFF) == 31);

    // Test case 8: No bits set
    assert(highest_bit_set(0x00000000) == 0);

    // Test case 9: Highest bit set is at position 23
    assert(highest_bit_set(0x00800000) == 23);

    // Test case 10: Highest bit set is at position 11
    assert(highest_bit_set(0x00000800) == 11);

    printf("All tests passed!\n");
}

Impressively, all these test cases are correct! That is more than I expected, to be honest.

Codestral Processor Usage

It is fun to look at the Task Manager when running AI models, to see how much CPU they use. For some reason, the Codestral model runs on the E-cores of my 14700K! It seems odd since these cores are objectively slower than the big P-cores. On a second run, Windows decided to first use the P-cores and then switch to the E-cores. Fascinating. Maybe the Intel Thread Director decides that since there is no AVX-512 available, the E-cores with AVX2 offer the best match for the workload. Or something. Strange it is.

Summary

The results for Mistral Le Chat and the local Codestral model are summarized below. Pretty good overall, except for Codestral explaining the misnamed function.

Model and task	Score (0 to 5)	Comment
Le Chat – explain code	5	Correct explanation
Le Chat – explain function with bad name	5	Correct explanation, offers a “corrected” version
Le Chat – write code	5	Solution works, 1-based indexing
Le Chat – write tests	5	All tests work
Local: Codestral – explain code	4	Correct explanation with some weird extra at the end
Local: Codestral – explain function with bad name	0	Wrong answer
Local: Codestral – write code	5	Got it right in a clean session
Local: Codestral – write tests	4	The generated framework works, and 5 of 7 test cases are correct

Looks like Le Chat is a perfectly viable competitor in the LLM space.