I have recently gotten back to developing training labs for the Simics simulator (and related technologies). During the process of developing a new accelerator model using as many of the latest frameworks and APIs as possible, it was basically guaranteed that I would hit some bugs and unexpected behaviors. That is a natural part of, and benefit from, creating training materials in the first place. It also provides a good illustration of two fundamentally different ways to look at software development. One is to play it safe and get things done in known ways; the other is to charge ahead, try the unknown, and see what happens. Damn the torpedoes, bugs are a benefit. No bug reports, no glory. In this post, I will share some recent examples of just coding ahead and breaking things.
Looking for Trouble?
I find myself writing code in two different modes. If the goal is to solve an immediate problem, I tend to fall back to proven patterns and methods. No point in going on an adventure for something that is supposed to be used “for real”. Get it done in a way that seems most likely to work.
On the other hand, when the purpose is to build training materials, test the documentation, or try new functionality, being adventurous is part of the game. There are a few different philosophies that intersect here:
Classic testing principles – do the unexpected, try everything not explicitly forbidden by the spec, act dumb, and code like a naïve programmer. Even if something is forbidden by the spec, try it anyway to test the error handling. If the specification is not clear, misunderstand it on purpose. I wrote a couple of pieces about this type of thinking on my Intel blog in 2017. As the joke goes… “a tester/QA engineer walks into a bar and orders 1 beer. 0 beers. -1 beers. Orders a lizard. Orders an emoji… etc.” I totally subscribe to that principle. Try the unexpected, try what is likely to fail. And then go totally off script (see this variant of the joke, for example).
Nice-looking examples – which is the more interesting case. When writing code that is supposed to be used as examples by users, I like the code to be as simple and consistent and “obvious” and “clean” as possible. I don’t like having to write comments that explain that “due to some inconsistency in the product, you have to do it this way”. Instead, I want to do things in regular and direct ways. The logic should be clear and straightforward, the flow easy to follow, and the code should never contain workarounds or clever tricks. The frame of mind for writing such code is to assume that the underlying system is slightly better than it actually is. There will inevitably be the occasional need to add logic to really create a working solution or handle some corner case, but the resulting code is still better than if it had been written defensively from the start.
The alternative (which I see quite often in practice) is to write code based on prior experience of what did not work, steering well clear of such obstacles. This tends to result in code that is long and winding – the very opposite of full steam ahead, since the author is very well aware that there might be torpedoes somewhere. My rule is the opposite: do not check for potential errors and bad input values until proven necessary.
Writing Nice Code
Nice coding means reading the documentation and extrapolating from existing example code, even when my long-honed “spidey sense” tells me that the result is unlikely to work. Don’t let intuition stop you! As long as the compiler produces an executable file, the code deserves a try. There is no point in working around suspected issues until they have been proven to be real problems. Full code ahead!
When creating examples it is a good idea to at least initially ignore important aspects of the solution – this helps build good arguments as to why the code is written in a certain way. It is also fun in its own right, since you never know what will happen. An error, a crash, or something funny. A sufficiently complex system will have a lot of interesting but unintended behaviors that are worth exploring.
Out of Block List
Ars Technica recently published an article about a limitation in Google’s Gmail – apparently, you can only put 1000 addresses into the list of spam senders to block. I think most users, and especially the programmers behind the feature (I am guessing here, I do not know any of them), would never get close to this limit, since their spidey sense would kick in and have them think about alternative strategies long before this point. But for a user with no concerns about the underlying implementation, why not just add address after address after address? That is how you build a nice example: a single mechanism to handle all spam senders.
Failing at Making Threading Fail
When adding the compute offload thread to the accelerator model for the Mandelbrot tutorial, I decided as a first step to skip some of the locking around accesses to the local memory used for descriptors and results. Such locking is necessary for correct functioning in the general case. Thus, I was quite curious to find out what would happen if I skipped it.
Fully expecting a failure, I ran a lot of simulations with the known-broken model, and nothing happened. Highly disappointing, but not really unreasonable. Actually getting the simulator to a state where a missed lock would cause a crash requires many stars to align right. That is threaded code in a nutshell – it is very hard to make crash when you are trying to make it crash, and very easy to make it crash when you do not. This is also indicative of a key pedagogical problem with threaded software: that it can be hard to create code with interesting and realizable failure modes.
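To make the risk concrete, here is a minimal standalone pthreads sketch – my own illustration, not the actual Simics model code – of the kind of race that skipping the lock invites: one thread copies a descriptor out of shared memory while the other thread rewrites it, with no lock taken around either access. Most runs will print something perfectly sensible, which is exactly why the known-broken model refused to fail on demand.

/* Illustration only: a shared descriptor accessed from two threads without
   any locking. The compute thread may observe a half-updated descriptor,
   but most of the time the timing works out and nothing bad is visible. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t start_x, start_y;   /* written by the "device" side */
    uint32_t width, height;
} descriptor_t;

static descriptor_t desc;        /* shared local memory, no lock protecting it */

static void *compute_thread(void *arg) {
    (void)arg;
    /* Correct code would take a lock around this copy. */
    descriptor_t local = desc;
    printf("computing a %ux%u tile at (%u,%u)\n",
           (unsigned)local.width, (unsigned)local.height,
           (unsigned)local.start_x, (unsigned)local.start_y);
    return NULL;
}

int main(void) {
    desc = (descriptor_t){ .start_x = 0, .start_y = 0, .width = 1000, .height = 1000 };
    pthread_t t;
    pthread_create(&t, NULL, compute_thread, NULL);
    /* Meanwhile, the "device" side rewrites the descriptor, also without a lock. */
    desc = (descriptor_t){ .start_x = 500, .start_y = 500, .width = 200, .height = 200 };
    pthread_join(&t, NULL);
    return 0;
}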
Negative Values aren’t
In the display unit of the Mandelbrot setup, there is some code that handles changes to the size of the displayed image. Size changes have two effects: internally, a dynamically allocated buffer that holds the current pixel values of the display has to be resized, and the new size has to be communicated from the display unit to the graphics console that handles the display window (or display via VNC). This code was initially written to be as simple as possible, without any explicit tests for “good” sizes. Of course there should have been such tests, but it was too interesting to see what would happen without them. That is what a naïve programmer might do, and thus it is worth trying out.
The size is set as a pair of unsigned 32-bit integers in device registers.
The happy-case examples in the workshop used images with “reasonable” sizes such as 1000×1000 pixels or 1200×800 pixels. However, when I configured the area with a size like -1 × 100 pixels (sending in -1 from a register-setting command on the simulator command line), the result was a crash. The -1 was converted to 0xffff_ffff as a uint32, which made the code attempt to allocate a “rather large” buffer, run out of memory, and get killed by the host OS.
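The conversion itself is easy to show in isolation. Here is a minimal sketch – my own illustration, and the four bytes per pixel is an assumption made for the example, not something taken from the model:

/* -1 stored into an unsigned 32-bit size register becomes 0xffffffff,
   and the pixel buffer the naive code asks for becomes absurd. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int cli_value = -1;                    /* what was typed on the command line */
    uint32_t width = (uint32_t)cli_value;  /* the register is unsigned: 0xffffffff */
    uint32_t height = 100;
    uint64_t bytes = (uint64_t)width * height * 4;   /* assumed 4 bytes per pixel */
    printf("width=0x%08x -> buffer of %llu bytes (about 1.7 TB)\n",
           (unsigned)width, (unsigned long long)bytes);
    return 0;
}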
Setting the size to zero by zero pixels would seem safe, as it just means nothing can be displayed and the code allocates zero bytes. However, this was not expected by the graphics console, which triggered an assertion since it assumed that no graphics device would do such a dumb thing. This shows the importance of looking at how different modules interact when you only control one side of the interaction – there is unfortunately no guarantee that it is all documented. The logic your code needs to add is found by simply coding ahead in the simplest possible way first.
Going off the Field
I also took a bit of a detour into register and bit field coding in the Simics Device Modeling Language (DML). Here the testing instinct took over; it turned out that it was possible to get around checks in the compiler with sufficiently complex register declarations. This allowed me to put bit fields outside of the register they were declared in, such as bit 66 of a 64-bit register. I discovered this when putting together some arrays of groups of bit fields.
This kind of code is highly unlikely in the real world, since you would normally just encode an actual hardware specification into DML using a generator. And catching broken hardware designs at this level should not be the job of the virtual platform modelers.
In any case, the code:
And the result from the first register, showing some very long bit fields that definitely do not fit inside 64 bits (this is a broken metadata display; there is no way to actually write to these bits since the register is only mapped for eight bytes no matter what and forcing a nine-byte value in can’t be done given the interfaces used):
Clownfish Mode
Full code ahead does require that you check the results of said code. There might be subtle issues raised by the simple code that only show up if you look closely at the results. And think about what they mean. And think about what a user might ask you if you run the example in a class. Can I explain all that goes on?
In my blog post about the Mandelbrot tutorial from before DVCon Europe, I have a screenshot showing a very colorful fractal.
This is an error, which I did not realize at the time since I did not think through what it was that I was seeing. The code was trying to create smooth color gradients, not clownfish-like color bands.
I only realized the error when I pulled out a memory access trace from the code building the color table, to demonstrate Simics memory access tracing. The code is supposed to take a start and an end value, each with the layout 0x00RRGGBB, and interpolate between the start and end RR, GG, and BB separately. Thus, I would expect to see a series of values of at most six hexadecimal digits, with each byte stepping up or down on its own. Instead, I saw this:
…
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x2e4 len=4 val=0x10f7f7
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x2e8 len=4 val=0x10f9f9
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x2ec len=4 val=0x10fbfb
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x2f0 len=4 val=0x10fdfd
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x2f4 len=4 val=0x10ffff
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x2f8 len=4 val=0x21d49b7
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x2fc len=4 val=0x4299370
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x300 len=4 val=0x635dd28
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x304 len=4 val=0x84226e1
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x308 len=4 val=0xa4e7099
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x30c len=4 val=0xc5aba52
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x310 len=4 val=0xe67040b
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x314 len=4 val=0x10744dc3
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x318 len=4 val=0x1280977c
[bp.memory trace] [trace:7] qsp.macc.local_memory 'w' access to p:0x31c len=4 val=0x148ce134
…
On closer inspection, that does not look right at all from address 0x2f8 and on. Seven- and eight-digit numbers should not happen. This is not a smooth color scale, but rather total chaos with values jumping up and down seemingly randomly. How come? I went back to the code:
uint32_t c_red(uint32_t c) {
    return (c >> 16) & 0xff;
}

uint32_t c_green(uint32_t c) {
    return (c >> 8) & 0xff;
}

uint32_t c_blue(uint32_t c) {
    return c & 0xff;
}

uint32_t color_interpolate(uint32_t start, uint32_t end, double proportion) {
    uint32_t red   = c_red(start)   + (int)((c_red(end)   - c_red(start))   * proportion);
    uint32_t green = c_green(start) + (int)((c_green(end) - c_green(start)) * proportion);
    uint32_t blue  = c_blue(start)  + (int)((c_blue(end)  - c_blue(start))  * proportion);
    return (red << 16) + (green << 8) + blue;
}
I should have known better than this, honestly. It comes down to how C deals with unsigned and signed values in expressions. When the “end” value of a color channel is smaller than the “start” value, the difference is conceptually negative – but since the helper functions return uint32_t, the subtraction is done in unsigned arithmetic and wraps around to a huge value that fills all bytes of the word. Oops. The corrected code uses signed ints throughout, since it needs to temporarily operate in the range of -255 to +255 in the “multiply by proportion” part, to produce a result that is between 0 and 255.
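For reference, here is a sketch of what the corrected interpolation looks like along those lines – my reconstruction rather than the exact code in the model:

/* Do the per-channel math in signed int, so that (end - start) is allowed to
   be negative, and only assemble the unsigned 0x00RRGGBB word at the end. */
#include <stdint.h>

int c_red(uint32_t c)   { return (c >> 16) & 0xff; }
int c_green(uint32_t c) { return (c >> 8) & 0xff; }
int c_blue(uint32_t c)  { return c & 0xff; }

uint32_t color_interpolate(uint32_t start, uint32_t end, double proportion) {
    /* Each difference is in the range -255 to +255, which fits in an int. */
    int red   = c_red(start)   + (int)((c_red(end)   - c_red(start))   * proportion);
    int green = c_green(start) + (int)((c_green(end) - c_green(start)) * proportion);
    int blue  = c_blue(start)  + (int)((c_blue(end)  - c_blue(start))  * proportion);
    return ((uint32_t)red << 16) + ((uint32_t)green << 8) + (uint32_t)blue;
}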
With the code fixed, the rendering of the Mandelbrot set looks like this:
Smoother, but maybe a bit boring. Guess I should test the effect of a “clownfish mode” using bands on purpose.