Every once in a while I need to build demo setups to show debugging in action. As I have blogged before, finding a good bug when you need one isn’t always easy. The solution is to try to invent artificial bugs, and I was very happy when I managed to stage a buffer overrun in a VxWorks program.
It is pretty very nice demo in which you first start a period program A, which prints the value of an incrementing counter every target second. You then run a supposedly unrelated program B, resulting in the values that program A prints to become corrupted. Perfect to show off reverse execution and data breakpoints in reverse as you go from the point where the corrupted value is printed to the piece of code that overwrote the variable.
But then I ported the demo to a new platform… and the bug didn’t work anymore. My bug had caught a bug and was now not working, or at least not they way I expected it to. What had happened?
Very simple. I changed the compiler I used. Since my bug relied on an unspecified behavior in C, the change was totally valid and really expected. Still, it was interesting to see how things played out… in the end, we got a different bug from the same code thanks to the change.
The code is essentially the following, with some simplifications that make it easier to read for those not familiar with VxWorks, and ignoring all the code to start tasks initially.
// Global variables int iDataArray[100]; int myWdISRcount; WDOG_ID myWatchDogId; // Periodic task - program A void myWdISR(void) { /* Increment ISR invocation count */ myWdISRcount = myWdISRcount+1; printf("wd Fired %d times\n",myWdISRcount); /* Start off next invocation */ wdStart ( myWatchDogId, WD_INTERVAL, (FUNCPTR) myWdISR, (int) NULL ); } // Overwrite code - program B int myCompletelySafeRoutine(void) { uint32_t *a,i; a = iDataArray; // This loop writes one word beyond the // limit of the iDataArray for (i=0; i<=100; i++) { a[i] = 0x7fffffff; } return OK; }
In the original setup, compiled for a Power Architecture target, iDataArray ended up right before myWdISRcount in memory. Thus, the buffer overflow changed the value of the counter from something like 10 to 0x7fffffff. Very noticeable in the printouts from the periodic task.
When I changed to an x86 target (using a compiler from the same family, but obviously with a different code generator since the target was different), the variable order in memory changed and it seems that we got iDataArray placed last. Suddenly, the effect of running the safe code was that nothing happened at all. A bit annoying for a demo. Some small source-code changes and a recompile later, the effect was instead to crash the target with a triple fault (page fault inside a page fault handler). Seems the program now managed to corrupt some kernel state. While impressive as a bug, it was not quite what I was looking for.
I then changed the compiler type to compiler 2, and the data layout changed once more. This produced a very useful bug, but it took me a while to actually understand this. Now, when program B was run, program A stopped. This looked like a bug in my program, and I actually started trying to fix this – until I realized that this was the bug I was looking for. Running program B kills program A is just as good a bug as corrupting a counter value, after all. In this case, the array overrun hits the myWatchDogId variable, and when that gets corrupted, the wdStart call ignores the request since it does not recognize the ID it gets.
So, in the end, I got a bug that was just as good as the first, and arguably a bit more intruiging. It is still obviously a contrived example – but I think that a good demo or lab exercise can be artificial as long as it gets the point across. Judging from how people who have done the lab reacts, the goal seems to have been
achieved.
The moral of the story is really that compilers are free to change things which are explicitly implementation-defined or not specified at all in the C standard. That is a good thing as it gives the compiler freedom to optimize the code. If you want to control how variables are laid out in memory, I guess you have to resort to linker scripts or similar – but that was too much pain for me in this case. Just changing things around until I got a good bug, and then freezing the binary (and not recompiling it ever again) is a sufficient strategy.
You could put all your global variables in a struct—that way, the compiler isn’t allowed to reorder them. It is allowed to pad between fields, but with an int filed immediately following an int array field, that shouldn’t happen.
Yes – but that would seem to be too obvious. It would make the bug perfectly portable, though. The idea I had was to make two “obviously distinct” things interact, and putting their respective globals in a struct would be a bit strange then. Not that the code really makes any sense anyway 🙂
Of course there is a bug right here on the line: for (i=0; i<=100; i++)
the <= should be < or 100 should be 99.
Pretty basic.
I usually think at a C, C++, Java, C#, SQL , …etc level and I do not usually care what happens at a "lower" level. Good C code should be able to run on any C machine.
Think about this : you can pretty much type anything on the Google search input line and not cause any crashes.
And also why are you talking about 2 different programs; the 2 functions above are part of the same executable ( just 1 program/process ) executed by the OS/processor.
You want 2 programs you need to put every function in separate files and do separate compilation units but i think you know this.
Yes it is basic – and that is the point. Common type of error, remember that this for a demo. It is intentionally incorrect. Interesting bit is what happens when you do overrun the buffer – that depends on the compiler used.
No – this is an unprotected memory setup for an RTOS. The two functions run as a separate threads (actually, the first function is run as a periodic interrupt service routine, the second from an actual VxWorks task). That they are part of the same compilation unit in C is how these things work.
They only need to be in separate files if you want to compile, link, and load them separately. In this case, that is not desired, to keep things simple.
Just one more thought:
You state :
————————————–
// Global variables
int iDataArray[100];
int myWdISRcount;
WDOG_ID myWatchDogId;
————————————–
Now that you post this piece of code on the internet they are truly global 🙂
Usually global variables are global to the process that defines and implements them and accessible only to functions within the process.
In my understanding program == process.
But i see what you are saying that the 2 functions from the file above are actually the bodies of 2 threads run by the OS.
You gotta keep them separated!