Shaking a Linux Device Driver on a Virtual Platform

Continuing from last week’s post about my Linux device driver and hardware teaching setup in Simics, here is a lesson I learnt this week while doing some performance analysis across various hardware speeds.

First some background.

A key idea in the setup is to assume some processing time for the hardware accelerator, rather than creating detailed code and determining the actual processing time for a particular implementation. Given an assumed time, we can then see how it impacts program performance. This is a way of designing hardware where we look at how fast something needs to be to have a positive impact, rather than trying to make it as fast as possible. It also lets us analyze how hardware performance looks when seen through a complete OS stack and a real device driver, rather than simple bare-metal software (which tends to show the performance in the best possible light). Essentially, it is loosely timed design-space exploration.

Initial tests of the driver used very short completion times, on the order of 1 microsecond. The read() call at this point simply waited for the hardware completion flag to become true, and then returned the results. That is not the kind of behavior that a driver should have, since if the hardware gets some kind of hiccup, we will be stuck looping inside a kernel context. Instead, I implemented a blocking read variant that would put the calling process to sleep until a result arrives.

To test that my driver handled the sleep correctly, I raised the processing delay to the order of seconds… and promptly found a set of issues that forced several rewrites of the code. The most important changes were switching to a software flag for completion rather than relying on the hardware flag, and implementing an interrupt handler to get notified by the hardware.

Then, on Friday, I demonstrated the setup, along with some new performance analysis tools that go with it, to some students who were testing it. And the test program suddenly stopped working, visibly hanging at the first call to read() without ever getting unblocked.

The reason was a classic race condition: the code in the write() device driver call that sent input data into the hardware device waited until after the writing was complete (and then some more) before clearing the operation complete flag. Here is the relevant piece of code:

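/* Send the input data to the hardware device. */
for (i = 0; i < words; i++) {
  write_register(SIMPLE_INPUT, kbuf[i]);
}
*f_pos = 0;
kfree(kbuf);
/* BUG: the flag is cleared only after the writes (and cleanup) are done. */
clear_completion_state();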

With a sufficiently short delay to completion, the completion interrupt fired, was handled, and set the completion flag before the write() function even got to clear_completion_state(). In other words, the interrupt signalling completion had already deposited its result in the software flag, and write() then promptly overwrote it. When the test program next called read() to fetch the result, the flag was not set and no further interrupt was coming to set it, so the process waited forever.

The fix is obvious: just move the clearing of the flag to before the writing to the hardware begins.
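The two orderings can be compared in a small user-space sketch. This is not the driver code: completion_irq(), push_words(), the value of SIMPLE_INPUT, and the toy device model (which "completes" the instant the last input word arrives, the worst case for the race) are all assumptions made up for illustration. Each function returns the flag value that a later read() would see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIMPLE_INPUT 0  /* assumed register id, for illustration only */

/* Software completion flag, normally set from the interrupt handler. */
static volatile bool op_complete;

/* Bookkeeping for the toy device model (hypothetical, not driver code). */
static size_t words_expected;
static size_t words_seen;

/* Stand-in for the real interrupt handler. */
static void completion_irq(void)
{
    op_complete = true;
}

/* Stand-in for write_register(): models a device so fast that the
 * completion interrupt fires as soon as the last word has been written. */
static void write_register(int reg, uint32_t value)
{
    (void)reg;
    (void)value;
    if (++words_seen == words_expected)
        completion_irq();
}

static void clear_completion_state(void)
{
    op_complete = false;
}

/* Feed all input words to the toy device. */
static void push_words(const uint32_t *buf, size_t words)
{
    words_expected = words;
    words_seen = 0;
    for (size_t i = 0; i < words; i++)
        write_register(SIMPLE_INPUT, buf[i]);
}

/* Original ordering: the late clear wipes out the completion. */
bool write_then_clear(const uint32_t *buf, size_t words)
{
    assert(buf != NULL && words > 0);
    push_words(buf, words);
    clear_completion_state();
    return op_complete;
}

/* Fixed ordering: clear first, so a completion can only arrive afterwards. */
bool clear_then_write(const uint32_t *buf, size_t words)
{
    assert(buf != NULL && words > 0);
    clear_completion_state();
    push_words(buf, words);
    return op_complete;
}
```

With any non-empty buffer, write_then_clear() returns false (read() would block forever) while clear_then_write() returns true. In a real driver the flag would also need to be shared safely between the interrupt handler and process context; the sketch leaves that out.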

To generalize from this brilliant example of concurrency carelessness: it is a really good accidental demonstration of how varying the timing in a virtual platform can shake code and expose timing-related bugs, far more efficiently than is possible on physical hardware.

Had I described the exact (or even approximate) timing of one particular hardware implementation, this kind of bug would not have been found, and the driver code would not have become as robust. An implementation relying on a very short completion time could check the hardware operation complete flag directly, but that broke down when the delay was long. The buggy implementation above worked fine with a long completion time, but broke down with a short one. The fixed implementation works across a span of times from 10 ns to 10 s or more, which I think is all you can ask for.
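As a rough illustration of why sweeping the delay matters, here is a toy timeline model in C. It is only a sketch under stated assumptions: CLEAR_LAG_NS (the time between the last register write and the late clear in the buggy write()) is a made-up figure, and read_unblocks() and count_hangs() are hypothetical helpers, not part of the driver or of Simics. The model replays the "clear" and "interrupt" events in time order and reports whether the flag is still set at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed lag between the last register write and the late clear. */
#define CLEAR_LAG_NS 2000ULL

enum clear_order { CLEAR_BEFORE_WRITE, CLEAR_AFTER_WRITE };

/* True if a blocking read() issued after write() would find the flag set. */
bool read_unblocks(uint64_t delay_ns, enum clear_order order)
{
    bool flag = false;
    /* Clearing before the device is started always precedes the interrupt;
     * the late clear only precedes it if the device is slow enough. */
    bool clear_precedes_irq =
        (order == CLEAR_BEFORE_WRITE) || (CLEAR_LAG_NS < delay_ns);

    if (clear_precedes_irq) {
        flag = false;   /* clear_completion_state() */
        flag = true;    /* completion interrupt */
    } else {
        flag = true;    /* completion interrupt */
        flag = false;   /* late clear loses the completion */
    }
    return flag;
}

/* Sweep the completion delay in decades and count the hanging cases. */
int count_hangs(enum clear_order order, uint64_t from_ns, uint64_t to_ns)
{
    int hangs = 0;

    assert(from_ns > 0 && from_ns <= to_ns);
    for (uint64_t d = from_ns; d <= to_ns; d *= 10) {
        if (!read_unblocks(d, order))
            hangs++;
        if (d > UINT64_MAX / 10)
            break;      /* avoid overflow on the next step */
    }
    return hangs;
}
```

Sweeping from 10 ns to 10 s with the assumed lag, the late-clear variant hangs at 10 ns, 100 ns, and 1 µs, and the clear-first variant never does. A test at any single delay above the lag would have missed the bug entirely.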

A short fun Simics note on this: changing that timing parameter is a run-time change. It is possible to change it during a run, from the Simics command-line, using a simple one-line command:

simics> sd0->time_to_result = 10.0e-9

It is really nice working with a system like that!
