Hardware-Software Race Condition in Interrupt Controller

The best way to learn something is to try, fail, and then try again. That is how I just learned the basics of multiprocessor interrupt management. For an educational setup, I have been creating a purely virtual virtual platform from scratch. This setup contains a large number of processors with local memory, a global shared memory, and a means for the processors to interrupt each other in order to signal the presence of a message or to synchronize in general. Getting this really right turned out to be not so easy.

I started out with a simple model where each processor had an interrupt location mapped in global memory, and writing to this location would interrupt the processor. As a bonus, the written value was communicated to the receiving processor. The processor being interrupted would then acknowledge the interrupt to its local interrupt controller by writing to a local address. This worked like a charm in simple tests.
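Roughly, that first scheme looked like the sketch below. The names are made up for illustration and are not the exact registers of my model, but the structure is the same: one interrupt word per processor in global memory, plus an acknowledge register in the local interrupt controller.

/* Hypothetical sketch of the first, naive scheme; the names are made up.   */
volatile int *global_intr_word;  /* one word per processor, in global memory */
volatile int *local_intr_ack;    /* acknowledge register in the local
                                    interrupt controller                     */

void naive_send_interrupt(int target_cpu, int message) {
  global_intr_word[target_cpu] = message;  /* the write raises the interrupt */
}

void naive_interrupt_handler(int my_cpu) {
  int message = global_intr_word[my_cpu];  /* read out the delivered value   */
  /* ... act on the message ... */
  *local_intr_ack = 1;                     /* acknowledge to the local
                                              interrupt controller           */
}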

It broke completely when I started sending messages from multiple nodes to the same node… if an interrupt from node B reached node A while A was busy processing an interrupt from C, the interrupt from B would simply be ignored. There was no queuing, no fairness, no arbitration. The software could not solve this, since in order to create a lock around the global interrupt location for a processor, it would need some kind of global signaling mechanism, which is precisely what this interrupt system was supposed to provide in the first place.

I must have had some suspicion that something was not quite right, as I had equipped the interrupt controller with a counter of interrupts raised versus interrupts cleared. This counter increased monotonically, indicating accumulated unnoticed interrupt attempts.
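As a rough illustration, with made-up register names, the check amounts to comparing how many interrupts were raised against how many were actually cleared:

/* Hypothetical counter registers in the interrupt controller; the names
   are made up for illustration. */
volatile int *intr_raised_count;   /* interrupt attempts raised towards this CPU */
volatile int *intr_cleared_count;  /* interrupts acknowledged by this CPU        */

int unnoticed_interrupts(void) {
  /* a steadily growing difference means attempts are being dropped */
  return *intr_raised_count - *intr_cleared_count;
}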

One obvious solution that did not work either was to provide a way to check that an interrupt had been successfully sent. Since the interrupt send register for a processor was placed in shared global memory space, a processor that wrote the interrupt send register and then read the status register would have no way to guarantee that the status it read actually corresponded to the interrupt it had tried to send. It would be quite likely to read the status resulting from some other processor’s interrupt attempt. Basically, it would be doing unprotected access to shared mutable state… known not to be a good idea.
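A sketch of why this fails, again with made-up names: nothing stops another processor from writing the same shared registers between our write and our read.

/* The broken idea: send and status registers shared by all processors.   */
volatile int *shared_intr_send;    /* any processor may write this            */
volatile int *shared_intr_status;  /* reflects the *latest* send attempt only */

int try_send(int message) {
  *shared_intr_send = message;
  /* RACE: another processor may have written shared_intr_send right here,
     so the status read below may describe its attempt rather than ours.  */
  return *shared_intr_status;
}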

Another solution would be to use an atomic load-and-store operation that stores a value in a register and returns a value to the processor in the same operation. However, I have never seen this supported for device space, even though atomic operations of this type are available on most machines for regular memory.
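For regular memory, C11 expresses the idea as atomic_exchange: store a new value and get the old one back in a single indivisible step. The sketch below only illustrates the concept; as noted, nothing like it was available for the device registers in question.

#include <stdatomic.h>

/* Concept illustration only: an atomic swap on ordinary memory, not on a
   memory-mapped device register. */
_Atomic int mailbox;

int swap_message(int message) {
  return atomic_exchange(&mailbox, message);  /* returns the previous value */
}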

So it was back to the drawing board. It is clear that in order to do interrupts in a multiprocessor, it must be possible for any processor to interrupt any other processor without the message getting lost due to simultaneous actions in other processors. How to solve this?

And why did I not just copy an existing design or read a book that tells me how to do this? The problem is that I have not managed to find any good readable text on this subject: how does a multiprocessor (shared-memory or local-memory, it does not really matter) actually handle interrupts and coordinate the code running locally on each individual processor with that running on other processors, at the lowest level? A description of the hardware-software interaction design needed to make this work must exist somewhere, but I have not managed to find it, and I suspect that in many cases it is simply passed down as lore from one generation of system designers to the next. If someone knows a good text on this subject, please do point it out to me!

My first design was to use N x N registers for an N-processor machine. Essentially, each processor would have a bank of registers, one for each possible sending processor. Thus, if processors A and B decide to interrupt C simultaneously, they would write into two different locations, and C could scan its register array to tell that both A and B were calling (a rough sketch follows after the list below). However, this eats memory space pretty quickly, since it requires 2 times N squared registers:

  • N registers local to each processor, from which it reads out the messages sent to it.
  • N registers per processor to write outgoing messages to. These can either be a local set in each processor, or be placed in global memory.
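Here is that rough sketch, with made-up names, modeling the registers as an array in global memory. Each sender only ever writes its own slot in the receiver's bank, so two senders can never collide.

#define N_CPUS 64  /* made-up machine size, for illustration only */

/* One message slot per (receiver, sender) pair; zero means "empty". */
volatile int intr_slot[N_CPUS][N_CPUS];

void send_to(int me, int target, int message) {
  intr_slot[target][me] = message;     /* only CPU 'me' ever writes this slot */
}

void scan_incoming(int me) {
  for (int sender = 0; sender < N_CPUS; sender++) {
    int message = intr_slot[me][sender];
    if (message != 0) {                /* both A and B show up in the scan    */
      /* ... handle the message from 'sender' ... */
      intr_slot[me][sender] = 0;       /* clear the slot                      */
    }
  }
}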

In essence, this is the design of the OpenPIC controller common in PowerPC land. It encodes the target processors as bits rather than full registers, but it works with a local set of registers for each processor in which it can set bits to interrupt any other processor.

A colleague of mine pointed out that the SPARC systems do things a bit differently. There, you have a single register into which you write the number of the receiving processor, and a status flag that tells you whether the send was successful. The sending software is thus responsible for retrying if the remote side is busy. This scales nicely to quite large systems, since there is no need to represent or manage interrupt registers many hundreds of bits wide — the vast majority of which would not be used at any particular point in time anyway. What you lose is the ability of a single processor to do arbitrary multicast interrupting, which I do not think is that commonly needed (though it might well be; this is a bit of a dark art).

Since both of these controller registers live in memory that is local to a processor, there is no need to worry about races between different processors interrupting the same target processor simultaneously. The hardware interrupt bus will sort things out so that only one sender wins, and the software on only that processor will see a successful status flag and continue. The others will spin, or do more sophisticated waits if needed.

In the end, the code for sending an interrupt that I used was this:

/* The pointers below are assumed to be volatile pointers to the memory-
   mapped registers of the local interrupt controller. */
void interrupt_cpu(int cpu_num, int message) {
  *my_intr_dest = cpu_num;             /* select the destination processor      */
  *my_intr_send_data = message;        /* writing the message triggers the send */
  while (*my_intr_send_status == 0) {  /* zero status means the send failed     */
    *my_intr_send_data = message;      /* retry until the remote side accepts   */
  }
}

Note that I still send a 32-bit message, mostly since that is handy in educational and demo setups that are not completely limited by what current hardware does. In this design, writing to the message register is what triggers the interrupt (or rather, an attempt to send an interrupt) on the other processor. The hardware (or in my case, the virtual hardware model) does the rest, in a way that is guaranteed to eventually deliver every interrupt safely to its end point. It does this without any complex buffering in the hardware itself; buffering is best handled in the software, which has an easier time managing state. This also lets the software use other strategies, such as treating a busy status as a signal to try some other processor that is less busy.
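For completeness, the receiving end could look roughly like the sketch below. The register names (my_intr_recv_data, my_intr_ack) are assumptions for illustration, not the exact registers of my model.

/* Hypothetical receiver-side handler; the register names are made up. */
void interrupt_handler(void) {
  int message = *my_intr_recv_data;  /* read out the 32-bit message          */
  /* ... act on the message ... */
  *my_intr_ack = 1;                  /* clear the interrupt in the local
                                        controller so that new sends succeed */
}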

Anyway, it was an interesting experience to try this, and to see how hardware devices and software interact in a concurrent machine to create races. Not just the software, but also the hardware, must be designed right to avoid races from occurring. And races caused by the hardware can at times be quite impossible to work around in software.
