In my previous blog post about the GhostWrite vulnerability in the Alibaba T-Head C910 RISC-V-based processor, I noted that the authors of the paper had found more than just that one bug. The additional bugs are worth their own write-up, as they offer more examples of what looks to be poor testing.
RISCVuzz – The Tool
The paper that described the GhostWrite bug is called “RISCVuzz: Discovering Architectural CPU Vulnerabilities via Differential Hardware Fuzzing”, by Fabian Thomas, Lorenz Hetterich, Ruiyi Zhang, Daniel Weber, Lukas Gerlach, and Michael Schwarz. It does not seem to have been formally published in any journal or conference, but it is available from the GhostWrite web page.
The idea presented in the paper is clever: compare different hardware implementations of the same instruction set by feeding them the same instruction sequences and comparing the results. If the implementations agree, it is likely that all are correct. If they differ, at least one of the processors tested must be wrong. This can be applied to any instruction set, but they did it with RISC-V, which has the benefit that there is likely an unusually large number of entirely different implementations to choose from.
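To make the idea concrete, here is a minimal sketch of differential fuzzing in Python. It is not the RISCVuzz implementation: the two "cores" are toy software models I made up for illustration (reference_core and buggy_core are hypothetical names), and the instruction format is invented. The point is only the comparison loop: feed every implementation the same random input and flag any input on which the outcomes differ.

```python
import random
from collections import defaultdict


def reference_core(word, regs):
    """Toy model: ADD-like op when the top bit is clear, otherwise an illegal-instruction trap."""
    if word >> 31:
        return ("trap", tuple(regs))
    rd, rs1, rs2 = word & 3, (word >> 2) & 3, (word >> 4) & 3
    out = list(regs)
    out[rd] = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF
    return ("ok", tuple(out))


def buggy_core(word, regs):
    """Same toy model, but it ignores the top bit instead of trapping (hypothetical bug)."""
    return reference_core(word & 0x7FFFFFFF, regs)


def differential_fuzz(cores, iterations, seed=0):
    """Run random instruction words on every core and collect the disagreements."""
    rng = random.Random(seed)
    findings = []
    for _ in range(iterations):
        word = rng.getrandbits(32)
        regs = [rng.getrandbits(32) for _ in range(4)]
        outcomes = defaultdict(list)  # outcome -> names of cores that produced it
        for name, core in cores.items():
            outcomes[core(word, list(regs))].append(name)
        if len(outcomes) > 1:  # at least one core must be wrong
            findings.append((word, dict(outcomes)))
    return findings
```

Calling differential_fuzz({"reference": reference_core, "buggy": buggy_core}, 1000) only reports words with the top bit set, since that is the only place where the two toy models behave differently. Note that a disagreement says that something is wrong, not which core is wrong; that still takes a human and the spec.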
They apply the methodology to five 64-bit RISC-V cores available today in hardware form:
They found no issues in the SiFive processors. But they did find issues in all the T-Head processors. There were also some bugs in the QEMU model for RISC-V that they did not say much about (I would have liked to know more there).
Getting this to work in practice, using user-space Linux programs to do the testing, is not entirely trivial, as you might imagine. Read the paper for the details; I will not repeat them here. Quite a few clever tricks were used.
Broken Instructions
The C910 suffered from the GhostWrite issue as already discussed.
The C906 and C908 had something the authors call “halt and catch fire”, which is a bit overly dramatic. Basically, they found instructions that, when executed, hang the processor core, requiring a hardware reset to get the platform back.
I really liked this illustration of where in the instruction space the bugs were found:
The three different bugs are in quite different categories. One, GhostWrite, is in the Vector set. The C906 bug is in a documented vendor extension. The C908 bug is in the regular instruction set.
The bugs are just as strange as the vector write bug on the C910. For example, this is the description of the C906 problem:
The core of the sequence is the th.lbib instruction from the custom XTheadMemIdx extension. This vendor extension provides additional memory operations such as increment-address-before-load (th.lbib). The halt occurs in combination with using the same register for source and destination operand, a subsequent CSR read, and any subsequent interaction with the register provided to the instruction.
The paper notes that the use of the same register as source and destination is documented as being invalid. But that does not mean that the processor is free to mishandle the case, or that test cases can ignore it.
Negative Testing
The C908 is a very clear example that the processor test suites must be missing negative test cases. The C908 issue occurs when “illegal” bits are set in an instruction:
The instructions discovered by RISCVuzz on the C908 correspond to the vector mask store/load instructions vsm.v and vlm.v. Setting any of the bits 29 to 31 in the encoding of these instructions crashes the machine. Note that these bits should be all unset, i.e., zero, when the instruction is assembled correctly.
Really, the processor should have raised an illegal instruction exception or something similar. And there should have been a test case checking that this did happen. Instead, I guess they just have a bunch of positive tests (i.e., tests that check that the instructions function as intended when used as intended). Negative testing is super-important for robustness.
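A negative test for this case is not hard to generate. The sketch below enumerates every variant of an instruction word that has at least one of bits 29 to 31 set, and checks that each one is rejected. Several things here are my assumptions and not from the paper: VLM_V_BASE is my own hand-encoding of "vlm.v v0, (x0)" and should be checked against an assembler, execute is a hypothetical hook standing in for "run this word on the device under test", and the two decoders are toy models.

```python
from itertools import combinations

# Assumption: my own hand-encoding of "vlm.v v0, (x0)"; verify with an assembler.
VLM_V_BASE = 0x02B00007
RESERVED_BITS = (29, 30, 31)  # bits the paper says must be zero


def reserved_bit_variants(base_word, reserved_bits=RESERVED_BITS):
    """Return every variant of base_word with a non-empty subset of reserved bits set."""
    if any((base_word >> bit) & 1 for bit in reserved_bits):
        raise ValueError("base word already has a reserved bit set")
    variants = []
    for count in range(1, len(reserved_bits) + 1):
        for subset in combinations(reserved_bits, count):
            word = base_word
            for bit in subset:
                word |= 1 << bit
            variants.append(word)
    return variants


def find_missing_traps(execute, base_word):
    """List the (word, outcome) pairs where the device did not report 'illegal'."""
    failures = []
    for word in reserved_bit_variants(base_word):
        outcome = execute(word)
        if outcome != "illegal":
            failures.append((word, outcome))
    return failures


def strict_decoder(word):
    """Toy decoder that rejects any word with bits 29 to 31 set."""
    return "illegal" if word >> 29 else "ok"


def sloppy_decoder(word):
    """Toy decoder that hangs on the same words (hypothetical bug)."""
    return "hang" if word >> 29 else "ok"
```

Three reserved bits give seven variants per instruction, so this is a tiny test. find_missing_traps(strict_decoder, VLM_V_BASE) comes back empty, while the sloppy decoder fails all seven. On real hardware the "hang" outcome would of course need a watchdog or an external reset to detect.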
Even More Negative Testing
One finding reported in the paper was that the C906 and C910 processors were not entirely compatible with the RISC-V ISA spec:
ISA Incompatibility: The C906 and C910 are not fully compatible with the ISA specifications. These CPUs do not ignore writes to bits 8 to 10 of the fcsr register. Both the C910 and the C908 support a subset of the vector extension. This manifests itself in some of the instructions doing nothing, others doing unexpected things (cf. Section VI), and some not being implemented at all. Interestingly, the subset of instructions also differs between the two CPUs.
Another good example of the importance of negative testing. You cannot just check the positive results of an operation (i.e., in this case that the flags that are supposed to change do change); you also have to check the negatives (i.e., that the flags that are not allowed to change do not change).
It is another indication of a rush job and weak test suites. The behavior fits with a guess that the same (weak) test suite is run on all the cores, but that the C908 has a different implementation in some part of the decoder that accidentally does the right thing. Alternatively, the cores were developed by teams using different test suites, which is really not very reasonable.
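The difference between the positive and the negative check on fcsr can be shown with a toy model. This is a sketch, not a real test: ToyFcsr is a made-up software stand-in for the register, and I am assuming the standard F extension layout with fflags in bits 0 to 4 and frm in bits 5 to 7, so that everything above bit 7 should ignore writes.

```python
FCSR_WRITABLE_MASK = 0xFF  # fflags in bits 0-4, frm in bits 5-7


class ToyFcsr:
    """Toy register model; writable_mask decides which bits a write can change."""

    def __init__(self, writable_mask):
        self.writable_mask = writable_mask
        self.value = 0

    def write(self, value):
        self.value = value & self.writable_mask

    def read(self):
        return self.value


def check_fcsr(csr, spec_mask=FCSR_WRITABLE_MASK):
    """Write all ones, read back, and evaluate both test perspectives."""
    csr.write(0xFFFFFFFF)
    readback = csr.read()
    reserved = ~spec_mask & 0xFFFFFFFF
    return {
        # Positive: bits that are supposed to be writable did change.
        "positive": (readback & spec_mask) == spec_mask,
        # Negative: bits that must ignore writes stayed zero.
        "negative": (readback & reserved) == 0,
    }
```

A model built as ToyFcsr(0x7FF), which wrongly accepts writes to bits 8 to 10, passes the positive check and fails the negative one. A test suite with only the positive half would never notice.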
Summary
I found this paper quite fascinating. The technology used is cool, and the results they report point at a pretty bad engineering organization (sorry to say, but that is the only conclusion that can be drawn), and in particular poor testing. I happen to love testing and breaking things, and the T-Head processors provide some very nice illustrations for why you need to think about testing from both positive and negative perspectives.