The recent news that a hacked version of Apple Xcode was used to insert malicious code into quite a few programs for Apple iOS was both an unpleasant surprise and an example of something that has been hypothesized for a very long time. For the news itself, I recommend the Ars Technica coverage of the XcodeGhost issue. It is very interesting to see this actually being pulled off for real – I recall the scenario being discussed back in the 1990s, and it goes all the way back to Ken Thompson’s 1983 ACM Turing Award lecture.
The fact that a compiler can inject code into a program beyond what is in its source should not be novel to anyone – it is a key part of any modern compile chain. Without it, there would be no basic services like the C library, and even things like function prologues and epilogues could be considered suspect. What is new is that this mechanism was used for malicious purposes, and used with success.
Looking at what made this particular hack possible, I would say it was pure stupidity and laziness on the part of the programmers, along with a blatant lack of security thinking. Apparently, the attack was perpetrated by posting a hacked version of Apple Xcode to forums in China – and programmers there downloaded and installed it that way rather than the official way from Apple, since downloading from Apple takes a long time compared to fetching from a local cache. However, they also had to accept that the program apparently did not have a valid signature from Apple, which should have raised warning flags. I assume that if you re-post the unmodified Xcode installer, it will still carry a proper Apple signature – so that could still be reasonably safe. But ignoring signature warnings is never a good idea in this day and age.
Ken Thompson’s original idea was even more aggressive, though – he pointed out that the compiler (or really any program involved in generating other programs) can self-propagate hidden code. If the compiler is used to compile itself, the trojaned binary can insert the code that inserts the code into the new compiler, even when the compiler’s own source is clean. Indeed, the lesson of Ken’s lecture is quite wonderful:
The moral is obvious. You can’t trust code that you did not totally create yourself. (Especially code from companies that employ people like me.) No amount of source-level verification or scrutiny will protect you from using untrusted code.
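To make the mechanism concrete, here is a toy sketch in the spirit of Thompson’s lecture. It is not a real compiler – the “compilation” is just a pass-through, and all names (`BACKDOOR`, `trojaned_compile`, the filename matching) are illustrative – but it shows the two cases: injecting a payload into a sensitive target, and re-inserting the injection logic whenever the compiler itself is rebuilt.

```python
# Toy sketch of a self-propagating compiler trojan (illustrative names).
# A real compiler translates source to machine code; this one merely
# passes the source through so the injection logic is easy to see.

BACKDOOR = "print('backdoor: access granted')\n"
PROPAGATION = "# self-propagating injection logic re-inserted here\n"

def trojaned_compile(source: str, filename: str) -> str:
    """'Compile' source, silently adding extra code for two targets."""
    if "login" in filename:
        # Case 1: plant a backdoor in the sensitive target program.
        return source + BACKDOOR
    if "compiler" in filename:
        # Case 2: when the compiler itself is recompiled, re-insert the
        # injection logic, so even a clean source tree yields a
        # trojaned binary.
        return source + PROPAGATION
    return source

# A perfectly clean login source still comes out with a backdoor:
compiled = trojaned_compile("def login(user): ...\n", "login.py")
assert BACKDOOR in compiled
```

The second branch is the crucial one: once it exists in the binary, the trojan survives recompilation from clean sources, which is exactly why source-level scrutiny alone does not help.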
It is clear that programs that generate things you run need to be vetted rather more closely than has traditionally been done, at least where the generated programs have a short and direct path to profit – such as mobile applications and desktop software. What a pain. In 1983, vetting the code of the UNIX C compiler was still a semi-possible task. Today, the set of libraries, tools, and features in a toolset like Xcode – or even just a compiler like gcc or LLVM – is just overwhelmingly enormous. Still, the existence of multiple independent compilers does offer the possibility of using one compiler to compile the other, on the assumption that any self-propagating code would have a hard time functioning in an unknown setting.
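That last point can be sketched with a toy model as well (all names illustrative): a trojaned compiler binary has to recognize the compiler source it was written to target, so feeding it an independent compiler’s source gives the injection nothing to latch onto, and the propagation chain breaks.

```python
# Toy illustration of why compiling with an independent compiler breaks
# the chain: the trojan pattern-matches for the compiler it was written
# against, and an unfamiliar compiler's source does not match.

TROJAN_MARKER = "# trojan payload\n"
KNOWN_SIGNATURE = "familiar_compiler_internals"  # illustrative pattern

def trojaned_compile(source: str, filename: str) -> str:
    """'Compile' source; inject only into the compiler it recognizes."""
    if KNOWN_SIGNATURE in source:
        return source + TROJAN_MARKER
    return source

# Recompiling an independent compiler's source: no match, no trojan.
other_compiler_src = "def compile(src):\n    return translate(src)\n"
out = trojaned_compile(other_compiler_src, "other_compiler.py")
assert TROJAN_MARKER not in out
```

This is only a sketch of the intuition; a trojan could of course try fuzzier matching, which is why the argument is about raising the difficulty rather than a guarantee.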
Footnote 1: While researching this a bit, I found out that something similar had happened to a Delphi compiler back in 2009!
Footnote 2: If you read the footnotes to the ACM lecture, look at number 4.