Breaking and Detecting Simulators and Emulators

I have read a few news items and blog posts recently about how various types of software running on top of virtual machines and emulators have managed to either break the emulators or at least detect their presence and self-destruct. This is a fascinating topic, as it touches on the deep principles of computing: just because a piece of software can be Turing-equivalent to a piece of hardware does not mean that software that goes looking for the differences won’t find any, or won’t be able to behave differently on a simulator than on the real thing.

F-Secure led me to a discussion of the Tinba and Dyre malware families and how they detect that they are running inside a sandbox. Back in 2008, I wrote a piece on how virtual machines could be made hard to detect, which quickly devolved into a discussion about timing attacks and other sophisticated methods for detecting virtual machines and simulators. These two malware families, however, go for really simple methods. Tinba looks at the size of the disk it is given and assumes that a small disk indicates a VM or sandbox. Simple, and probably quite effective in practice. Dyre (according to a write-up by Seculert) used something similar but even simpler: just checking whether the host machine has a single core. Apparently, this heuristic was sufficient to thwart most popular malware analysis sandboxes. No sophistication needed, just an observation about how minimal most analysts make their sandbox environments. Now that the trick is known, it should get harder to pull off. It is easy to understand why analysts used simple virtual machines with a single core, little memory, and a small disk for analysis: throwing more hardware resources at the problem increased the cost for little apparent gain. Until now.
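To make the heuristics concrete, here is a minimal C sketch of the kind of checks described, assuming a Windows target. This is not the actual Tinba or Dyre code; the function structure and the 60 GB disk threshold are invented for illustration.

/* Sketch of the two environment checks described above. Assumes
 * Windows; thresholds are invented examples, not the real samples'. */
#include <windows.h>
#include <stdio.h>

int looks_like_sandbox(void)
{
    SYSTEM_INFO si;
    ULARGE_INTEGER total;

    /* Dyre-style check: a single logical core suggests a minimal VM */
    GetSystemInfo(&si);
    if (si.dwNumberOfProcessors < 2)
        return 1;

    /* Tinba-style check: a small system disk suggests a sandbox
     * (60 GB is an arbitrary illustrative cutoff) */
    if (GetDiskFreeSpaceExA("C:\\", NULL, &total, NULL) &&
        total.QuadPart < 60ULL * 1024 * 1024 * 1024)
        return 1;

    return 0;
}

int main(void)
{
    puts(looks_like_sandbox() ? "sandbox suspected" : "looks real");
    return 0;
}

The defense is equally simple: give the analysis VM a few cores and a big disk, which is exactly the cost-benefit trade-off the paragraph above describes.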

A friend pointed me to a blog post/explanation page from the mGBA project, discussing how a Nintendo port of classic NES games to the Game Boy Advance (GBA) console went to great lengths to break GBA emulators. See https://endrift.com/mgba/2014/12/28/classic-nes/. The story is great and the tricks are cool. It is hard to see why those particular techniques would be in a game except as a way to thwart something. But the question is just why Nintendo bothered. As the mGBA author says:

I’m not really sure why Nintendo went all out with these games, considering that these are just ports of NES games. Full featured NES emulators have existed for many years, with good ones popping up even as early as 1997. While it’s true that the Classic NES Series games were new to being played on a portable device, emulating Game Boy Advance on other portable devices would be years off.

The Classic NES series for GBA was released in 2004, so maybe this was inspired by the older emulators? One thing that is quite striking to me is that the team would have needed some kind of test system to develop their anti-emulator hacks and see that they actually worked. So just what were they using? Since all these techniques did work and did mess up emulation, did the Nintendo engineers have some proof-of-concept in-house emulator to try them on? Or were these the six out of a hundred different techniques thrown into the code that happened to work out, with tons of other hacks that never got noticed since they did not have the intended effect? There is a great computer programming mystery here, and I doubt we will ever find out.

The most interesting technique of the group was the “prefetch” trick, which I rather think of as a cache-coherency trick. The idea is that on hardware, a write to an instruction a few steps ahead of the current one does not affect execution, since that instruction has already been fetched by the time the write hits. But a step-by-step emulator performs the write in one step, then picks up the modified instruction and gets the modified result. The key here is really the defined coherency between data writes and instruction reads. I did not find good data on the ARM7 architecture in this respect, but in general ARM processors require explicit cache flushes to make modified code work – which is the standard modern design. The exception to that rule is the Intel IA series of processors, where self-modifying code was already so common when pipelines and caches were introduced that the architecture is defined to support it. This happens to make simulation of the IA architecture harder, as simulator caches now have to look out for writes that hit instructions that have already been translated… An ARM emulator with cached translations would probably accidentally do the right thing for this trick.
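A toy model makes the difference concrete. The sketch below uses an invented 16-bit instruction set (all opcodes and names are mine, not the GBA’s) and runs the same self-modifying program through two interpreters: a naive fetch-at-execute one, and one with a two-slot prefetch queue approximating the ARM7TDMI’s three-stage pipeline. Only the naive interpreter picks up the patched instruction.

#include <stdint.h>
#include <stdio.h>

/* Toy 16-bit instruction set, invented for this sketch:
 *   0x1000|n : overwrite mem[n] with the EMU opcode (self-modifying)
 *   0x2000   : the original instruction -- the hardware path
 *   0x3000   : the patched-in instruction -- the naive-emulator path
 *   0xFFFF   : halt; anything else is a no-op
 */
enum { PATCH = 0x1000, HW = 0x2000, EMU = 0x3000, HALT = 0xFFFF };

static uint16_t mem[8];

static void reset(void)
{
    mem[0] = PATCH | 2; /* overwrite mem[2], two slots ahead of us */
    mem[1] = 0;         /* filler */
    mem[2] = HW;        /* what real hardware would still execute  */
    mem[3] = HALT;
}

/* depth = 0 models a naive fetch-at-execute interpreter; depth = 2
 * approximates a three-stage pipeline, where the next two
 * instructions are already fetched when a store takes effect. */
static void run(int depth)
{
    uint16_t queue[2];
    int pc = 0, i;

    for (i = 0; i < depth; i++)          /* prime the prefetch queue */
        queue[i] = mem[pc + i];

    for (;;) {
        uint16_t insn = depth ? queue[0] : mem[pc];

        /* advance the queue BEFORE executing: this fetch sees the
         * old memory contents, exactly like a hardware prefetch */
        for (i = 1; i < depth; i++)
            queue[i - 1] = queue[i];
        if (depth)
            queue[depth - 1] = mem[pc + depth];
        pc++;

        if (insn == HALT)
            break;
        else if ((insn & 0xF000) == PATCH)
            mem[insn & 7] = EMU;         /* the self-modifying write */
        else if (insn == HW)
            puts("original instruction ran (hardware-like)");
        else if (insn == EMU)
            puts("patched instruction ran (naive emulator)");
    }
}

int main(void)
{
    reset(); run(0);   /* prints the "naive emulator" line */
    reset(); run(2);   /* prints the "hardware-like" line  */
    return 0;
}

Presumably the game’s trick worked the same way: write over an upcoming instruction, then branch on which version actually executed.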

Finally, we have the recently discovered VENOM security issue, where a bug in the QEMU floppy-disk simulator let code in a guest break out of the virtual environment. This is certainly an emulator-breaking hack. It is also a good example of why repurposing code is risky: QEMU started life as a local emulation and development tool that could assume a fairly benevolent environment, and was then turned into a critical piece of infrastructure facing arbitrary and potentially hostile guest code (KVM uses QEMU for some device emulation). Code written to be a local development tool is quite different from code written to be a production runtime. The bug itself seems to be a simple matter of not checking that the values passed in as floppy command parameters are within reasonable bounds (see the patch). That is the typical thing that does not matter if you are just trying to get an OS to run, but that matters greatly when you run untrusted code.
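Here is a schematic C sketch of that bug class and its fix. All names (struct fdc_state, fifo_write_unchecked, and so on) and the buffer size are hypothetical; this is the pattern, not the actual QEMU code.

#include <stdint.h>
#include <stddef.h>

/* A guest-controlled write lands in a fixed-size command FIFO
 * through an index the device model advances itself. */
#define FIFO_SIZE 512

struct fdc_state {
    uint8_t fifo[FIFO_SIZE];
    size_t  pos;            /* advanced on every guest data write */
};

/* Vulnerable pattern: certain command sequences let pos run past
 * the end of the buffer, so the write corrupts adjacent host
 * memory -- a guest-to-host escape primitive. */
static void fifo_write_unchecked(struct fdc_state *s, uint8_t value)
{
    s->fifo[s->pos++] = value;
}

/* Fixed pattern: validate the guest-controlled index before use. */
static int fifo_write_checked(struct fdc_state *s, uint8_t value)
{
    if (s->pos >= FIFO_SIZE)
        return -1;          /* reject; real code would reset state */
    s->fifo[s->pos++] = value;
    return 0;
}

The unchecked version is perfectly adequate for a benevolent development guest that only sends well-formed commands, which is exactly why this class of bug sits unnoticed until the code is put in front of hostile input.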
