So, when does hardware acceleration make sense in networking? Hardware acceleration in the common sense of “TCP offload”, that is. This question was answered with a very nicely reasoned “no” in an article by Mike O’Dell in ACM Queue called “Network Front-End Processors, Yet Again”.
The article is highly recommended for its long historical look at network processing and network processing offload. As the balance between the speeds of networks, processors, memory, and the interconnects between network cards and the rest of the system has shifted over the years, offload is an idea that has occasionally (four or five times since the 1970s) made sense. In the end, however, Mike argues that it usually does not: for a machine with multiple cores and a modern fast interconnect, it is hard to see how a hardware accelerator can actually speed things up much once the coordination between the hardware and the software is accounted for. And even if there appears to be a big bottleneck somewhere today, we can be sure that it will be removed in the next generation of hardware, rendering the market window for an accelerator quite short.
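To make that coordination cost concrete, here is a minimal C sketch of the kind of descriptor-ring handshake a driver goes through to hand work to an offload engine. All names here are hypothetical, not any real device’s programming interface; the point is simply that the descriptor write, the doorbell, and the completion check each cross the hardware-software boundary, and each crossing costs far more than a handful of ordinary instructions.

/* Hypothetical descriptor ring shared between driver and offload engine.
 * The struct and field names are illustrative only. */
#include <stdint.h>

struct desc {
    uint64_t buf_addr;   /* physical address of the data buffer */
    uint32_t len;        /* bytes to process */
    uint32_t flags;      /* command bits for the engine */
};

struct ring {
    struct desc *descs;           /* descriptor array shared with device */
    volatile uint32_t *doorbell;  /* MMIO register: store = bus write */
    volatile uint32_t *comp;      /* completion index written by device */
    uint32_t head;
    uint32_t size;
};

/* Submit one unit of work and wait for the device to complete it. */
static void offload_one(struct ring *r, uint64_t buf, uint32_t len)
{
    uint32_t slot = r->head++ % r->size;

    r->descs[slot].buf_addr = buf;   /* descriptor write: cache-line traffic */
    r->descs[slot].len      = len;
    r->descs[slot].flags    = 1;

    *r->doorbell = r->head;          /* doorbell: uncached MMIO write */

    while (*r->comp != r->head)      /* completion: poll (or take an */
        ;                            /* interrupt, which costs even more) */
}

Every hand-off pays these fixed costs, so unless the offloaded work is large, the handshake eats the gain.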
I read this article as another great motivation for carefully considering the functional design of the hardware-software interface for acceleration devices. For simple data-pumping or media-processing units, this looks easy. For something as complex as TCP/IP processing, it is not. I think the key is that TCP is much more like control-plane processing than data-plane processing, and that is harder to integrate efficiently between hardware and software. Also, there is not really that much work left to offload once data copies have been architected in the right way (and I read Mike’s article to say that we now know how to do this with sufficiently few copies that a software implementation is close to optimal in architecture).
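As a concrete illustration of the few-copies point, here is a minimal sketch, assuming Linux, of pushing file data onto a socket with sendfile(2), which moves the bytes inside the kernel without ever copying them through user space. The helper name and the error-handling policy are my own choices, not something from the article.

/* Send 'count' bytes from file 'in_fd' to the connected socket 'sock_fd'
 * without copying the data through user space (Linux sendfile(2)). */
#include <sys/types.h>
#include <sys/sendfile.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

static int send_file_few_copies(int sock_fd, int in_fd, size_t count)
{
    off_t offset = 0;

    while (count > 0) {
        ssize_t n = sendfile(sock_fd, in_fd, &offset, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;                       /* interrupted: retry */
            fprintf(stderr, "sendfile: %s\n", strerror(errno));
            return -1;
        }
        count -= (size_t)n;    /* the kernel advanced 'offset' for us */
    }
    return 0;
}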
From a market perspective, this would also indicate that the acceleration circuits in common use today are by definition those that make sense. Having hardware-accelerated graphics and video decoders does seem to help build more efficient and attractive computer systems, as do cryptography accelerators. With this view, it will be interesting to see which of all the accelerators found in modern networking SoCs, like those from Freescale and Cavium, will survive the test of time. I am willing to put a small bet that pattern-matching engines for traffic inspection are one of them. Apart from that, it is hard to say.
So go read that article before you start designing your next brilliant accelerator for a common expensive operation.
It also reminds me of a whitepaper I wrote earlier this year on how to evaluate the performance of a hardware accelerator in the context of a full system with a full software stack, considering the details of the hardware-software interface.
Aloha!
I agree that it was a great article. However, I think it should at least reference Sutherland’s Wheel of Reincarnation [1], especially if one (like you do) thinks of video/graphics acceleration.
It is also worth noting how, and to what degree, acceleration can be added – generality vs. performance. x86 processors are gaining instructions for accelerating security algorithms, and these could easily be moved into the embedded space – which would probably be better than adding a specific AES core, for example (a sketch of the instruction approach follows after the reference below).
[1] T. H. Myer and I. E. Sutherland, “On the Design of Display Processors”
http://cva.stanford.edu/classes/cs99s/papers/myer-sutherland-design-of-display-processors.pdf
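To illustrate that instruction-based approach, here is a hedged sketch of one AES-128 block encrypted with the x86 AES-NI intrinsics. The key schedule is omitted for brevity (the eleven round keys are assumed to be pre-expanded), the helper name is mine, and it needs to be compiled with -maes.

/* One AES-128 block via AES-NI: acceleration by instruction,
 * not by a dedicated core. Compile with -maes. */
#include <wmmintrin.h>   /* AES-NI intrinsics */

/* Encrypt one 16-byte block given 11 pre-expanded round keys. */
static __m128i aes128_encrypt_block(__m128i block, const __m128i rk[11])
{
    block = _mm_xor_si128(block, rk[0]);          /* initial AddRoundKey */
    for (int i = 1; i < 10; i++)
        block = _mm_aesenc_si128(block, rk[i]);   /* rounds 1..9 */
    return _mm_aesenclast_si128(block, rk[10]);   /* final round, no MixColumns */
}

The same primitive works on any x86 with AES-NI, which is exactly the generality-vs-performance trade-off: likely slower per block than a dedicated AES core, but reusable for every mode and protocol the software cares to build.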
Sounds like the argument put forward by Tensilica for their ASIP cores. I think an x86 core in general is a bit too big and inefficient for embedded…
Aloha!
Define what you mean by embedded. Are network appliances embedded? If so, x86 is already there en masse. There are a lot of VIA-based and Atom-based boards for the embedded space being used for things like firewalls, storage appliances, photocopiers, and industrial control. Most of them are not battery-powered, but they are clearly embedded devices.
Embedded as in mobile phones? Give Moore’s law a few turns, use the shrink to reduce size, power, and cost, and x86 will probably be there too. Yes, it is a scary thought, but from a software and product viewpoint it makes sense.