Yes, when does hardware acceleration make sense in networking? Hardware acceleration in the common sense of “TCP offload”. This question was answered by a very nicely reasoned “no” in an article by Mike Odell in ACM Queue called “Network Front-End Processors, Yet Again“.
The article is highly recommended for its long historical look at network processing and network processing offload. As the balance between speeds of networks, processors, memory, and interconnects between network cards and the rest of the system has changed over the years, it is an idea that occasionally (four or five times since the 1970s) has made sense. However, in the end, Mike thinks that it usually does not, and for a machine with multiple cores and a modern fast interconnect, it is hard to see how a hardware accelerator can actually help speed things up much when the coordination between the hardware and the software is accounted for. Even if there would appear to be a big bottleneck somewhere today, we can be sure that it wil be removed in the next generation of hardware, rendering the market window for an accelerator quite short.
I read this article as another great motivation for the need to carefully consider the functional design of the hardware-software interface for acceleration devices. For simple data-pumping or media-processing units, this looks easy. For something as complex as TCP/IP processing, it is not. I think the key is that for TCP, we have something that is much more like control-plane processing than data-plane processing, and that is harder to efficiently integrate between hardware and software. Also, there is not really that much work left to offload once data copies have been architected in the right way (and I read Mike’s article to say that we now know how to do this in a sufficently few-copies way that software is close to optimal in architecture).
From a market perspective, it would also indicate that the acceleration circuits that are in common use today are by definition those that make sense. Having hardware-accelerated graphics and video decoders does seem to help build more efficient and attractive computer systems, as do cryptography accelerators. With this view, it will be interesting to see which of all the accelerators found in modern networking SoCs like those from Freescale and Cavium will survive the test of time. I am willing to put a small bet that pattern-matching engines for traffic inspection is one of them. Apart from that, hard to say.
So go read that article before you start designing your next brilliant accelerator for a common expensive operation.
It also reminds me of a whitepaper I wrote early this year on how to evaluate performance of a hardware accelerator in the context of a full system with a full software stack, considering the details of the hardware-software interface.