I recently stumbled on a blog post called Building Faster AMD64 Memset Routines, written by Joe Bialek of the Microsoft Security Response Center (MSRC). The blog describes his efforts to improve the performance of the Windows kernel memset() function, across all sizes of memory to set. The reported optimizations are quite fascinating, and could be summed by avoiding branches even at the cost of doing redundant stores. Basically, stores are free while branches are expensive.
Continue reading “Microsoft Windows memset Optimization – Stores are Free”