I have just found what almost has to be the first cycle-accurate computer simulator in history. According to the article “Stretch-ing is Great Exercise — It Gets You in Shape to Win” by Frederick Brooks (the man behind the Mythical Man-Month) in the January-March 2010 issue of IEEE Annals of the History of Computing, IBM created a simulator of the pipeline for the IBM 7030 “Stretch” computer developed from 1956 to 1961 (photo from IBM.com).
For those unfamiliar with the Stretch machine, it was a supercomputer developed by IBM which introduced many of the performance techniques and basic computer technologies that we all use today (most of them handed down to us via the IBM System/360). For example, it was the first to use 8-bit bytes and 64-bit floating point. It also introduced memory protection, memory interleaving, and instruction prefetching.
More relevant for my blog is the fact that the Stretch used the world’s first pipelined main processor, complete with interlocks to maintain program-order semantics. When developing this pipeline, Frederick Brooks claims that IBM developed a program to simulate the pipeline. This simulator was used to test the performance of the pipeline design on various test programs (this was before they were called benchmarks), and tune the design accordingly. The simulator was created by Harwood Kolsky. There is no firm date for the pipeline simulator, but based on the development time of the Stretch, it can be dated somewhere around 1960.
Thus, the simulation-driven approach to computer architecture is about 50 years old by now. Should have gone to ISCA and used this as an excuse for a party I guess…
It is also interesting to note that the Stretch computer acquired a co-processor in 1962, to do cryptology work. This machine was the one-off IBM 7950 “Harvest” and was tailored for the needs of the NSA in the US. It was a seriously special-purpose hardware unit adding a few instructions to the Stretch machine, and beating any other machine at the time by about 50 to 200 on the particular NSA workloads. Sounds like the kind of performance claims that Tensilica and other application-customized processors claim. 50 years ago.