I once wrote a blog post about the use of computer architecture pipeline simulation in the IBM “Stretch” project, which seems to be the first use of computer architecture simulation to design a processor. After the “Stretch” machine, IBM released the S/360 family in 1964. Then Control Data Corporation showed up with their CDC 6600 supercomputer, and IBM started a number of projects to design a competitive high-end computer for the high-performance computing market. One of them, Project Y, became the IBM Advanced Computing Systems (ACS) project. In the ACS project, simulation was used to document, evaluate, and validate the very aggressive design. There are some nuggets about the simulator strewn across historical articles about the ACS, as well as an actual technical report from 1966 that I found online describing the simulation technology! Thus, it is possible to take a bit of a deeper look at computer architecture simulation from the mid-1960s.
The ACS-1 Machine (Design)
The design for the first machine from the ACS project, the ACS-1, is rather notable. In 1965-1966, they designed a superscalar multiple-issue out-of-order processor. It featured multiple issue of up to 5 instructions at once, speculative execution past two branches, and out-of-order pipelined execution of instructions supported by register renaming.
The ACS-1 was designed with a very RISC-like instruction set with just two instruction lengths (24 and 48 bits), and an emphasis on compiler-friendliness – rather unique concepts at the time, and this was a precursor to the later IBM 801 that led to POWER and PowerPC (in rather convoluted ways from what I read).
The computer had 32 integer and 32 floating point registers (conveniently, you could fit a 9-bit opcode plus 3 5-bit operands in a single 24-bit word). With an emphasis on register-register operations rather than the more common memory-based operations of the day, it looks very modern.
The ACS-1 compiler team came up with ideas like loop unrolling, function inlining, instruction scheduling, and even profile-based optimization. Once again, standard techniques today, but this was mostly new 50 years ago!
The ACS-1 never made it to market. Due to internal politics at IBM, costs, market and marketing shifts, and the intervention of Gene Amdahl, the machine was converted to be compatible with the IBM S/360 family. The resulting AEC-360 machine dropped the RISC-like instruction set and odd-ball word size, and was then cancelled since it seemed no longer necessary in the market. IBM had managed to make sufficiently fast machines in the S/360 series that there was no pressing need for the planned ACS/AEC machines anymore.
Before the ACS-360 or AEC-360 was killed, it had also generated some interesting ideas. In particular, it added a “second instruction counter” to keep the pipeline filled when a program could not generate enough independent instructions. Basically, this was simultaneous multithreading, with the original goal of hiding stalls from memory accesses and control dependencies.
Simulation in the ACS Project
Still, this post is really about the use and design of simulators in the ACS project. The simulator that Lynn Conway built for the ACS project appears to have started as a task for a recent university graduate, but in the end turned into the key tool for the entire project. The simulator was used to test the instruction set and evaluate the performance effects of design changes. It was used to predict absolute performance, to allow comparisons to other computers of the day.
The simulator became in essence an executable specification for the design. In a 2010 Computer History Museum presentation by some members of the ACS project, Lynn Conway (starting from around 33 minutes in) notes that:
The architecture team started to rely on the simulator to document some of the rather detailed design decisions especially as simulation runs began occasionally uncovering glitches in the design…
Even more to the point of an executable specification:
you could see that something was kind of happening there in terms of the use of a simulator to actually be a kind of running document of the design decisions.
This is a very modern view of how simulation can be used in exploratory architecture work – as an executable specification that captures accumulated knowledge and detailed design decisions. I wrote a bit about this on my Intel Developer Zone blog recently, see Getting to Small Batches in Hardware Design using Simulation – the ACS project is an excellent example of how simulation is used to do design with quick feedback and small batches.
I find it a bit unfair that when the AEC-ACS comparison was being done in 1968, the AEC performance was estimated from manual analysis of small pieces of code. There was no simulator for the design, so that was the only way to do estimates. However, this was compared to real simulations of code for the ACS, with a real toolchain behind it – which I think is bound to give a more realistic result than a manual estimate, since the simulator will capture more effects and details, and thus more bottlenecks.
Before the AEC project was killed, it seems that the ACS-1 simulator was converted to simulate the ACS/360. The materials I have available have no real details on this, but it makes sense to convert and continue to use what was indeed an excellent tool.
The Simulator of the ACS Project
The simulator was built on a simulation framework that already existed, created by Don Rozenberg and Bob Riekert. This framework is described in a 1966 technical report (Don Rozenberg, Lynn Conway, and Bob Riekert: “ACS Simulation Technique”, IBM Technical Report, March 1966). The framework is worth a discussion in its own right. It starts as a very general event-driven simulation framework, which is then applied to simulate a computer system. The introduction is marvelous; the fourth paragraph lays out the idea of an event-driven simulator.
The report goes on to describe the implementation of the general-purpose simulator framework. The nomenclature used is a bit different from what we find today. Where current simulation frameworks tend to talk about posting events on a queue, the ACS simulator used “CAUSE” to mean “post”, and the term “calendar” for the event queue.
Implementing the simulator was also a bit more difficult than it is today. The programming system used lacked support for function pointers, a mainstay of simulator implementation today (either directly in C, or as dispatch tables in C++ and similar languages). Instead, they had to achieve the same effect using either compiler peculiarities that got very close to function pointers, or computed GOTOs in a main dispatch loop.
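To make the pattern concrete, here is a minimal sketch of this kind of event-driven kernel in modern Python. The names `cause` and `calendar` mirror the report's CAUSE and calendar terminology; the handlers and example operations are my own illustration, and dispatching through plain function references is exactly the facility the FORTRAN implementation had to emulate with computed GOTOs.

```python
import heapq
import itertools

# The "calendar" is a priority queue of (time, seqno, handler, args) tuples.
calendar = []
counter = itertools.count()   # tie-breaker so same-time events stay in post order
now = 0.0

def cause(delay, handler, *args):
    """Post ("CAUSE") an event on the calendar, 'delay' cycles in the future."""
    heapq.heappush(calendar, (now + delay, next(counter), handler, args))

def run(until):
    """Pop events in time order and dispatch them, up to simulated time 'until'."""
    global now
    while calendar and calendar[0][0] <= until:
        now, _, handler, args = heapq.heappop(calendar)
        handler(*args)

# Example: a unit that finishes each operation two cycles after it starts.
log = []

def start_op(name):
    log.append((now, "start", name))
    cause(2.0, finish_op, name)

def finish_op(name):
    log.append((now, "finish", name))

cause(0.0, start_op, "ADD")
cause(1.0, start_op, "MUL")
run(until=10.0)
# log: [(0.0, 'start', 'ADD'), (1.0, 'start', 'MUL'),
#       (2.0, 'finish', 'ADD'), (3.0, 'finish', 'MUL')]
```

The sequence number in each tuple is only there to keep same-time events in first-posted-first-dispatched order, something the report's calendar also had to guarantee in one way or another.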
In design philosophy, the simulator feels similar to SystemC or CoFluent. The computer architecture is expressed as flows between blocks, with models of queues and contention. For example:
We recognize the concepts here: transaction generators, queues, simulation of occupancy in parallel execution units. The simulator is cycle-driven, posting events on every cycle. Time is expressed as floating-point numbers, which makes it possible to post “at the end of a cycle” by posting 0.8 cycles into the future. A separate simulation process, STATS, posts itself on every cycle and prints the current architecture status, allowing the execution of the simulation to be traced.
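Here is a small self-contained Python sketch of that cycle-driven pattern: a STATS process that re-posts itself every cycle, and end-of-cycle work posted at a fractional 0.8 cycles into the future. STATS and the 0.8-cycle convention come from the report; the event-queue scaffolding and the traced strings are my own hypothetical illustration.

```python
import heapq
import itertools

calendar = []                 # event queue ordered by floating-point time
counter = itertools.count()   # tie-breaker for same-time events
trace = []

def cause(time, handler):
    """Post an event at an absolute simulated time."""
    heapq.heappush(calendar, (time, next(counter), handler))

def stats(now):
    # STATS prints the machine state every cycle; here we just record a line.
    trace.append(f"cycle {now:.1f}: <architecture status>")
    cause(now + 1.0, stats)          # STATS re-posts itself each cycle

def end_of_cycle(now):
    trace.append(f"t={now:.1f}: end-of-cycle bookkeeping")

cause(0.0, stats)
cause(0.8, end_of_cycle)             # "end of a cycle" = 0.8 cycles ahead

while calendar and calendar[0][0] <= 2.0:
    t, _, handler = heapq.heappop(calendar)
    handler(t)
```

Because time is a float, the 0.8-cycle event sorts naturally between the cycle-0 and cycle-1 events on the calendar, with no special "phase" machinery needed.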
The eternal question for a computer simulator is how fast it is. The ACS-1 simulator was not all that great…
The simulator was written in FORTRAN IV (H), and ran on an IBM S/360 Mod 75 under OS/360. It operated at a rate of approximately 10 simulated machine cycles per second; typical programs thus ran at a rate of about 20 instructions per second.
20 IPS – the actual ACS machine was planned to run at around 80 MHz (12.5 ns cycle time), with an instructions-per-cycle of greater than 1 for HPC codes. So the slow-down in terms of simulated time over real time was around four million.
However, that might not be a good basis for comparison to current computer architecture simulators – the ACS was designed to be a total leap forward, so let’s assume an ACS-1 would have been about 100x faster than the model 75 the simulator ran on. This would put the slow-down of the simulator at somewhere between 10,000x and 100,000x – which is pretty much typical for a cycle-accurate simulator even today. I think this is a reasonable alternative way to evaluate the performance – we do not find 100x performance differences between two successive generations of computer processors today. Thus, I think it is fair to say that the technology as such was about as good as what we have today in terms of slowdown.
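The arithmetic behind these slowdown figures is easy to check. A back-of-envelope sketch, assuming an IPC of exactly 1 (conservative, since the text only says greater than 1 for HPC codes) and the 100x ACS-to-Mod-75 ratio assumed above:

```python
# The 80 MHz clock and ~20 simulated instructions/second come from the text;
# the IPC of 1.0 is a conservative assumption (the ACS-1 targeted more).
clock_hz = 80e6          # 12.5 ns cycle time
ipc = 1.0
sim_rate = 20.0          # simulated instructions per second on the Mod 75

target_rate = clock_hz * ipc                    # real ACS-1 instructions/second
slowdown_vs_target = target_rate / sim_rate
print(f"slowdown vs. simulated ACS-1: {slowdown_vs_target:,.0f}x")   # 4,000,000x

# If an ACS-1 would have been ~100x faster than the Mod 75 host, the
# slowdown relative to the host class of machine is 100x smaller:
slowdown_vs_host = slowdown_vs_target / 100.0
print(f"slowdown vs. the Mod 75 host: {slowdown_vs_host:,.0f}x")     # 40,000x
```

A higher assumed IPC only pushes the numbers up, which is why the text's 10,000x-100,000x range is a fair characterization.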
There is a lot of material actually available about the ACS project and its simulator, and it is fascinating to take a look at how things were done more than 50 years ago. Obviously, some things look very dated – typewriter-written reports, languages without function pointers, very primitive IO facilities, incredibly slow computers. However, other things pretty much follow the same principles and pattern as current computer architecture practice.
I think technology tends to be like that: some things (languages, speed of computers, size of memory, …) change all the time and change rapidly, but basic principles tend to be surprisingly stable once a good pattern has been discovered.
The use of simulation in the ACS project is documented in several articles.
- Don Rozenberg, Lynn Conway, and Bob Riekert: “ACS Simulation Technique”, IBM Technical Report, March 1966.
- Mark Smotherman and Dag Spicer: “IBM’s Single-Processor Supercomputer Efforts”, Communications of the ACM, December 2010.
- Mark Smotherman, Edward Sussenguth, and Russell Robelen: “The IBM ACS Project”, IEEE Annals of the History of Computing, January-March 2016.
- Brian Randell: “Reminiscences of Project Y and the ACS Project”, IEEE Annals of the History of Computing, July-September 2015.
- The IBM ACS System, recorded talk from the Computer History Museum, 2010
Random Historical Analogy
Since I am a history buff too, I just happened to think of an analogy to the deep principles versus great change argument I make above…
If you took a Roman centurion and showed him a modern army, the weapons and tactics (and accepted behaviors) would be completely new to him. But the concept of a structured army with logistics and training for the troops would be familiar – an organized, civilized army has to do certain things, and that has been the same since Assyrian times. The work of the classic US-style drill sergeant would be entirely familiar to the centurion – making proper soldiers out of civilians.