When IBM moved their mainframe systems (the S/360 family that is today called System z) from BiCMOS to mainstream CMOS in 1994, the net result was a severe loss in clock frequency and thus in single-processor performance. Still, the move had to be made, since CMOS would scale much better into the future. As a result, IBM introduced additional parallelism to the system in order to maintain performance parity. Parallelism as a patch, essentially.
At least that is the idea you get when the story is told in an IEEE SE-Radio interview with Jeff Frey of IBM. I highly recommend this podcast episode for anyone interested in the software and hardware architecture outside of the well-known mainstream.
Digging a bit further, it seems that it took IBM until 1997 and the G4 S/390 processor to catch up with the single-thread performance of the last of the BiCMOS machines, and in 1998 with the G5, they surpassed it by a factor of two (see "IBM’s S/390 G5 Microprocessor Design" by Slegel et al.). That means they basically had to spend five years in flat-line performance, relying on parallelism to scale. An impressive bet by the company, and presumably only possible due to the nature of the market. Had there been a competitor that kept building faster machines, this move would have been much harder to pull off. Just see what happened to AMD and their Bulldozer design, which bet on parallelism while Intel kept improving single-core performance.
What is really intriguing, though, is just how this parallelism was achieved. IBM did not just add more processor units sharing the same memory; they also built a memory-coherent clustering technology called Parallel Sysplex. I have tried to understand more about how it works, but unfortunately details are very hard to come by. The design relies on hardware to handle time synchronization, coordination, and memory coherency between separate boxes. Compared to normal shared-memory servers built on Intel architecture, IBM is throwing much more powerful hardware at the problem of parallelism, as well as modifying the software a bit to support it.
In addition to performance increases, Parallel Sysplex can also be used for hardware redundancy (a bit like non-transparent PCI bridging, I would say).
The programming model is not really global shared memory; it seems to be closer to hardware-assisted, somewhat explicit communication. Most users would not see it, as it is hidden in the OS layers and databases used by applications. I think you could look at the hardware as a set of accelerators for software memory coherency: some software effort is required, but it will run well and fast thanks to the hardware support. As always, that is the advantage you get by building an entire system from the bottom up.
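To make the idea of "software memory coherency with some help" a bit more concrete, here is a toy sketch in Python. It is not how Parallel Sysplex is actually implemented (as noted, I could not find those details). The `Coordinator` and `Node` classes and all their methods are hypothetical names invented for illustration. The point is only the shape of the model: software explicitly registers interest in a piece of data and explicitly announces writes, while a shared helper does the bookkeeping and invalidates stale copies on other machines.

```python
class Coordinator:
    """Hypothetical stand-in for a coherency assist shared by all nodes."""

    def __init__(self):
        self.store = {}        # key -> latest committed value
        self.interested = {}   # key -> set of nodes holding a local copy

    def register(self, node, key):
        # A node explicitly announces that it keeps a local copy of `key`.
        self.interested.setdefault(key, set()).add(node)
        return self.store.get(key)

    def write(self, writer, key, value):
        # Commit the value, then mark every other node's local copy stale.
        self.store[key] = value
        for node in self.interested.get(key, set()):
            if node is not writer:
                node.valid[key] = False


class Node:
    """One machine in the cluster, with a private local cache."""

    def __init__(self, name, coordinator):
        self.name = name
        self.coordinator = coordinator
        self.cache = {}      # key -> locally cached value
        self.valid = {}      # key -> is the local copy still current?
        self.refetches = 0   # how often we had to go to the coordinator

    def read(self, key):
        # Cheap local validity check; only go remote if the copy is stale.
        if not self.valid.get(key, False):
            self.cache[key] = self.coordinator.register(self, key)
            self.valid[key] = True
            self.refetches += 1
        return self.cache[key]

    def write(self, key, value):
        # Update the local copy, then explicitly publish the change.
        self.cache[key] = value
        self.valid[key] = True
        self.coordinator.register(self, key)
        self.coordinator.write(self, key, value)


cf = Coordinator()
a, b = Node("A", cf), Node("B", cf)
a.write("balance", 100)
print(b.read("balance"))  # 100 (first read, fetched from the coordinator)
a.write("balance", 80)    # B's local copy is marked stale
print(b.read("balance"))  # 80 (refetched after the invalidation)
```

In this toy, the application code on each node never sees another node's memory directly. The coherency comes from the explicit `register` and `write` calls, which in a real system would presumably be buried in the OS and database layers and accelerated by dedicated hardware.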
Impressive, but I would have liked to find out more. And using parallelism as a patch is pretty gutsy. But then again, IBM is a company that has survived for as long as it has by being able to change direction and bet the company on risky propositions more than once.