Since the very beginning of computers, processors were build with ever-increasing clock frequencies and instruction-level optimizations for faster serial code execution, such as ILP, caches, or speculative engines. Software developers and industry got used to the fact that applications get faster by just exchanging the underlying hardware. For several years now, these rules are proven to be no longer valid. Moore's law about the ever-increasing number of transistors per die is still valid, but decreased structural sizes and increased power consumption demand stalling, or even reduced, clock frequencies. Due to this development, serial execution performance no longer improves automatically with the next processor generation.
In the 'many-core era' that happens now, additional transistors are used not to speed up serial code paths, but to offer multiple execution engines ('cores') per processor. This changes every desktop-, server-, or even mobile system into a parallel computer. The exploitation of additional transistors is therefore now the responsibility of software, which makes parallel programming a mandatory approach for all software with scalability demands.