OpenCL – First tremors of an industry revolution
In December 2008, the Khronos Group announced the ratification and public release of the OpenCL 1.0 specification for parallel programming. For the six months before that, all the major processor vendors had been vigorously questioning, redesigning and improving the draft specification contributed by none other than Apple, the consumer device manufacturer. Many asked why Apple would invest in such low-level technology in the first place, and then give it away for free. The answer: to be in the front line to benefit from the eventual demise of Moore’s law.
Ever since Intel’s co-founder Gordon E. Moore published his famous observation in the mid-sixties, Moore’s law has been used to predict the long-term exponential growth of transistor counts in integrated circuits. The trend of doubling the transistor count every two years has held from the 1970s all the way into the new millennium. However, even Moore himself recognizes that the law has its end on the horizon: in ten to twenty years, the size of the transistor will approach the size of an atom. Even before that barrier, the more ruthless laws of economics will interfere. The sheer cost of designing a circuit beyond the 16-nanometer process, with nuisances such as quantum tunneling, will eventually render the path of miniaturization impractical. The end of progress as we know it?
Well, no. Multi-core computing and parallel processing come to the rescue. The reasoning is simple: if we cannot increase the processing power of a single unit, let’s throw in more processing units to do the task faster! But there is a caveat. Already in the 1960s, when academic research on parallel processing was in its first bloom, IBM’s computer architect Gene Amdahl introduced the model known as Amdahl’s Law. It predicts the theoretical maximum speedup that can be achieved by adding processors.
In any given problem, there are parts that can be executed in parallel, and parts that need to be executed sequentially. For example, let’s say that a program needs 10 minutes to run on a single processor, and 2 minutes of that time requires sequential processing. The remaining 8 minutes can be parallelized, but even with an unlimited number of processors, the total execution time only approaches the 2-minute mark, giving a theoretical maximum speedup of 5x.
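The arithmetic behind that 5x ceiling can be sketched in a few lines of Python. This is only an illustration of the usual statement of Amdahl’s Law, speedup = 1 / (s + (1 - s) / n), where s is the sequential fraction and n the number of processors. The function name `amdahl_speedup` and the processor counts are made up for this example.

```python
def amdahl_speedup(serial_fraction, processors):
    """Theoretical speedup under Amdahl's Law (illustrative helper, not from any library)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The sequential part is unaffected; only the parallel part is divided among processors.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# The example from the text: 2 of the 10 minutes must run sequentially.
serial = 2 / 10

for n in (1, 2, 4, 8, 64, 1024):
    print(f"{n:>5} processors: {amdahl_speedup(serial, n):.2f}x")
```

With four processors the speedup is only 2.5x, and no processor count pushes it past 5x, because the 2 sequential minutes never shrink.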
“Well, that’s not so bad”, one might say. However, the depressing fact is that throughout the history of modern computing, most of the programs, operating systems, and algorithms have been developed as sequential by design.
Parallelizing an existing algorithm can be a significant, sometimes impossible, task.
Any kind of synchronization or data dependency within the algorithm, or between two or more separate algorithms, is bound to make parallelization extremely difficult. There are technology areas, such as pixel-based graphics, which are well suited to parallelism due to the nature of the algorithm or the processed data. It is therefore not surprising that graphics rendering sparked the renaissance of parallel computing.
As each pixel on the screen can be processed independently, graphics accelerator hardware started to evolve towards highly parallelized pipelines and multiple cores. Eventually, graphics processing units (GPUs) surpassed CPUs in both transistor count and theoretical processing performance. GPUs became fine-tuned behemoths built for a single purpose: generating life-like 3D objects and environments for entertainment and industrial design needs.
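To see why independent pixels are such a good fit for parallel hardware, here is a small Python sketch. It assumes a tiny made-up 8x4 "screen" and a hypothetical `shade` function whose result depends only on the pixel’s own coordinates. A thread pool from the standard library stands in for the GPU’s many cores; the point is the independence of the work, not the performance of Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4  # hypothetical tiny "screen"


def shade(index):
    """Hypothetical per-pixel function: uses only this pixel's own coordinates."""
    x, y = index % WIDTH, index // WIDTH
    # Simple diagonal gradient in the range 0..255.
    return (x * 255 // (WIDTH - 1) + y * 255 // (HEIGHT - 1)) // 2


def render_sequential():
    """Shade every pixel one after another."""
    return [shade(i) for i in range(WIDTH * HEIGHT)]


def render_parallel(workers=4):
    """Shade pixels on several workers; map() returns results in pixel order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(shade, range(WIDTH * HEIGHT)))


# No pixel reads another pixel's result, so both strategies produce the same image.
print(render_sequential() == render_parallel())
```

Because `shade` never looks at a neighbouring pixel, no synchronization is needed and the work can be split across any number of workers. An algorithm with data dependencies between pixels would not divide this cleanly.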
After the introduction of programmable graphics hardware architectures, programmers suddenly had the possibility to harness the millions of transistors in GPUs to their liking. It was easy to predict that this newfound freedom would attract attempts to use this raw horsepower for something other than graphics geometry calculations or lighting models. GPGPU – general purpose computing on graphics processing units – was born.
The rise of the GPGPU
Algorithms from areas such as cryptanalysis, signal processing, bioinformatics and even the search for extraterrestrial life were ported over to take advantage of the GPUs’ vast processing capabilities. However, as the programming environment and data types in GPUs were designed with graphics in mind, implementers had to resort to sub-optimal tricks in programming and data manipulation. The quality of the results was also often compromised by the limited precision of the GPUs’ arithmetic units. The performance, however, was beyond any CPU implementation so far.
As GPU vendors witnessed the rise of the GPGPU, they realized that this could be the opportunity to conquer market share from the traditional CPU vendors. They started to nurture the GPGPU community by introducing better, yet proprietary, programming interfaces such as Close To Metal from ATI and CUDA from NVIDIA. Hardware designs were modified to reduce the precision problems in computation. In their developer support programs and technology demonstrations, GPGPU became an important part of the vendors’ software portfolio. With the assistance of GPU vendors, many third-party software developers added GPGPU support to, for example, accelerate video encoding and decoding.
Parallel processing became a commodity overnight
At the same time, the traditional CPU vendors were also moving towards parallelism. With the introduction of symmetric multiprocessing support in mainstream CPUs, and the evolutionary transition towards multiple cores within a single package (multicore), CPUs came to feature similar possibilities and challenges as their GPU counterparts. When Sony chose to feature Cell, co-developed with Toshiba and IBM, in their new PlayStation 3 game console, parallel processing became a commodity overnight. Game developers were forced to re-engineer their tools and engines to better adapt to the parallel processing environments now found in consoles and desktop computers alike.
Fast forward to spring 2008. Apple hands over their parallel programming API specification to the Khronos Group, the standardization body responsible for developing royalty-free standards such as OpenGL. The OpenCL working group is formed, and at record pace the new OpenCL 1.0 standard is ratified. In August 2009, Apple releases their latest operating system, Mac OS X 10.6 Snow Leopard. Within Snow Leopard’s core, holding hands with Apple’s task parallelization engine Grand Central Dispatch, is OpenCL. Clearly, they are ready for The Parallel Revolution.
Embracing the inevitable
With this seemingly small step, Apple was liberated from the need to predict the direction of future hardware evolution. Their operating system was now capable of dividing the processing load among multiple CPUs and GPUs, provided that the programmers knew what they were doing. For this, Apple needs an open standard. By embracing the inevitable parallel future this early in their operating system, they ensure that developers learn to take full advantage of the underlying hardware. It will be interesting to see how Microsoft responds; their Parallel Extensions for .NET 4.0 are in development.
Meanwhile, in the embedded space, ARM is moving rapidly towards multiprocessor architectures. At the same time, multiple solutions with highly parallel architectures targeted at portable devices are emerging. OpenCL in your cell phone may easily be a reality in the very near future, for the very same reasons as on personal computers.
For us software engineers, parallel programming poses a truly refreshing challenge. We have to abandon most of the serialized designs that have served us well, and start the learning process. Designing parallel algorithms is by nature a difficult task for the human brain. Representing the algorithm with current programming languages is even harder. New programming languages are needed to better describe the problem, and the solution, for the machine to process. OpenCL is just the beginning of a new era in computer science, and in the industry itself.

Simple and Nice explanation about OpenCL.
Now, I understood the need of OpenCL and its growth.