Background

With a couple of months' perspective, I’m pretty convinced that Intel has made a potentially disruptive entry in the market for programmable computational accelerators, often referred to as GPGPUs (General Purpose Graphics Processing Units) in deference to the fact that the market leaders, NVIDIA and AMD, have dominated the segment with parallel computational units derived from high-end GPUs. In late 2012, Intel, referring to the architecture as MIC (Many Integrated Core), introduced the Xeon Phi product, the long-awaited productization of the development project that was known internally (and to the rest of the world as well) as Knights Ferry. The product is a MIC coprocessor with up to 62 modified Xeon cores implemented in Intel's latest 22 nm process.

Why Xeon Phi Is Important

Parallel accelerators are an extremely effective way to speed up solutions that have inherent parallelism. Beyond the realm of graphics processing and games, whose highly parallel pixel-oriented operations spawned the first generations of GPUs, these applications tend to be high-value solutions such as simulation, engineering analysis, seismic processing, and medical, remote sensing and security-related image processing, to list a sample. GPGPUs can offer one to two orders of magnitude of acceleration for the right problems, although most results are in the single order of magnitude range when compared to a modern multi-core base platform. The “gotcha” is that they are very difficult to program. Even with modern programming environments such as CUDA and OpenCL, programmers in most cases must have an explicit understanding of both the algorithm’s parallelism and the GPU architecture. Additionally, the resultant code is not compatible with the x86 portion of the application, complicating development and maintenance.

Xeon Phi, while less dense in its initial implementation and certainly with lower peak performance than competing products from AMD and NVIDIA, has one outstanding compensating characteristic that has the potential to change the balance of power in this segment: it is x86 compatible. This compatibility gives developers the ability to reuse an immense amount of existing software IP, commonality with existing development tool sets and a shorter learning curve. x86 compatibility is not a magic wand, and in many cases the existing algorithms will have to be extensively restructured to take advantage of the hardware parallelism, reducing the potential benefit of nominal code compatibility. But in other cases, code that has been written for nominal levels of parallelism, such as threaded applications written for conventional multicore architectures, can be run on the Phi coprocessor simply by recompiling standard code with new compiler directives. Intel has demonstrated modest performance gains (2.5x or greater over conventional multi-threaded code on 8 – 16 core Xeon systems) with these minimalist migration steps, and while this is not applicable to all applications, the combination of simple migration and even modest performance gains will be attractive to a wide swath of developers. The Phi toolset is also very friendly to OpenMP code, which is a common development model in the x86 HPC world, and Intel's compilers and development tools are mature and well-accepted. Interestingly, despite its relative immaturity, the Xeon Phi tops the most recent (November 2012) Green500 list, beating out competing systems based on AMD and NVIDIA for the flops/Watt crown, albeit with a very low performance configuration relative to the other systems. I expect rapid scaling of Xeon Phi performance results as the product matures and its community pushes its limits.
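To make the "recompile with directives" point concrete, here is a minimal sketch of the kind of OpenMP loop that paragraph has in mind. The function names and the dot-product workload are my own illustration, not an Intel sample, and the code uses only standard OpenMP so it builds with any OpenMP-capable C compiler (or, with the pragma ignored, as plain serial C). Moving such a loop onto the coprocessor would additionally involve a vendor-specific step, such as an offload directive or a native-target build, that is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference: dot product of two vectors of length n. */
double dot_serial(const double *x, const double *y, int n)
{
    assert(x != NULL && y != NULL && n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* The same loop with a single OpenMP directive added.
 * reduction(+:sum) gives each thread a private partial sum and
 * combines them when the loop ends. If the compiler is run without
 * OpenMP enabled, the pragma is ignored and the loop runs serially. */
double dot_parallel(const double *x, const double *y, int n)
{
    assert(x != NULL && y != NULL && n >= 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

The only difference between the two functions is the one pragma line, which is the appeal of the directive-based model: the algorithm and the source language stay the same. Whether a given loop actually speeds up depends on how much independent work it contains, which is the caveat raised above about restructuring.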

What Will Be The Impact Of Phi?

By introducing a widely available parallel accelerator solution with lower barriers to application migration, Intel can accelerate the adoption of explicit parallel coprocessors. Competitors AMD and NVIDIA will be under additional pressure to improve their development tools and will continue to push the performance frontiers of their products.

Eventually, possibly a couple of successive CPU generations down the road, we may see the MIC architecture wedded to the Xeon memory space via an extension of the QuickPath architecture, much the same way that the AMD Fusion architecture couples the GPU components in its integrated APUs. On the way, Intel will introduce more scalable MIC products, and its immense leverage with its OEM partners will ensure the rapid development of a robust MIC ecosystem in terms of tools, supported ISV solutions and trained developers.

The key takeaway for Xeon Phi is that Intel has permanently changed the dynamics of the attached parallel accelerator segment with this introduction, moving it from a niche to the emerging mainstream, even as we look for more mature products in the future. Developers and users will benefit from a step-function increase in performance on a wide range of ISV applications over the next 12 – 18 months. Competitors need to stop taking refuge in their current performance advantage and market share and adapt to the reality of the world’s largest semiconductor company putting its very large foot down in their core market.