May 20, 2016
The Background — Linux as a Fast Follower and the Need for Hot Patching
No doubt about it, Linux has made impressive strides in the last 15 years, gaining many features previously associated with high-end proprietary Unix as it made the transition from small system plaything to core enterprise processing resource and the engine of the extended web as we know it. Along the way it gained reliable and highly scalable schedulers, a multiplicity of efficient and scalable file systems, advanced RAS features, its own embedded virtualization and efficient thread support.
As Linux grew, so did supporting hardware, particularly the capabilities of the ubiquitous x86 CPU upon which the vast majority of Linux runs today. The debate, though, has always been about how close Linux could get to “the real OS”, the core proprietary Unix variants that for two decades defined the limits of non-mainframe scalability and reliability. But “the times they are a changing”, and the new narrative may be “when will Unix catch up to Linux on critical RAS features like hot patching?”
Hot patching, the ability to apply updates to the OS kernel while it is running, is a long sought-after but elusive feature of a production OS. Long sought after because both developers and operations teams recognize that bringing down an OS instance that is doing critical high-volume work is at best disruptive and at worst a logistical nightmare, and elusive because it is incredibly difficult. There have been several failed attempts, and several implementations that “almost worked” but were so fraught with exceptions that they were not really useful in production.[i]
A critical note here for those unfamiliar with patching in general. Hot patching as referenced here applies to critical security and stability patches: those addressing CVEs with a CVSS rating of 6 or higher that involve critical security, data corruption or system stability issues. Hot patching is not for major OS version upgrades, and all hot patch implementations require some level of expert review of a patch before it is installed.
What Has Changed?
Early versions of Linux hot patching have been around for several years, most notably from a company called Ksplice, acquired a few years ago by Oracle. But the real change happened earlier this year when SUSE declared that its hot-patch capability, kGraft, previously in limited availability, was now in GA, suitable for all production workloads. This is a bold claim, opening SUSE up to immense problems if it fails regularly in production. However, in further support of the claim that they can support critical enterprise workloads, at the SAPPHIRE conference in May, SUSE further announced that their hot patch capability was certified by them for SAP HANA. In conversations with SUSE, they indicated that the percentage of such patches that could be applied using kGraft after expert review was in the high 90s.
While the remaining major Unix OSs have tremendous proven capability in production and support their own hardware with a number of unique features which Linux still lacks, the importance of this event cannot be over-emphasized. The Open Source community process, which has proven itself capable of innovating a highly capable fast-follow OS environment, has now proven that it can lead with advanced features ahead of the legacy Unix community. Some versions of Unix have had hot patching for years but never really emphasized it because it was limited in applicability, and those that are lacking it will catch up, but the perception of Linux as somehow “less than” has been permanently shattered.
The why is probably as simple as it was inevitable – different priorities and in some cases possible resource/budget constraints. The number of actual developers doing core development on proprietary Unix is probably an order of magnitude smaller than the number of people contributing to the Linux kernel and other related projects. The remaining revenue streams for the proprietary products, while representing secure and highly profitable cash flow for the next decade, are simply not enough to match the momentum behind Linux, requiring possibly more focused application of resources and in some cases representing real budget constraints. Cases in point:
- In the case of Oracle, it is pretty clearly an issue of competing priorities. They have just come out of a significant development cycle which saw some breakthrough capabilities around hardware acceleration of selected Oracle software operations and major improvements in security features supported by the latest SPARC hardware along with expansions of their cloud capabilities.
- IBM has continued to invest in AIX, and has done impressive things with continued RAS improvements for AIX, including non-disruptive upgrades (not the same as hot patching) and support for the new POWER8 CPU and the OpenPOWER community.
- HPE has been a bit opaque recently on the future of HP-UX, having lost major market share in the last several years and pointedly not porting it to their new x86-based Superdome-X mission-critical servers. My opinion, unsupported by hard data, is that they are signaling that they are throwing in the towel, so to speak, on HP-UX long term and doubling down on Linux and Windows with a resulting de-emphasis on new feature development on HP-UX while continuing to support security and stability issues.
The Hot Patch Landscape Today
While SUSE can be said to have crossed the line first with GA of its kGraft utility, the hot patch ecosystem is active and even boasts multiple architectural approaches. In addition to kGraft, Red Hat has a tech preview of its kpatch utility and Oracle has the original Ksplice available for its Linux distribution.
The different products operate slightly differently, to the tune of impassioned debate about which one is better (a debate in which I am honestly unable to intelligently take a side). kGraft (SUSE) works on a per-thread basis and allows the OS to really run continuously while pointers to the new versus old runtimes are swapped, but the entire process may take several minutes during which the system continues to operate under the unpatched version and switches transparently to the new environment when the patching process is completed. Ksplice (Oracle) and kpatch (Red Hat) must actually pause kernel execution for an interval that has been reported as being in the range of 10 – 40 ms and then perform all internal juggling at once. This momentary pause may be insignificant in some environments or very noticeable in others. The debate, reduced to its essentials, is the choice between getting it all done at once with just a tiny little hiccup that might turn out to be a big burp if your system is doing very high volume transactions versus having the process take several minutes without interruption. There are also differences in the limitations on the kinds of patches each can apply.
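For readers who think better in code, here is a toy model of the two approaches described above. It is only a sketch and has nothing to do with the actual kGraft, kpatch or Ksplice source: the ToyThread class and both function names are hypothetical, and I am assuming that “per thread” means each thread switches individually when it reaches a point where it is safe to do so. What it shows is the trade-off: one approach flips everything in a single paused step, the other never pauses but passes through a window where old and new code coexist.

```python
from dataclasses import dataclass


@dataclass
class ToyThread:
    """Hypothetical stand-in for a kernel thread (not a real kernel structure)."""
    name: str
    patched: bool = False

    def active_version(self) -> str:
        # Which copy of the patched function this thread would call right now.
        return "new" if self.patched else "old"


def stop_the_world_patch(threads):
    """Pause-and-switch model: every thread flips in one step.

    In this toy model the "pause" is simply the duration of the loop,
    during which we pretend no thread is allowed to run.
    """
    for t in threads:
        t.patched = True
    return [t.active_version() for t in threads]


def per_thread_patch(threads, safe_point_order):
    """Per-thread model: each thread flips when it reaches a safe point.

    safe_point_order is an assumed list of thread names in the order they
    reach such a point. Returns a snapshot of all threads after each switch,
    so the mixed old/new window is visible.
    """
    by_name = {t.name: t for t in threads}
    snapshots = []
    for name in safe_point_order:
        by_name[name].patched = True
        snapshots.append([t.active_version() for t in threads])
    return snapshots


if __name__ == "__main__":
    print(stop_the_world_patch([ToyThread("a"), ToyThread("b"), ToyThread("c")]))
    for snap in per_thread_patch(
        [ToyThread("a"), ToyThread("b"), ToyThread("c")], ["b", "a", "c"]
    ):
        print(snap)
```

In the per-thread run, the intermediate snapshots are the interesting part: nothing is ever paused, but the patch is not fully in effect until the last thread has switched.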
On the Unix front, vendors have been discussing and releasing partial solutions for years, and others have hinted at this capability for future releases. In 2014, a senior Oracle technologist listed hot patching as a focus for future development, so we can speculate that it is coming sometime in an upcoming release of Solaris (https://blogs.oracle.com/markusflierl/entry/oracle_solaris_and_openstack_at). IBM has documented and commented on hot patch capabilities since 2007, but the available documentation seems to indicate that it has too many limitations to be considered a mature production capability.[ii] In the latest AIX release, 7.2, IBM has included a Live Update capability that seems to install a complete new OS image in a parallel LPAR and transfer over all running processes and their memory. This approach appears to eliminate most limitations on the kinds of patches that can be applied but the documentation seems to indicate that the process entails some “blackout period”.
But amidst all of the tentative steps, it appears that only SUSE has actually made a viable hot patch capability generally available for production use now.
Where Is Linux Heading?
Faced with competing technologies to solve the same problem, the Linux community is doing what it usually does (as have standards bodies since time immemorial) and endorsing both. In the strange process of co-opetition that is at the heart of the Open Source community, both camps are feeding IP upstream into the kernel development process, and the 4.0 kernel apparently contains code for both approaches. It is almost certain that a future release of the 4.x kernel will contain production-ready hot patch as a standard feature, placing the burden on the Unix providers to prove they can keep up with Linux.
[i] The earliest reference I could find to hot patching was a 1998 paper that discussed the technical issues involved in several approaches and outlined an implementation on SPARC/Solaris.