Defining Operations
So, my formal area of coverage at Forrester is “digital operations.” But what does this mean?
We think of “operations” in terms of NASA mission control: trained professionals intently scrutinizing monitors, looking for signals of interest. Or we think of the stereotypical shop floor assembly line. Business schools offer degrees in operations, and businesses hire Chief Operating Officers. Common definitions focus on operations as execution, efficiency, and “day to day” activities.
There’s no question, especially given the explosion of interest in DevOps, that operations is changing. Operations engineers no longer congregate in network operations centers (NOCs); they are more likely to be found in virtual gatherings, whether conference bridges or Slack channels. Google’s Site Reliability Engineers are incentivized to eliminate repetitive work, aka “toil.”
But it’s tricky: “operations” is one of those terms that gets more confusing the more you poke at it. COO responsibilities are highly variable across companies. In digital management and business technology, some have proposed that operations will be automated out of existence – that the world needs to go to “NoOps.” But ultimately, what is this “Ops” term in “DevOps” or “NoOps”? We need measurable criteria.
Operations, compared to work in general, is work that is relatively less variable, more repeatable, more interrupt-driven, more concerned with efficiency and optimization, and more scalable in nature. It’s more about preservation as opposed to innovation.
Dr. Murray Cantor proposes an “S-curve” showing work as a spectrum of variability (slide 14). Work ranges from the most uncertain R&D, through medium-certainty engineering (e.g. ongoing application enhancements on well-understood platforms), through the most repeatable “operational” activities.
Digital product development creates systems we expect will provide value consistently. These longer-term expectations drive organizational behavior, including operational efforts to 1) ensure that users of digital systems are receiving the intended value and 2) correct the behavior of systems that are not meeting expectations.
These persistent aspects of ops are why “NoOps” is ultimately unlikely. (See the 2012 John Allspaw versus Adrian Cockcroft exchange on the topic.) Yes, we “automate all the repeatable things.” But the interesting part of operations lives on the boundary of predictability. Complex socio-technical systems remain prone to error – did you know we narrowly missed the worst disaster in aviation history last week? There are limits to what can and should be automated, an increasingly important topic as cognitive technologies are applied to operational issues.
Digital transformation means that when systems don’t meet expectations, the consequences are increasingly dire. Failures of our distributed digital infrastructure can be life-critical, as in the recent WannaCry disruption of the UK’s National Health Service. We’ve got to improve our ability to operate that infrastructure. I tip my hat especially to John Allspaw, who has been exposing many computing professionals to state-of-the-art thinking in safety-critical fields.
My coverage area is broad. Some of it will be to examine (to quote John’s thesis) “teams engaging in understanding and resolving anomalies under high-tempo and high-consequence conditions.” I’ll strive to base such work on the proper foundations.