Measuring Disaster Recovery Maturity

Stephanie Balaouras
Vice President, Research Director
November 16, 2009

Each year for the past three years I've analyzed and written on the state of enterprise disaster recovery preparedness, and I've seen a definite improvement in overall DR preparedness over that time. Most enterprises do have some kind of recovery data center; enterprises often use an internal or colocated recovery data center to support advanced DR solutions such as replication and more "active-active" data center configurations; and the distance between data centers is increasing. As much as things have improved, there is still plenty of room for improvement, not just in advanced technology adoption but also in DR process management. I typically find that very few enterprises are both technically sophisticated and good at managing DR as an ongoing process.

When it comes to DR planning and process management, there are a number of standards, including the British Standard for IT Service Continuity Management (BS 25777), other country standards and even industry-specific standards. British Standards have a history of evolving into ISO standards, and there has already been widespread acceptance of BS 25777 as well as BS 25999 (the business continuity version). No matter which standard you follow, I don’t think you can go drastically wrong. DR planning best practices have been well defined for years and there is a lot of commonality in these standards. They all recommend:
•    Executive sponsorship and accountability
•    Staff to support the process
•    A business impact analysis (refreshed regularly)
•    A risk assessment (refreshed regularly)
•    Strategies to mitigate the most probable, high impact risks
•    These strategies documented in actionable plans
•    Plans frequently tested
•    Plans continuously updated
•    Training and awareness
•    Coordination with business continuity efforts (DR is not BC but that's another blog)

ITIL recommends several key performance indicators for IT Service Continuity Management (a.k.a. DR) but I don't find these KPIs to be detailed enough or extensive enough to really measure maturity. They include:
•    Business processes covered with continuity agreements
•    Gaps in disaster preparation
•    Implementation duration
•    Number of disaster practices
•    Number of shortcomings identified during disaster practices
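
As a rough illustration, several of the KPIs above can be tallied from a simple record of DR exercises. The Python sketch below is only one way to do it: the `DisasterPractice` record, its fields, and both function names are hypothetical and are not defined by ITIL.

```python
from dataclasses import dataclass


@dataclass
class DisasterPractice:
    """One DR exercise (hypothetical record, not an ITIL-defined structure)."""
    name: str
    shortcomings: int  # shortcomings identified during this practice


def continuity_coverage(total_processes: int, covered_processes: int) -> float:
    """Fraction of business processes covered by continuity agreements."""
    if total_processes <= 0:
        raise ValueError("total_processes must be positive")
    return covered_processes / total_processes


def summarize_practices(practices: list[DisasterPractice]) -> dict[str, int]:
    """Count the practices held and the shortcomings they surfaced."""
    return {
        "practices": len(practices),
        "shortcomings": sum(p.shortcomings for p in practices),
    }
```

For example, `continuity_coverage(40, 30)` gives 0.75, meaning three quarters of the business processes have continuity agreements. Numbers like these say how often you practice and how much you cover, which is why I find them too thin to describe maturity on their own.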

Like ITIL, most DR standards provide a process framework and describe process best practices but they don't recommend any software tools for process management or any of the technologies (replication, network connectivity, data center configuration etc.) that enable DR.  When I talk with customers, I also look for the following:
•    A recovery data center
•    Hardened data centers (both production and recovery)
•    Adoption of advanced backup and replication (recovery point capabilities) by criticality tier (e.g., mission-critical, business-critical, business-important)
•    Adoption of application failover technologies (recovery time capabilities) by criticality tier (e.g., mission-critical, business-critical, business-important)
•    Adoption of techniques to manage network bandwidth (compression, deduplication, bandwidth throttling, other WAN optimization techniques)
•    Elimination of independent point products for backup and replication / Development of an IT Continuity Services Catalog
•    Level of automation (can you fail over a group of interdependent applications and IT systems to a consistent point in time at the recovery data center?)
•    Active-active data center configurations (if you have an in-house DR solution)
•    Coordination with enterprise/infrastructure architecture
•    DR considerations embedded with application development and testing
•    All applications and IT systems DR protected
•    Protection of applications and IT systems at remote sites
•    Protection of PCs corporate-wide

Of course, enterprises don’t take on DR for the sake of technology; all this effort must be appropriate to the recovery requirements defined by the business and commensurate with risk.
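
One way to check that the effort matches the business requirements is to compare each application's achieved recovery point and recovery time against the targets set for its criticality tier. The Python sketch below assumes hypothetical tier targets and field names; the hour values are placeholders for illustration and do not come from any standard.

```python
from dataclasses import dataclass

# Hypothetical targets per tier, in hours: (recovery point, recovery time).
# Real values must come from the business impact analysis.
TIER_TARGETS = {
    "mission-critical": (0.25, 1.0),
    "business-critical": (4.0, 8.0),
    "business-important": (24.0, 48.0),
}


@dataclass
class Application:
    name: str
    tier: str
    achieved_rpo_hours: float  # data loss window measured in the last test
    achieved_rto_hours: float  # time to recover measured in the last test


def find_gaps(apps: list[Application]) -> list[tuple[str, str]]:
    """Return (application, objective) pairs where the tier target was missed."""
    gaps = []
    for app in apps:
        rpo_target, rto_target = TIER_TARGETS[app.tier]
        if app.achieved_rpo_hours > rpo_target:
            gaps.append((app.name, "RPO"))
        if app.achieved_rto_hours > rto_target:
            gaps.append((app.name, "RTO"))
    return gaps
```

An application in the mission-critical tier that recovers in two hours against a one-hour target would show up as `("billing", "RTO")`, while a business-important application that comfortably meets both targets produces no entry.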

For large enterprises, I also look for central governance of DR and for the adoption of tools that improve DR process management, such as automated communication and BC/DR planning software.

I’m interested to hear what other KPIs and metrics enterprises are using to measure DR maturity.
