Business Continuity isn't exactly the most riveting topic in the world, but it's an area of IT that's close to my heart and one that seems to be regularly forgotten, misunderstood, or simply poorly implemented. After working with a variety of organisations, the problem appears to me to be a lack of communication between those running the business and the technical staff responsible for implementing and maintaining the mechanisms that greatly influence the likelihood of a disaster occurring, the time taken to recover from it, and ultimately, the overall impact on the business. All too often, heads are buried in the sand until there's a disaster, and then there's hell to pay!
Business Continuity is not something implemented at the time of a disaster; it refers to those activities performed daily to maintain service, consistency, and recoverability.
So, where do you start? Well, firstly, the field of Business Continuity is huge, but the purpose of this post is to give you enough information to go and do something. Once you've got the basics sorted, then by all means go and read up further - but right now, let's just focus on understanding the basic concepts, learning some of the jargon, and picking up some helpful hints to convince people in your company that there's more to Business Continuity than having some backup tapes in a cupboard!
Business Continuity? Disaster Recovery?
Understanding what you're doing is important, but all too often I've seen organisations spend far too much time debating what they mean by Business Continuity, rather than agreeing some basic terms and getting on with it! In this post, we'll work on the principle that Disaster Recovery is the act of preparing for recovery or continuation of critical IT components after a disaster (natural or otherwise) and that Business Continuity is the larger process of ensuring that all aspects of the business keep functioning. For example, a Business Continuity Plan might involve identifying the risks faced by critical business functions and identifying the IT components they rely upon. Meanwhile, Disaster Recovery is focussed on ensuring that there's sufficient fault tolerance and backups to meet the requirements of the Business Continuity Plan.
What do I need to do?
Your manager might prefer you call it a Business Impact Assessment (BIA), but I'd suggest you probably need to identify all the business functions within your organisation, assign each a level of importance, and then work out what IT components those functions depend on. You can make this job as big and as complicated as you like, but at the end of the day, you need to gain agreement with the business about what's important to them. The two fundamental measurements in Business Continuity are the:
- Recovery Time Objective (RTO): How quickly must we restore a business function after a disruption or disaster before it causes "unacceptable consequences" to the business?
- Recovery Point Objective (RPO): The maximum window of time before a disruption or disaster during which data may be lost.
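To make the two measurements concrete, here is a small worked example in Python. The targets and the incident timeline are entirely made up for illustration; the only point is that downtime gets compared against the RTO, and the gap between the last good backup and the disaster gets compared against the RPO.

```python
from datetime import datetime, timedelta

# Hypothetical targets agreed with the business for one function.
rto = timedelta(hours=4)  # longest tolerable time to restore the function
rpo = timedelta(hours=1)  # longest tolerable window of lost data

# Hypothetical incident timeline (all times invented for this example).
last_good_backup = datetime(2024, 3, 1, 1, 0)    # nightly backup at 01:00
disaster = datetime(2024, 3, 1, 17, 0)           # failure at 17:00
service_restored = datetime(2024, 3, 1, 20, 0)   # back online at 20:00

# Everything written since the last good backup is lost.
data_loss_window = disaster - last_good_backup
# Time the business function was unavailable.
downtime = service_restored - disaster

rto_met = downtime <= rto
rpo_met = data_loss_window <= rpo

print(f"Downtime {downtime} vs RTO {rto}: {'met' if rto_met else 'missed'}")
print(f"Data loss {data_loss_window} vs RPO {rpo}: {'met' if rpo_met else 'missed'}")
```

In this made-up scenario the service is back within the RTO, but a nightly backup leaves a 16-hour data loss window, which is nowhere near a one-hour RPO. A quick recovery says nothing about how much data you lose.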
The usual mistake is that IT ask the business what level of "uptime" is required and how much data loss is acceptable. Of course, the answer is almost always "100% uptime with zero data loss" - usually without any real understanding of what that might involve either technically or financially! A useful way of getting the facts is to ask the question in a different way:
If we had a major disaster right now, how long would our business survive without this particular business function? An hour? A day? A week? What would the impact be if we lost the last minute of transactions? What about the last hour? The last day?
Now all you have to do is review the IT components that each business function depends upon and ensure you can meet the agreed Recovery Time Objective (RTO) and Recovery Point Objective (RPO). That might sound like a cop-out, but there's a vast array of documented solutions out there for introducing fault tolerance to an IT system. However, it's very likely that you'll need to take a serious look at how quickly you could recover from a disaster. Also, once you've identified a critical business function and analysed all the IT components it depends on, you'll often find that there's some long-forgotten line-of-business application sitting on a single ancient server in the corner of the server room that turns out to be critical to the success of the business!
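If it helps to picture that review, here is a minimal sketch of it as a Python script. Everything in it is an assumption for illustration: the `Component` and `BusinessFunction` classes, the `find_gaps` helper, and all the names and figures are hypothetical and don't come from any standard or tool. It also treats a component's backup interval as its worst-case data loss, which is a simplification.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class Component:
    """An IT component a business function depends on (hypothetical model)."""
    name: str
    estimated_recovery: timedelta  # how long a rebuild or restore is expected to take
    backup_interval: timedelta     # assumed worst-case data loss: one full interval


@dataclass
class BusinessFunction:
    """A business function with the targets agreed with the business."""
    name: str
    rto: timedelta
    rpo: timedelta
    depends_on: List[Component] = field(default_factory=list)


def find_gaps(function: BusinessFunction) -> List[str]:
    """List every component that cannot meet the function's RTO or RPO."""
    gaps = []
    for component in function.depends_on:
        if component.estimated_recovery > function.rto:
            gaps.append(
                f"{function.name}: {component.name} recovery "
                f"{component.estimated_recovery} exceeds RTO {function.rto}"
            )
        if component.backup_interval > function.rpo:
            gaps.append(
                f"{function.name}: {component.name} backup interval "
                f"{component.backup_interval} exceeds RPO {function.rpo}"
            )
    return gaps


# Invented example data: one healthy database and one forgotten legacy server.
orders_db = Component("orders-db", timedelta(hours=2), timedelta(minutes=15))
legacy = Component("legacy-lob-server", timedelta(days=3), timedelta(days=1))

order_processing = BusinessFunction(
    "Order processing",
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    depends_on=[orders_db, legacy],
)

gaps = find_gaps(order_processing)
for gap in gaps:
    print(gap)
```

Here the database passes both checks, while the legacy server fails both, which is exactly the kind of surprise the review tends to turn up. Note that the sketch checks each component on its own; if components have to be recovered one after another, the total recovery time may be closer to the sum of the individual estimates.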
Why should I go to all this hassle?
From an IT perspective, the key purpose of the Business Impact Assessment is to understand the real requirements of the business in terms of fault tolerance and data backups. More importantly, done right, it's an opportunity to clearly document the expectations of the business in a measurable fashion. This empowers the IT team to:
- Correctly design IT systems to ensure they have "sufficient and necessary" levels of fault tolerance.
- Put the right backup mechanisms and schedules in place to ensure you can recover from a disaster.
- Justify the IT costs associated with additional servers, storage, vendor support contracts, etc.
- Sleep soundly at night knowing that if disaster does strike, you have it in black and white what was expected of you.