
Security, Compliance & Awareness

Security Management: Short Circuiting The DRP Process

The amount of time and money that some organisations spend in creating a DRP has always been something of a mystery to us at Safecoms, particularly the time spent on the “Business Impact Analysis” (or “BIA” as it's generally referred to).  We regularly come across organisations whose DRP process has become bogged down in “months” of BIA.  If this is happening, it suggests to us that either:
•    organisations have misunderstood the purpose of the BIA; or
•    organisations are afraid to challenge “specialists” who shroud the BIA in mystery and claim that it is necessarily a fiendishly complex and lengthy process

The BIA is a critical early step in the DRP process, and needs to be done properly as it provides the context for the rest of the DRP project.  However, it does not need to take very long.  Our experience of developing BIAs for large and complex organisations is that it should be perfectly possible to complete a BIA within a couple of weeks (often a lot less).


Purpose of BIA
The purpose of a BIA is to establish three things:
i.    the “maximum allowable outage” (MAO) for each application and each component of the core infrastructure (LAN, SAN, WAN, Domain Controller etc.)
ii.    the relative prioritisation of applications / core infrastructure components that have the same MAO
iii.    upstream and downstream dependencies (e.g. recovery of the warehouse system presupposes the ERP system has already been recovered)

Some organisations also use the BIA exercise as an opportunity to gather reams of additional information about the IT infrastructure, users, and the alignment of IT with business processes.  However, whilst this may be useful for other purposes, it is rarely necessary for the purposes of the BIA.  The first thing to do to reduce the time spent on the BIA is to ensure that the activity remains tightly focused on achieving the three objectives enumerated above – no more and no less.


What Is An MAO?
The MAO can be defined as “the period of time for which this application / core infrastructure component can remain unavailable without having a severe negative impact on the business”.  In other words, a civil engineering company might decide (for instance) that it could continue to function quite happily without its payroll system for two weeks (manual records could be kept, cheques could be issued to weekly paid workers etc.) but that if its CAD systems were unavailable for more than 48 hours the consequences for the business would be very severe indeed, with major projects grinding to a halt, financial penalty clauses being invoked for late delivery etc.  This would mean that the MAO for the payroll system would probably be set at two weeks (or even longer), but that the MAO for the CAD systems would be set at 48 hours.

Once the MAO for each application and core infrastructure component has been set, the rest of the DRP exercise is about creating a workable plan to ensure that each component part of the IT infrastructure can be restored within its specified MAO.


Range Of MAOs
At the beginning of the BIA exercise the organisation will typically define a range of around 5 potential MAOs – e.g. 24 hours, 48 hours, one week, two weeks, one month.  Each application and core infrastructure component is then allocated its MAO from within this range.

From a practical perspective, there is generally little or no operational benefit to be gained by attempting to be more granular than this.  A very important point should be borne in mind here.

The point is this.  Disaster Recovery Plans are about dealing with just that – disasters.  They are not about navigating through “difficult” or “inconvenient” situations, but are concerned with ensuring survival in a catastrophic situation.  In our experience, too many organisations lose sight of this most fundamental of points when developing their DRPs, and significant amounts of time and intellectual energy are spent debating fine distinctions about the difference in impact between losing a system for 60 hours and losing it for 48 hours – distinctions which would seem spectacularly irrelevant if you were wading through a metre of flood water in your server room, or were trying to find out how many of your colleagues had died in the gas explosion that had ripped the guts out of your office building.


Deciding The MAO For A System
How do you decide which of your range of possible MAOs should be applied to any given application / core infrastructure component?  Typically this will be approached using a “scorecard” which looks at the level of business impact of the non availability of that particular system over different time periods (those time periods being the range of possible MAOs).

The level of business impact will be looked at from several different perspectives.  Whilst these will vary from organisation to organisation, the following set could be regarded as fairly typical:

•    Outputs (i.e. impact of the non availability of this system on the ability of the organisation to deliver whatever it is that the organisation delivers)
•    Resources (i.e. impact of the non availability of this system on staff or plant / machinery)
•    Reputation (i.e. extent to which non availability of this system could result in damage to the organisation’s reputation – ranging from bad publicity to major legal liability)
•    Stakeholders (i.e. extent to which non availability of this system could result in loss of shareholder value etc.)
•    Compliance (i.e. extent to which non availability of this system could result in legal, regulatory or internal non compliance)

Different levels of impact under each of these perspectives will have a “score” associated with them.  The score will typically be from 1 to 5, with 0 recorded where there is no appreciable impact.  These scores (and their associated criteria) are defined at the outset of the exercise and are applied consistently for each system: for example, from the Outputs perspective, a score of 5 might be defined as a “greater than 20% failure to meet production quotas”, with a score of 4 representing greater than 12% failure, and so on.  Wherever possible, quantitative criteria should be used.
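As a rough illustration of how such quantitative criteria might be written down, the sketch below maps a percentage shortfall against production quotas to an Outputs score.  Only the 20% and 12% thresholds come from the example above; the lower thresholds, and the names OUTPUT_THRESHOLDS and outputs_score, are hypothetical placeholders that each organisation would replace with its own agreed criteria.

```python
# Thresholds (percent failure to meet production quotas) that must be
# EXCEEDED to earn scores 1..5. Only 12.0 and 20.0 are taken from the
# article; 0.0, 2.0 and 6.0 are hypothetical placeholders.
OUTPUT_THRESHOLDS = [0.0, 2.0, 6.0, 12.0, 20.0]


def outputs_score(shortfall_pct: float) -> int:
    """Return the 0-5 Outputs score for a given percentage shortfall."""
    # Each threshold exceeded adds one point, so a shortfall above 20%
    # exceeds all five thresholds and scores 5.
    return sum(1 for threshold in OUTPUT_THRESHOLDS if shortfall_pct > threshold)


print(outputs_score(25.0))  # above 20%, so 5
print(outputs_score(13.0))  # above 12% but not above 20%, so 4
```

Writing the criteria down in this form at the outset is one way of making sure the same thresholds are applied to every system.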

At the conclusion of the exercise you have a completed scorecard for each system (application or core infrastructure component) looking something like this:

APPLICATION: WAREHOUSE MANAGEMENT SYSTEM

               24 hrs   48 hrs   1 week   2 weeks   1 month
Outputs           2        3        4        5         5
Resources         1        1        3        4         5
Reputation        0        0        0        1         3
Stakeholders      0        0        0        0         2
Compliance        0        0        0        1         3

The MAO will usually be set to equate to the elapsed time when the first score of 4 occurs for that particular application (which in the case of the example scorecard given above is at one week).
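The rule can be sketched in a few lines of Python, using the warehouse management system scorecard as data.  The names MAO_RANGE, SCORECARD and derive_mao are our own, and treating any score of 4 or higher as the trigger (so that a jump straight from 3 to 5 is also caught) is an assumption rather than part of the rule as stated.

```python
# The range of potential MAOs, in the order used on the scorecard.
MAO_RANGE = ["24 hrs", "48 hrs", "1 week", "2 weeks", "1 month"]

# Scores per perspective for the warehouse management system example.
SCORECARD = {
    "Outputs":      [2, 3, 4, 5, 5],
    "Resources":    [1, 1, 3, 4, 5],
    "Reputation":   [0, 0, 0, 1, 3],
    "Stakeholders": [0, 0, 0, 0, 2],
    "Compliance":   [0, 0, 0, 1, 3],
}


def derive_mao(scorecard, periods=MAO_RANGE, trigger=4):
    """Return the first period in which any perspective scores >= trigger."""
    for index, period in enumerate(periods):
        if any(scores[index] >= trigger for scores in scorecard.values()):
            return period
    # No perspective ever reaches the trigger score; leave the decision
    # (e.g. use the longest MAO) to the people running the exercise.
    return None


print(derive_mao(SCORECARD))  # 1 week
```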


Filling Out The Scorecard
It would be possible to argue forever about the score to be inserted in each box on the scorecard.  If permitted to do so, the development of MAOs through filling out the scorecard can turn into an eternal game of “what ifs”, “hypotheticals”, argument and counter argument.  And in some organisations that appears to be exactly what happens.

If ever there was a time and place to apply “the 80 / 20 rule”, this is it.  By definition, there is no “right” answer here.  Even if you gave a team of 10 business analysts three months to address the question “if the warehouse system was out for a week, by what percentage would production fall?” ultimately all you would end up with would be their “best guess”.  For sure, it would be a best guess with a huge amount of supporting data, mathematical modelling and analysis behind it, but it would remain a best guess.  And our experience suggests that answer will generally be exactly the same as the answer you would get if you took two or three of the more experienced operations managers plus an accountant who knows the company’s cost and revenue models inside out, sat them around a table for an hour, and got them to come up with a consensus answer.

After all, in virtually all instances the MAO scorecard does not require a precise answer, but instead simply requires you to decide which range of values is most likely to apply.  Similar to a multiple choice test, in most cases you will be able to eliminate several of the possibilities straight away, and you will generally be choosing between two possible ranges: for example, “if this system is down for a week, is the lost revenue more likely to be somewhere between $500K and $700K, or between $700K and $900K?”


Dependencies
As noted at the beginning of this article, one of the purposes of the BIA is to identify dependencies between different systems within the IT infrastructure.  This in turn can result in a shortening of the MAO exercise, particularly when it comes to defining the MAOs for core infrastructure components.

For example, you may have identified that recovery of the accounting system is dependent on the LAN, SAN and database server being available.  If the accounting system has been given an MAO of 48 hours, then (by definition) the MAOs for the LAN, SAN and database server must also be 48 hours (or shorter).  There is no real need for further detailed analysis of the MAOs of the core infrastructure components – they will generally be effectively pre-defined by the MAOs of the applications that depend on them.

The same logic applies to applications, where one application is dependent upon another.  For example, if it is found that the ERP system is dependent upon the warehouse system then that will immediately reduce the range of possible MAOs for the warehouse system, as it must (by definition) be shorter than (or equal to – see below under “Prioritisation”) the MAO for the ERP system.  By following this logic, the BIA exercise suddenly becomes a whole lot quicker.
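The dependency rule lends itself to a simple mechanical check.  The sketch below is a minimal illustration, assuming MAOs are expressed in hours: it lowers the MAO of every prerequisite to at most the MAO of anything that depends on it.  The starting MAOs for the LAN, SAN and database server, and the names mao_hours, depends_on and cap_by_dependents, are hypothetical.

```python
# MAOs in hours. The 48 hours for accounting follows the example above;
# the infrastructure figures are hypothetical starting values.
mao_hours = {"accounting": 48, "LAN": 168, "SAN": 168, "database server": 336}

# depends_on[x] lists the systems that must be recovered before x.
depends_on = {"accounting": ["LAN", "SAN", "database server"]}


def cap_by_dependents(mao_hours, depends_on):
    """Lower each prerequisite's MAO to at most that of its dependents."""
    capped = dict(mao_hours)
    changed = True
    while changed:  # repeat so that caps flow down chains of dependencies
        changed = False
        for system, prerequisites in depends_on.items():
            for prerequisite in prerequisites:
                if capped[prerequisite] > capped[system]:
                    capped[prerequisite] = capped[system]
                    changed = True
    return capped


print(cap_by_dependents(mao_hours, depends_on))
```

In this example the LAN, SAN and database server all come out at 48 hours, without any separate scorecard analysis being needed for them.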


Prioritisation
You will generally find at the end of the MAO exercise that there will be several applications and / or core infrastructure components with the same MAO.  At this point, these systems with the same MAO rating need to be prioritised.  The prioritisation will (obviously) first be driven by any dependencies between those systems.

If (having worked through the dependencies) there are still “standalone” systems within the same MAO range that have not yet been given a priority, then in most cases the collective view of a small number of experienced operations managers will rapidly be able to provide as good an answer as any as to the relative operational importance of those systems.  Little (if any) value add will be gained from further detailed analysis to determine relative priority.
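The two-stage prioritisation described above (dependencies first, then the managers' view) can be sketched with the standard library's graphlib module, which needs Python 3.9 or later.  The system names, the ranking values and the function name recovery_order are illustrative assumptions; the ranking simply stands in for whatever order the operations managers agree on.

```python
from graphlib import TopologicalSorter


def recovery_order(systems, depends_on, manager_rank):
    """Order systems sharing an MAO: prerequisites first, then agreed rank."""
    sorter = TopologicalSorter()
    for system in systems:
        # Only dependencies within the same MAO band affect this ordering.
        prerequisites = [p for p in depends_on.get(system, []) if p in systems]
        sorter.add(system, *prerequisites)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # Everything returned here has all of its prerequisites recovered;
        # the managers' ranking settles the order among those systems.
        ready = sorted(sorter.get_ready(), key=manager_rank.__getitem__)
        order.extend(ready)
        sorter.done(*ready)
    return order


# Hypothetical systems that all share the same MAO.
systems = ["ERP", "warehouse", "email", "CAD"]
depends_on = {"ERP": ["warehouse"]}  # ERP needs the warehouse system first
manager_rank = {"CAD": 1, "warehouse": 2, "email": 3, "ERP": 4}  # 1 = most important

print(recovery_order(systems, depends_on, manager_rank))
# ['CAD', 'warehouse', 'email', 'ERP']
```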


Conclusion
Some of the points made in this article would probably be regarded as “heresy” by traditional DRP purists.  However, at a practical level Safecoms’ experience has been that a BIA that is conducted using the approach outlined above will provide a set of results that are at least as valid (potentially more so) as those contained in a BIA that is arrived at after months of sterile debate and number crunching.

Of course, the BIA is only the first step in the DRP process – but it is generally where organisations have a tendency to spend huge amounts of time and money without adding much value to the overall process.  If you would like to discuss in more depth how you can prevent your organisation’s DRP from becoming a sink hole for time and money, why not call us?

Contact

Safecoms has operations in the UK and Australia, with representatives in the USA, Asia and the West of Scotland.

If you would like someone from Safecoms to contact you please email us at info@safecoms.co.uk