IT Service Continuity Management
There are two approaches to IT service continuity: preventive measures, which avoid interruptions to service, and reactive measures, which restore acceptable levels of service in the shortest time possible.
IT Service Continuity Management (ITSCM) is responsible for designing prevention and recovery activities that provide the necessary guarantees at a reasonable cost.
Preventive measures require a detailed prior analysis of risks and vulnerabilities. Some of these will be general in nature (fires, natural disasters, etc.), whereas others will be strictly IT-related (storage system failures, hackers, viruses, etc.).
Preventing general risks adequately depends on close collaboration with Business Continuity Management (BCM) and requires measures involving the organisation's "physical" infrastructure.
ITSCM needs to pay special attention to preventing risks and vulnerabilities that affect the IT systems. Close collaboration with Security Management is essential in this regard.
The customary protection systems aim to build a fortress around the IT infrastructure by protecting its perimeter. Although essential, this approach is not without its difficulties, as it increases the complexity of the IT infrastructure and may in turn become a source of fresh vulnerabilities.
Sooner or later, no matter how effective our preventive activities have been, it will be necessary to bring recovery procedures into operation.
In general terms, there are three options for service recovery:
- "Cold standby": which requires an alternative site where the live and service environments can be reproduced within 72 hours. This is the appropriate option if the recovery plans estimate that the organisation can maintain its levels of service during this period without the support of the IT infrastructure.
- "Warm standby": which requires an alternative site with active systems designed to allow recovery of critical services within 24 to 72 hours.
- "Hot standby": which requires an alternative site where data is continuously replicated and all systems are active, ready to take over from the live infrastructure immediately. Obviously, this is the most expensive option and should only be used when an interruption to IT service has immediate commercial repercussions.
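The reasoning behind these options can be summarised as: choose the cheapest option whose recovery time the business can survive. The Python sketch below illustrates that rule only. The class `StandbyOption`, the function `choose_standby` and all figures are hypothetical: the recovery times are illustrative readings of the windows above (warm standby is taken at 24 hours, the lower end of its range), and the cost ranking assumes cold is cheaper than warm, which is cheaper than hot. A real plan would use figures agreed with Business Continuity Management.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandbyOption:
    name: str
    assumed_recovery_hours: float  # hypothetical, illustrative figure only
    relative_cost: int             # assumed ranking: higher means more expensive


# Hypothetical figures loosely based on the three options described above.
OPTIONS = [
    StandbyOption("cold standby", 72.0, 1),
    StandbyOption("warm standby", 24.0, 2),
    StandbyOption("hot standby", 0.0, 3),
]


def choose_standby(tolerable_outage_hours: float) -> StandbyOption:
    """Return the cheapest option whose assumed recovery time fits the
    period the organisation can keep its service levels without IT."""
    if tolerable_outage_hours < 0:
        raise ValueError("tolerable outage cannot be negative")
    suitable = [
        option for option in OPTIONS
        if option.assumed_recovery_hours <= tolerable_outage_hours
    ]
    # Hot standby (0 h) always qualifies, so `suitable` is never empty.
    return min(suitable, key=lambda option: option.relative_cost)
```

For example, an organisation that can operate for four days without IT (96 hours) would be pointed to cold standby, while one that tolerates only a couple of hours would need hot standby, matching the text's advice to reserve hot standby for services with immediate commercial impact.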
Of course, there is also the alternative of doing "little or nothing" and hoping that things return to normal. However, this alternative is unlikely to appeal to anyone reading this ITIL course since, presumably, they work in organisations where IT services play an important role :-)