Any seasoned IT pro can recount stories in which seemingly insignificant components or very rare events brought down critical systems. Perhaps it was the simultaneous failure of a core switch and its redundant twin, an unimaginable scenario that left every user unable to connect to anything.
Maybe it was the failure of an entire MPLS network, caused by a distant fiber cut, that made both the primary and recovery datacenters unavailable. Or maybe it was a storage system that failed because of a software bug that also prevented it from failing over to its replication partner. Murphy’s law is alive and well in the technical infrastructures of financial institutions!
When asked to describe their technical business continuity and disaster recovery plans, many institutions quickly respond that they back up their servers regularly, that they replicate critical data to a second location, or that they rely on Internet-based systems that provide true redundancy. These are all valuable safeguards, but they can create a false sense of security unless they are reinforced by scenario-based plans written with an understanding of how all of the related systems depend on one another.
Continuity plans should include an inventory of every component involved in a critical process and should spell out what happens if each of those components fails. If a file server is critical to employees in every branch, the continuity plans should cover not just the server but also the storage devices attached to it, its network connection, the switch that links that connection to the rest of the network, the routers that carry its traffic to the branches, and the networks within the branches themselves. A thorough analysis of each of these components may prompt proactive steps that save an institution from a prolonged outage should one of them fail.
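To make the component inventory concrete, the sketch below models the file-server example as a small connectivity graph and asks, for each component in turn, whether branch users could still reach the data if that one component failed. It is a minimal illustration under stated assumptions: the component names and topology (two redundant routers, a single core switch) are hypothetical, and treating "branch users can still reach the storage" as "the process still works" is a simplification of real dependencies.

```python
from collections import deque

# Hypothetical inventory for the file-server example. Each component lists the
# components it is directly connected to. Names and topology are illustrative
# assumptions, not a real network: two redundant routers, one core switch.
TOPOLOGY = {
    "branch_network": ["router_a", "router_b"],
    "router_a": ["branch_network", "core_switch"],
    "router_b": ["branch_network", "core_switch"],
    "core_switch": ["router_a", "router_b", "server_nic"],
    "server_nic": ["core_switch", "file_server"],
    "file_server": ["server_nic", "storage_array"],
    "storage_array": ["file_server"],
}


def reachable(topology, start, goal, failed=None):
    """Breadth-first search from start to goal, treating `failed` as down."""
    if failed in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbor in topology.get(node, []):
            if neighbor != failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(topology, start, goal):
    """Return each component whose failure alone cuts start off from goal."""
    return [c for c in topology if not reachable(topology, start, goal, c)]


if __name__ == "__main__":
    spofs = single_points_of_failure(TOPOLOGY, "branch_network", "storage_array")
    print("Single points of failure:", ", ".join(spofs))
```

In this hypothetical topology, the check flags the branch network, core switch, server connection, file server, and storage array, but neither router, because each router has a redundant partner. A brute-force "fail one thing at a time" loop is deliberately simple so that business owners and engineers can follow it together; it does not model simultaneous failures like the twin core switches described earlier.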
Identifying and analyzing every component of a critical system may sound like a lot of time and effort, but it is best done as a tabletop exercise that brings together the system’s business owners, the lead engineers from IT, and a manager who can translate between the two groups and document the results. The results of the exercise can then be used to build justification for any additional recovery components needed.
Whether you want to run an advanced technical business continuity tabletop exercise to attack Murphy head-on or are just starting to write your first business continuity plan (BCP), we can meet you where you are and help you take the next steps forward. Email us at support@bedelsecurity.com to get started!