The USA felt the brunt of the world’s three costliest natural disasters in 2018, with damages totaling more than $46 billion. The deadly Camp Fire in California was number one, with Hurricanes Michael and Florence coming in second and third. Those disasters may have monopolized the headlines, but many more homes and businesses are destroyed each year by tornadoes, flooding, and fires.
With storm season right around the corner, many firms in the hurricane belt spend the first quarter of each year testing their business continuity/disaster-recovery plan (BCP/DR). All too often, however, firms assume they are in a “safe zone” and fail to adequately plan, prepare, and test.
The reality is that no firm is in a “safe zone.” Natural disasters themselves are not necessarily what will put your business in a risky situation. These are the top causes of data loss or downtime during such events:
- Hardware failure (45%)
- Power loss (35%)
- Software failure (34%)
- Data corruption (24%)
- External security breaches (23%)
- Accidental user error (20%)
The real costs associated with such data loss or downtime include:
- Reputational risk: Your clients rely on you to be operational and available. Incurring a significant outage implies a lack of planning and lack of proper infrastructure.
- Loss of productivity: If your payroll is $200,000 per month, every business day of downtime could cost you roughly $9,500 in payroll alone (assuming about 21 business days per month).
- Legal risk: There are critical functions that must be performed within your practice. Certain tasks have an extremely high level of risk associated with them if you should miss even one (military search, hearing attendance, foreclosure sale attendance, etc.).
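The payroll figure above can be sketched in a couple of lines. This is a rough estimate only; the per-day number depends entirely on how many business days per month you assume, so adjust the default to your own calendar.

```python
# Rough estimate of productivity lost per full business day of downtime.
# The business-day count is an assumption; ~21 is typical for a US month.
def daily_downtime_cost(monthly_payroll: float, business_days_per_month: int = 21) -> float:
    """Payroll dollars at risk for each full business day of downtime."""
    return monthly_payroll / business_days_per_month


if __name__ == "__main__":
    print(f"${daily_downtime_cost(200_000):,.0f} per business day")
```

The same function lets you plug in your own payroll and day-count assumptions rather than relying on a single published figure.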
While statistics on the subject vary, some studies indicate that 90% of companies that suffer a major data disaster without an effective disaster-recovery plan are out of business within one year.
A Cloud Comparison
For many years, firms have been apprehensive about using the mysterious “cloud” as part of their BCP/DR plan or overall data-management strategy, largely due to perceived compliance concerns or a general lack of understanding of how to choose the right solution.
There are many drivers that may cause you to think about cloud computing as the best solution. They typically include:
- Risk reduction
- Agility (mobility)
Many are skeptical of cloud computing because they assume it is less secure or carries greater risk. That assumption, however, can only be tested by a direct and comprehensive comparison between the cloud provider’s environment and your on-premises infrastructure. Factors to compare include:
- Technological components
- Risk-management processes
- Preventative, detective, and corrective controls
- Governance and oversight processes
- Resilience and continuity capabilities
- Multi-factor authentication
Until you have had an expert truly weigh your internal environment against the cloud, it would be premature to assume one is safer than the other.
What’s Your Plan?
Whether or not you choose the cloud as part of your strategy, you need an effective plan. When putting together an effective BCP/DR solution, start with the basics:
- Know specifically what assets are important (data and processing).
- Consider the current location of assets (on-premises, co-location facility, cloud service provider).
- Understand the details of the network connection between the assets and the processing sites.
Having a reliable cloud computing site that you cannot reach because your ISP has failed does not provide the coverage you need.
Know your requirements and understand your environment: Whether you handle your own backups, use a cloud service provider (CSP), or use a combination of both, your objective is to protect against the risk of data being unavailable or business processes failing, which can lead to breached service-level agreements, lost revenue, and damaged client relationships. It is important to understand the specific requirements set forth by your clients. They include:
- Recovery Point Objective (RPO), which defines how much data loss is tolerable, i.e., how far back in time the restored data may go. It includes asking questions such as, “Is it okay to have quick access to your case data and documents, even if your non-case-related documents are not available for several days or are lost altogether?” What do your clients require?
- Recovery Time Objective (RTO) is a measure of how quickly you need each system to be up and running in the event of a disaster or critical failure.
- Recovery Service Level (RSL) measures, as a percentage of full production capacity, how much computing power must be available during a disaster.
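As a hypothetical illustration, the three objectives can be expressed as a simple check against a proposed backup-and-recovery setup. All of the names and numbers below are invented for illustration; your actual targets come from your clients and your practice.

```python
from dataclasses import dataclass


# Hypothetical recovery targets for one system; values are illustrative only.
@dataclass
class RecoveryTargets:
    rpo_hours: float    # max tolerable data loss, as hours since last good copy
    rto_hours: float    # max tolerable time until the system is usable again
    rsl_percent: float  # share of production capacity required during a disaster


def meets_targets(t: RecoveryTargets, backup_interval_h: float,
                  estimated_restore_h: float, standby_capacity_pct: float) -> bool:
    """True only if the proposed setup satisfies all three objectives."""
    return (backup_interval_h <= t.rpo_hours
            and estimated_restore_h <= t.rto_hours
            and standby_capacity_pct >= t.rsl_percent)


# Example: nightly backups, an 8-hour restore, and a half-capacity standby site.
case_data = RecoveryTargets(rpo_hours=24, rto_hours=12, rsl_percent=50)
print(meets_targets(case_data, backup_interval_h=24,
                    estimated_restore_h=8, standby_capacity_pct=50))  # True
```

Framing the objectives this way makes it easy to see which requirement a proposed solution fails: a cheaper standby site might pass RPO and RTO while falling short of the RSL.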
Data Replication: Maintaining an up-to-date copy of the required data at a different location can be done on a few technical levels and with varying degrees of granularity. It is important to know your replication requirements. For example, data can be replicated at the block level, file level, or database level. Replication can be in bulk, on the byte level, via file synchronization, database mirroring, daily copies, etc. Each alternative impacts your RPO/RTO and has varying costs including bandwidth requirements.
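As a toy illustration of file-level replication, the sketch below copies only files whose contents have changed since the last sync, which is the basic idea behind file synchronization. A real deployment would use an established tool (rsync, or your storage platform's replication features) rather than hand-rolled code; the function and paths here are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_changed_files(source: Path, replica: Path) -> list[str]:
    """Copy files from source to replica only when contents differ."""
    copied = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = replica / src.relative_to(source)
        if not dst.exists() or file_digest(src) != file_digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # preserves timestamps along with content
            copied.append(str(src.relative_to(source)))
    return copied
```

Even this toy version shows the RPO trade-off: run it hourly and you can lose at most an hour of file changes; run it nightly and the exposure grows to a day.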
Functionality Replication: This includes the ability to re-create processing capabilities at a different location. Depending on the risk to be mitigated and the scenario chosen, this could be as simple as selecting an additional deployment zone or as involved as extensive rearchitecting. The simplest cases are environments that are already heavily virtualized: the relevant VM images can simply be copied to the secondary location, where they are ready for service restoration on demand.
An ideal infrastructure cloud service provider will likely have the application architecture described and managed in an orchestration tool or other cloud infrastructure management system. With these, replicating the functionality can be a simple activity.
Recovery takes longest when functionality is replicated only after disaster strikes. A better solution is the active-passive form, where standby resources are provisioned in advance; in active mode, the replicated resources also participate in production.
Planning, Preparing and Provisioning: These are the processes that lead up to the actual DR failover response. The most important factor in this category is adequate monitoring, so that problems are detected early and more time is available to respond.
Failover Capability: Appropriate load balancing is required to ensure that redirection of the user service requests occurs properly and in a timely manner.
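A minimal sketch of that redirection logic is below. The site URLs and the health-check function are placeholders; in production this role is played by a load balancer or DNS failover service, not hand-written code.

```python
from typing import Callable

# Ordered list of sites: primary first, DR standby second. URLs are hypothetical.
SITES = ["https://primary.example.com", "https://standby-dr.example.com"]


def pick_site(sites: list[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy site, falling back down the list in order."""
    for site in sites:
        if is_healthy(site):
            return site
    raise RuntimeError("no healthy site available")


# Example: simulate the primary failing its health check, so traffic
# is redirected to the standby site.
print(pick_site(SITES, is_healthy=lambda s: "standby" in s))
```

The key property is that redirection is automatic and ordered: users reach the primary whenever it is healthy and are sent to the standby only when it is not.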
It is easy to see why many firms elect to make the cloud part of their solution. According to the 2017 Legal Technology Survey from the American Bar Association, cloud usage grew more than 40% from 2016 to 2017, from 37% to just over 52%. If you are ready to make that move, there are some things you need to consider:
- Assessing the risks associated with a cloud service provider (CSP)
- The elasticity of the CSP: Can the CSP provide all the resources if the BCP/DR plan is invoked?
- Contractual issues: Will any new CSP address all contractual issues and SLA requirements?
- Available network bandwidth for timely replication of data
- Available bandwidth between the impacted user base and the BCP/DR locations
- Legal and licensing constraints that could prohibit the data or functionality from being present in the backup location
- Common pitfalls of cloud computing:
  - On-premises apps do not always transfer: many older apps were not developed with cloud-based services in mind, making it difficult to “forklift” them to the cloud with minimal or no changes.
  - Lack of training and awareness: New development techniques and approaches require training and a willingness to utilize new services. When cloud-based environments are required or requested, this may introduce challenges for IT staff.
  - Lack of documentation and guidelines: Best practices require developers to follow relevant documentation and methodologies. The rapid adoption of evolving cloud services has led to a disconnect between the CSP and application developers on how to utilize, integrate, or meet vendor requirements.
  - Complexities of integration: Integrating new applications with existing ones is a key part of transitioning to the cloud. When developers and operational resources do not have open access to supporting components and services, integrations can be complicated, and troubleshooting becomes difficult.
- Make sure your CSP has service level agreements that align with your needs:
  - Availability (for example, 99.99% of services and data)
  - Performance (expected response times versus maximum response times)
  - Security and privacy of the data (encrypting all stored and transmitted data)
  - Logging and reporting (audit trails of all access and the ability to report on key requirements and indicators)
  - DR expectations (worst-case recovery commitment, RTOs, the maximum period of tolerable disruption)
  - Location of the data (ability to meet requirements or remain consistent with local legislation)
  - Data format and structure (data retrievable from the provider in a readable and intelligible format)
  - Portability of the data (ability to move data to a different provider, or to multiple providers)
  - Identification and problem resolution (help desk/service desk, call center, or ticketing system)
  - Change-management process (updates or new services)
  - Exit strategy with expectations on the provider to ensure a smooth transition
The cloud is the future, but it must be embraced wisely.