High Availability versus Disaster Recovery
As you move from application architecture to infrastructure (mainly cloud) architecture, you start hearing the terms High Availability (HA) and Disaster Recovery (DR) thrown around a lot, and often interchangeably. (See also: DR in the cloud.)
While there is some overlap between the two, in general, they serve different purposes.
- A Highly Available architecture is one that minimizes downtime (for both applications and the underlying infrastructure). It is usually built by adding redundant components, such as mirrors and extra nodes (e.g. Oracle RAC provides this redundancy at the database level). Think REDUNDANCY when you think of HA.
- A Disaster Recovery plan comes into play when even High Availability is not enough. Think of your entire Oracle RAC cluster, with all its redundant nodes, getting wiped out by a disaster (fire, flood, whatever). Think EVERYTHING down: your HA cluster, and possibly even the underlying networking infrastructure.
In such a scenario, you need a step-by-step approach for recreating your entire production environment. Whether this is done from tape backups or cloud backups needs to be part of your plan.
Which people, which locations, and the exact steps to be taken by each person (roles and responsibilities) all need to be part of your DR plan.
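One way to keep such a plan honest is to write it down as data and check it for gaps. The following is only a minimal sketch: the `RunbookStep` fields, the `validate_runbook` helper, and the sample steps are all hypothetical names invented for illustration, not part of any standard DR tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunbookStep:
    """One step of a DR plan (hypothetical structure, for illustration only)."""
    order: int                 # position in the recovery sequence
    action: str                # what must be done
    owner: str                 # role responsible for the step
    location: str              # where the step is carried out
    restore_source: str = ""   # e.g. "tape backup" or "cloud backup", if relevant


def validate_runbook(steps):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    orders = sorted(step.order for step in steps)
    if orders != list(range(1, len(steps) + 1)):
        problems.append("step orders must run 1..N with no gaps or duplicates")
    for step in steps:
        if not step.owner.strip():
            problems.append(f"step {step.order} has no owner")
        if not step.location.strip():
            problems.append(f"step {step.order} has no location")
    return problems


# Sample plan: every role, site, and action here is an assumed example.
RUNBOOK = [
    RunbookStep(1, "Declare the disaster and notify the DR team",
                "DR coordinator", "secondary office"),
    RunbookStep(2, "Provision replacement network and servers",
                "infrastructure lead", "recovery site"),
    RunbookStep(3, "Restore the database",
                "database administrator", "recovery site", "cloud backup"),
    RunbookStep(4, "Redirect users to the recovered environment",
                "network engineer", "recovery site"),
]

if __name__ == "__main__":
    for issue in validate_runbook(RUNBOOK) or ["no problems detected"]:
        print(issue)
```

A check like this only confirms that every step has an owner, a location, and a place in the sequence. It says nothing about whether the steps actually work, which is what a rehearsal is for.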
The Cloud blurs the boundary (between HA and DR)
Before the cloud, everything I said above held without exception: DR contingency planning started where HA planning ended.
However, with the cloud, your HA solution can ITSELF serve as a Disaster Recovery Plan.
This is possible because of constructs such as Availability Zones and multi-Region deployments.
Availability Zones allow you to spread your HA nodes across different data centers. If one data center is struck by disaster, the redundant node in the second data center takes over. So your HA setup also solves the problem of disaster recovery.
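To get a feel for why a second node in another data center helps, here is a rough back-of-the-envelope calculation. It is a sketch under a strong assumption: node failures are independent of one another. The function names and the 99% figure are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes is up.

    Assumes each node fails independently of the others.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The service is down only when every node is down at the same time.
    return 1.0 - (1.0 - node_availability) ** nodes


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per (non-leap) year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for count in (1, 2, 3):
        availability = combined_availability(0.99, count)  # 0.99 is an assumed figure
        print(f"{count} node(s): availability {availability:.6f}, "
              f"about {expected_downtime_hours(availability):.3f} h/year down")
```

With a hypothetical 99% node, expected downtime moves from about 87.6 hours a year to under one hour with two nodes, but only if the failures really are independent.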
However, Availability Zones are not guaranteed to be far apart geographically, so your redundant data centers can all sit within a few miles of one another. A single disaster can therefore affect both of your data centers.
If you go a step further, you can spread your nodes across different GEOGRAPHIC REGIONS (e.g. NorthWest would be one REGION and SouthWest another). Availability Zones themselves belong to a single region, so this means running nodes in more than one region. With that much geographic separation, the chances of a disaster striking all of your nodes are minimal.
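The difference between zone-level and region-level separation can be sketched as a simple placement check. This is an illustrative model only: the `Node` class, the `survives_loss` helper, and the region and zone names are hypothetical and do not correspond to any cloud provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    region: str   # e.g. "northwest" (assumed name)
    zone: str     # e.g. "zone-a" (assumed name)


def survives_loss(nodes, level):
    """True if at least one node remains after any single zone or region is lost.

    `level` is "zone" or "region". Losing a region takes all of its zones with it.
    """
    if level not in ("zone", "region"):
        raise ValueError("level must be 'zone' or 'region'")
    if level == "zone":
        # Zone names may repeat across regions, so key on the pair.
        domains = {(node.region, node.zone) for node in nodes}
    else:
        domains = {node.region for node in nodes}
    # Nodes in two or more distinct failure domains cannot all be
    # taken out by the loss of any single one of them.
    return len(domains) >= 2


if __name__ == "__main__":
    layouts = {
        "single zone": [Node("db1", "northwest", "zone-a"),
                        Node("db2", "northwest", "zone-a")],
        "two zones, one region": [Node("db1", "northwest", "zone-a"),
                                  Node("db2", "northwest", "zone-b")],
        "two regions": [Node("db1", "northwest", "zone-a"),
                        Node("db2", "southwest", "zone-a")],
    }
    for label, nodes in layouts.items():
        print(f"{label}: survives zone loss = {survives_loss(nodes, 'zone')}, "
              f"survives region loss = {survives_loss(nodes, 'region')}")
```

The model treats a disaster as taking out exactly one zone or one region. A real design also has to account for data replication lag and failover time between regions, which this sketch ignores.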
The terms DR and HA are often used interchangeably, but they mean different things. HA tries to provide uninterrupted uptime for your I.T. assets. DR goes a step further and takes over when HA cannot hold up (as in the case of a real disaster).
Prior to cloud computing, these two strategies (DR and HA) were distinct and planned independently of one another.
With the advent of the cloud, it is possible for your HA strategy to help in the case of disaster as well. In fact, it can cost a fraction of what it would in a non-cloud environment.
Footnote: Impact of Disaster
- A study (University of Texas, Austin) found that 85 percent of businesses (tech and non-tech included) are entirely dependent on the uptime of their I.T. systems.
- The longer it takes to restore communications (after the disaster), the more critical the impact on businesses.
Anuj reviews existing DR and backup strategies. In addition, Anuj can help create an internal DR team within your organization.