High availability, fault tolerance and disaster recovery
PUBLISHED ON: Wednesday, Jul 5, 2023
#
High availability (HA)
- HA aims to ensure an agreed level of operational performance (usually uptime) for a higher than normal period.
- HA doesn't aim to stop failures - customers may face outages
- HA is not about user experience
- HA aims at maximizing a system's online time
- HA requires redundant servers/infrastructure to be in place, ready for customers to be switched to in the event of a failure, to minimise downtime
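The switch-over idea in the last point can be sketched in a few lines. This is a minimal illustration, not a production design: `Server`, `FailoverPool` and the `healthy` flag are hypothetical names, and a real setup would use actual health checks (for example from a load balancer) instead of a boolean.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical server record; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True


class FailoverPool:
    """Routes traffic to one active server and keeps a standby ready."""

    def __init__(self, primary: Server, standby: Server) -> None:
        self.active = primary
        self.standby = standby

    def route(self) -> str:
        """Return the name of the server that should receive traffic."""
        if not self.active.healthy:
            if not self.standby.healthy:
                raise RuntimeError("no healthy server available")
            # Fail over: the standby becomes active and the failed server
            # becomes the standby until it is repaired.
            self.active, self.standby = self.standby, self.active
        return self.active.name


pool = FailoverPool(Server("primary"), Server("standby"))
print(pool.route())          # traffic goes to the primary
pool.active.healthy = False  # simulate a failure of the primary
print(pool.route())          # traffic is switched to the standby
```

Note that requests in flight when the primary fails are still lost in this sketch; the redundancy only shortens the outage, which matches the point that HA minimises downtime without preventing failures.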
Key availability levels
- 99.9% = 8.77 hours per year downtime
- 99.999% = 5.26 minutes per year downtime
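The downtime figures above follow directly from the availability percentage. The sketch below reproduces them, assuming a 365.25-day year (the assumption that matches the numbers listed); the function name `downtime_hours_per_year` is just an illustrative choice.

```python
# Assumption: a year is 365.25 days, which matches the figures listed above.
HOURS_PER_YEAR = 365.25 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in hours, for an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.2f} minutes) per year")
```

The same function can be used for other targets, such as 99.99%, to see how each extra "nine" cuts the allowed downtime by a factor of ten.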
#
Fault tolerance (FT)
- FT is a property that enables a system to continue operating properly in the event of a failure of some of its components.
- FT aims at operating through failures
- Setting up an FT mechanism is expensive and takes longer to implement than HA.
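To make "operating through failures" concrete, here is a small sketch in which a request is served by redundant replicas and the caller never sees the failure of an individual one. The names `call_with_redundancy`, `failing_replica` and `working_replica` are hypothetical, and trying replicas one after another is a simplification: real fault-tolerant systems often run redundant components in parallel.

```python
from typing import Callable, List, Sequence


def call_with_redundancy(replicas: Sequence[Callable[[], str]]) -> str:
    """Return the first successful replica response, hiding individual failures."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica()
        except Exception as exc:  # a failed component must not stop the request
            errors.append(exc)
    # Only when every redundant component has failed does the caller see an error.
    raise RuntimeError(f"all {len(errors)} replicas failed")


def failing_replica() -> str:
    """Simulates a component that is down."""
    raise ConnectionError("replica down")


def working_replica() -> str:
    """Simulates a component that is operating normally."""
    return "ok"


print(call_with_redundancy([failing_replica, working_replica]))
```

Every request needs at least one spare component to fall back on, which is one reason FT costs more than HA.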
#
Disaster Recovery (DR)
- DR is a set of policies or tools to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.
- DR requires:
  - pre-planning
  - backup premises
  - taking regular backups at standby locations (offsite)
  - copies of all processes
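As an illustration of the "regular offsite backups" requirement, the sketch below copies a file to a second location and verifies the copy with a checksum. It is only a sketch under stated assumptions: `back_up` and `sha256_of` are hypothetical helpers, and the "offsite" directory here is a local temporary folder standing in for a genuinely remote location.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(source: Path, offsite_dir: Path) -> Path:
    """Copy `source` into `offsite_dir` under a timestamped name and verify it."""
    offsite_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = offsite_dir / f"{source.name}.{stamp}.bak"
    shutil.copy2(source, target)
    # A backup that cannot be read back is not a backup, so verify the copy.
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup verification failed for {target}")
    return target


# Demo: a temporary folder stands in for the offsite location (assumption).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.db"
    source.write_text("important records")
    copy = back_up(source, Path(tmp) / "offsite")
    print(copy.name)
```

In practice the backup would be scheduled to run regularly and the restore procedure would be tested as part of the pre-planning.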