A recovery operation takes place following an outage, security incident or other disaster that takes an environment down or compromises it in a way that requires restoration. Recovery strategies matter because they largely determine how long your organization will be down or running in a degraded state, which directly affects the company’s bottom line. Note that this section focuses on strategies rather than tactics, so think from a design perspective, not from a day-to-day operational perspective.
- Backup storage strategies. While most organizations back up their data in some way, many do not have an official strategy or policy regarding where the backup data is stored or how long the data is retained. In most cases, backup data should be stored offsite. Offsite backup storage provides the following benefits:
  - If your data center is destroyed (earthquake, flood, fire), your backup data isn’t destroyed with it. In some cases, third-party providers of offsite storage services also provide recovery facilities that enable organizations to recover their systems to the provider’s environment.
  - Offsite storage providers offer environmentally controlled storage facilities that maintain appropriate humidity, temperature and light levels. Such facilities are optimal for long-term backup storage.
  - Offsite storage providers offer additional services that your company would otherwise have to manage itself, such as tape rotation (delivery of new tapes and pickup of old tapes), electronic vaulting (storing backup data electronically), and organization (cataloging of all media, dates and times).
- Recovery site strategies. When companies have multiple data centers, they can often use one as the primary data center and another as a recovery site (either a cold standby site or a warm standby site). An organization with 3 or more data centers can have a primary data center, a secondary data center (recovery site) and regional data centers. With the rapid expansion of public cloud capabilities, using a public cloud provider as your recovery site is feasible and reasonable. One key consideration is cost. Cloud storage is inexpensive, so your company can probably afford to store backup data there, but recovering your entire data center from the public cloud might not be affordable or fast enough.
- Multiple processing sites. Historically, applications and services were highly available within a site such as a data center, but site resiliency was incredibly expensive and complex. Today, it is common for companies to have multiple data centers, and connectivity between the data centers is much faster and less expensive. Because of these advances, many applications provide site resiliency with the ability to have multiple instances of an application spread across 3 or more data centers. In some cases, application vendors are recommending backup-free designs in which an app and its data are stored in 3 or more locations, with the application handling the multi-site syncing. The public cloud can be the third site, which is beneficial for companies that lack a third site or that have apps and services already in the public cloud.
- System resilience, high availability, quality of service (QoS) and fault tolerance. To prepare for the exam, it is important to know the differences between these related terms:
  - System resilience. Resilience is the ability to recover quickly. For example, site resilience means that if Site 1 goes down, Site 2 quickly and seamlessly comes online. Similarly, with system resilience, if a disk drive fails, another (spare) disk drive is quickly and seamlessly added to the storage pool. Resilience often comes from having multiple functional components (for example, hardware components).
  - High availability. While resilience is about recovering with a short amount of downtime or degradation, high availability is about having multiple redundant systems that enable zero downtime or degradation for a single failure. For example, if you have a highly available database cluster, one of the nodes can fail and the database cluster remains available without an outage or impact. While clusters are often the answer for high availability, there are many other methods available too. For instance, you can provide a highly available web application by using multiple web servers without a cluster. Many organizations want both high availability and resiliency.
  - Quality of service (QoS). QoS is a technique that gives specified services a higher quality of service than other specified services. For example, on a network, QoS might give the highest quality of service to the phones and the lowest quality of service to social media. QoS has been in the news because of the net neutrality debate taking place in the United States. Without net neutrality rules, ISPs are free to provide a higher quality of service to a specified set of customers or to a specified service on the internet. For example, an ISP might opt to use QoS to make its own web properties perform wonderfully while ensuring the performance of its competitors’ sites is subpar.
  - Fault tolerance. As part of providing a highly available solution, you need to ensure that your computing devices have multiple components of the same type and kind (network cards, processors, disk drives, etc.) to provide fault tolerance. Fault tolerance in a single component isn’t enough by itself. For example, imagine a server with fault-tolerant CPUs whose only power supply fails. The server is down even though its CPUs are fault tolerant. As you can see, you must account for fault tolerance across your entire system and across your entire network.
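
The fault tolerance point above (redundant CPUs do not help when a single power supply fails) can be illustrated with some simple availability arithmetic. The sketch below is only an illustration under stated assumptions: component failures are treated as independent, the per-component availability figures are hypothetical, and the helper names (`series`, `parallel`, `downtime_hours_per_year`) are invented for this example rather than taken from any tool or standard.

```python
from math import prod

HOURS_PER_YEAR = 8760  # non-leap year


def series(*availabilities: float) -> float:
    """Availability of a chain in which every component is required."""
    return prod(availabilities)


def parallel(availability: float, copies: int) -> float:
    """Availability of redundant identical components.

    Assumes independent failures: the group is down only if all copies are down.
    """
    return 1 - (1 - availability) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical per-component availabilities, chosen only for illustration.
CPU = 0.999
POWER_SUPPLY = 0.99

# Server A: redundant CPUs but a single power supply (a single point of failure).
server_a = series(parallel(CPU, 2), POWER_SUPPLY)

# Server B: redundant CPUs and redundant power supplies.
server_b = series(parallel(CPU, 2), parallel(POWER_SUPPLY, 2))

print(f"Server A: {server_a:.6f} ({downtime_hours_per_year(server_a):.1f} h/year down)")
print(f"Server B: {server_b:.6f} ({downtime_hours_per_year(server_b):.1f} h/year down)")
```

With these made-up numbers, Server A can never be more available than its single power supply, no matter how many CPUs it has, while Server B cuts expected downtime from roughly 87.6 hours per year to under one hour. Real components rarely fail independently (a shared power feed or a site-wide outage affects every copy at once), which is one reason redundancy has to be considered across the whole system, network and site.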