Disaster recovery in virtualized environments | Blog | Xelon AG

Written by Matias Meier | Apr 29, 2020 8:23:00 AM

 

Damage to reputation, loss of confidence and losses in the millions: in an IT disaster, the work of the last few months can be lost. Companies of any size can be hit when servers suddenly stop working and users can no longer reach services. Depending on the time of day and the industry, the affected company loses anywhere from several thousand to millions of francs per day of downtime.

Without a reliable contingency plan, a company can be brought to the brink of collapse by a natural disaster or an unplanned system shutdown. On average, 7 out of 10 businesses must close down within two years after a serious IT problem. All-round protection is expensive, and many companies decide to cut back on IT security. Some only have an IT contingency plan for the most important applications and processes. These cost savings reduce reliability and mean that recovery quality cannot be guaranteed. A virtual infrastructure can solve this problem and ultimately save you money.

How does virtualization simplify the disaster recovery process?

Virtualization greatly simplifies the recovery process by combining the workstation, all server capacity and other systems into virtual machines. As a result, no physical servers need to be rebuilt for recovery. You move the virtual machines to another system that has computing power available. Virtualization makes you independent of physical devices, so you can get back to work soon after an IT disaster - just as if nothing had happened. The only requirements are a stable Internet connection and remote access.

Disaster Recovery Scenarios

Scenario 1: Bare Metal Server

If a physical server fails, recovery takes a long time and does not always go smoothly. For example, if you have four physical servers and have backed up all of them, you will need to purchase the same number of servers to restore all of your lost data. This is where it becomes problematic: you will often not find identical servers, as the models in use may have been purchased a long time ago and are likely to be outdated. If the new equipment differs from the previous hardware, the recovery may not work as is. You will also need to find the right drivers, which is not easy and requires careful monitoring of each step of the system recovery process.

Scenario 2: On-premise servers operate virtual machines (private cloud)

Imagine that you run four servers - just like in the first scenario. This time, however, they are not four physical machines but four virtual machines on one physical server. You make copies of the virtual machines and store them off-site. If the server goes down, you need to purchase new equipment, restore the backups, start the virtual machines, and set up backup configurations to keep your system running properly. This approach provides greater reliability, but is time-consuming and can be tricky for some business areas.

Scenario 3: Public Cloud

This scenario is similar to the one described above, except that the virtualization host (here, an ESX server) is located outside the office building and you manage it remotely.

Benefits of virtualization for business continuity

Continuous availability

Server availability is business-critical. When a virtual server fails, a new VM automatically starts using the available IaaS resources. This gives you maximum availability and eliminates the need for a dedicated backup server, additional equipment or software.

If you need to keep servers running without stopping the applications, clustering between virtual machines requires significantly fewer servers than a conventional failover cluster.

Independence from hardware

One of the biggest advantages of virtualization is that the recovery process is independent of the recovery hardware. Virtual machines encapsulate all necessary information: applications, data, operating system and BIOS. This allows you to restore the server to any part of the virtual infrastructure and does not require a 100% identical recovery platform.

Hardware Consolidation

It is extremely unlikely that all workloads will fail at the same time and it is often acceptable to temporarily provide slightly lower application performance in the failover setup. The consolidation rate of failover facilities is often twice that of the primary data center. The unexpected result of this workload agility and high hardware consolidation is that organizations can allocate hardware to multiple workloads without major performance limitations. This makes insourcing a disaster recovery model much more economically attractive.
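The effect of a higher consolidation rate on hardware needs can be illustrated with a little arithmetic. The following is a minimal sketch only: the VM count and the ratio of 10 VMs per host are assumed example figures, and `hosts_needed` is a hypothetical helper name. It takes the statement that the failover site often runs at twice the consolidation rate of the primary data center at face value.

```python
import math


def hosts_needed(vm_count: int, vms_per_host: int) -> int:
    """Number of physical hosts required at a given consolidation ratio."""
    if vm_count < 0 or vms_per_host <= 0:
        raise ValueError("vm_count must be >= 0 and vms_per_host must be > 0")
    return math.ceil(vm_count / vms_per_host)


# Assumed example figures, not taken from the article.
vm_count = 40
primary_ratio = 10                   # VMs per host in the primary data center
failover_ratio = 2 * primary_ratio   # failover site consolidates twice as densely

primary_hosts = hosts_needed(vm_count, primary_ratio)
failover_hosts = hosts_needed(vm_count, failover_ratio)
print(f"primary: {primary_hosts} hosts, failover: {failover_hosts} hosts")
```

With these assumed numbers the failover site needs half as many hosts as the primary site, at the price of temporarily lower application performance.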

Best practices for disaster recovery in virtualized environments

Disaster recovery means restarting all components of your IT infrastructure after a disruption. In addition, the negative consequences of system failure must be minimized and it is important to ensure that business-critical processes function continuously. If you always want to be prepared for an emergency, you should have a recovery plan.

Design of a disaster recovery plan

Describe the disaster recovery process by creating a clear plan of action for each employee involved in recovery after a system failure.

The first thing you should do is determine which systems are most important to your organization and prioritize them. Determine the maximum amount of time your systems can be down and customer service unavailable. Then, determine the maximum tolerable data loss (Recovery Point Objectives) and the Recovery Time Objectives. Determine what it takes to keep the systems that are critical to your business up and running. 
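One way to record these decisions is a small inventory that stores a priority, a Recovery Point Objective and a Recovery Time Objective per system. The sketch below is only an illustration: the system names, the values and the `SystemObjective` and `recovery_order` names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SystemObjective:
    """Recovery targets for one system (all values are illustrative)."""
    name: str
    priority: int    # 1 = most critical to the business
    rpo: timedelta   # maximum tolerable data loss
    rto: timedelta   # maximum tolerable downtime


def recovery_order(systems):
    """Return systems sorted by priority, then by the tightest RTO."""
    return sorted(systems, key=lambda s: (s.priority, s.rto))


# Hypothetical inventory; replace with your own systems and targets.
inventory = [
    SystemObjective("file-archive", 3, timedelta(hours=24), timedelta(hours=48)),
    SystemObjective("webshop", 1, timedelta(minutes=15), timedelta(hours=1)),
    SystemObjective("erp", 1, timedelta(hours=1), timedelta(hours=4)),
    SystemObjective("intranet", 2, timedelta(hours=4), timedelta(hours=8)),
]

for system in recovery_order(inventory):
    print(f"{system.priority} {system.name}: RPO {system.rpo}, RTO {system.rto}")
```

Sorting by priority first and RTO second is a design choice for this sketch; other orderings, for example by dependencies between systems, may suit your environment better.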

Divide the plan into the following sections:

  1. Restoring the network
  2. Restoring the data center
  3. Restoring the virtual machines

Follow this plan when disaster recovery becomes necessary.

Remember that the implementation of an IT contingency plan may be subject to external factors over which you have no control. This includes both force majeure and man-made circumstances. Therefore, the contingency plan should include solutions for various scenarios such as natural disasters, power outages, damage caused by cyber criminals, or hardware problems. 

Preparation of the disaster recovery site 

Some organizations use a disaster recovery site as an additional tool in case of a system failure. This site stands in for the company's most important Internet resources during an outage and keeps applications and business services accessible to customers. Such sites can take different forms: archived copies, projects ready for launch, or hot, warm, or cold sites.

Automatic backups and replicas

For a successful recovery, backups and replication of virtual machines have top priority. Backups contain all virtual machine data and must be properly protected. Note that backup processes can take a while. Replicas are exact copies of your virtual machines and are ready for immediate launch in case of system failure. Dedicated software protects the data automatically. Manual backups, in contrast, often miss certain data or changes.

The right VM network configurations

The production site and the disaster recovery site can be operated in different virtual networks. If this is the case, the virtual machine network configuration settings should be changed when restoring to the disaster recovery site.
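The remapping step can be sketched as a simple lookup from production networks to their counterparts at the disaster recovery site. This is a minimal illustration under stated assumptions: the dictionary layout of the VM configuration, the network names and the `remap_networks` helper are hypothetical and do not correspond to any particular hypervisor's API.

```python
# Hypothetical mapping from production network names to DR-site networks.
NETWORK_MAP = {
    "prod-frontend": "dr-frontend",
    "prod-backend": "dr-backend",
}


def remap_networks(vm_config: dict, network_map: dict) -> dict:
    """Return a copy of vm_config with each NIC pointed at its DR network.

    Raises KeyError if a NIC uses a network that has no DR equivalent, so a
    missing mapping is caught before the VM is started.
    """
    remapped_nics = []
    for nic in vm_config["nics"]:
        remapped_nics.append({**nic, "network": network_map[nic["network"]]})
    return {**vm_config, "nics": remapped_nics}


# Assumed example VM definition (illustrative structure only).
production_vm = {
    "name": "web01",
    "nics": [
        {"mac": "00:50:56:00:00:01", "network": "prod-frontend"},
        {"mac": "00:50:56:00:00:02", "network": "prod-backend"},
    ],
}

dr_vm = remap_networks(production_vm, NETWORK_MAP)
print([nic["network"] for nic in dr_vm["nics"]])
```

Keeping the mapping in one place means it can be reviewed and tested ahead of time instead of being worked out during an outage.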

Provision of VM storage

For a successful recovery process you need enough free disk space. This is the most important requirement, as it enables sufficient performance. The storage should be separated from other networks.
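Whether enough free disk space is available can be checked before a restore begins. The sketch below uses Python's standard `shutil.disk_usage`; the 20% headroom, the 500 GiB restore size and the `has_enough_space` helper name are assumptions made for illustration.

```python
import shutil


def has_enough_space(path: str, required_bytes: int, headroom: float = 0.2) -> bool:
    """True if the volume holding `path` can take the restore plus headroom.

    `headroom` is an assumed safety margin (20% by default), not a value
    from the article.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * (1 + headroom)


# Hypothetical total size of the VM images to restore: 500 GiB.
restore_size = 500 * 1024**3
print(has_enough_space(".", restore_size))
```

In practice the path would point at the datastore of the recovery site rather than the current directory.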

Regular testing of the emergency plan

If you do not test the system recovery process, the plan may not work when it is needed. So go through the IT emergency plan regularly and make adjustments where necessary. Testing shows whether the recovery point objectives and recovery time objectives are being met. Even the smallest changes to the IT infrastructure call for immediate testing because they can affect the recovery procedure.
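The result of a recovery test can be compared against the objectives in a few lines. This is a minimal sketch: the objectives, the measured values and the `evaluate_test` helper are hypothetical examples.

```python
from datetime import timedelta


def evaluate_test(rpo, rto, data_loss, downtime):
    """Compare one recovery test against its objectives.

    Returns a dict with a pass/fail flag per objective.
    """
    return {"rpo_met": data_loss <= rpo, "rto_met": downtime <= rto}


# Hypothetical objectives and measurements from a recovery drill.
result = evaluate_test(
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
    data_loss=timedelta(minutes=10),   # age of the newest restorable data
    downtime=timedelta(minutes=75),    # time until the service was back
)
print(result)
```

In this assumed drill the recovery point objective is met, but the recovery time objective is missed by 15 minutes, which would call for an adjustment to the plan.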

 

Conclusion

Hacker attacks, system failure or natural disasters: IT disasters can affect companies of any size and often cause immense damage. Therefore, companies should have a solid IT contingency plan. The first step in defining a disaster recovery strategy is to understand the recovery requirements. Determine the maximum length of time your systems can be down and customer service unavailable. What is the maximum tolerable data loss (Recovery Point Objectives)? What are the Recovery Time Objectives? When evaluating technology options, consider the scalability factor. The scope of the environment and the number and type of applications to be supported also play a key role in finding solutions and defining service levels. 

Our infrastructure is located in two certified Swiss data centers. Your data is therefore completely subject to the Swiss data protection law.

Planning a system recovery may bring up unpleasant memories of earlier projects. However, in recent years, disaster recovery options have improved significantly - not least thanks to virtualization. The main benefits of virtualization include continuous availability, independence from physical hardware, and hardware consolidation. Virtualization greatly simplifies the recovery process by combining the workstation, all server capacities and other systems in virtual machines.