Automation of testing of operating system backup and
recovery
The goal of the project is to fill the gap in testing of the
source repository of the Relax-and-Recover disaster recovery
tool by automating the recovery process and deploying a
Continuous Integration (CI) setup that will automatically test
all the proposed changes to the source repository.
Details and background
Backups are a very important part of IT infrastructure. In
practice, one often needs to preserve the whole setup of important
computers, because reinstalling them after a disaster would be
time-consuming and their downtime can be very disruptive and
therefore expensive. Even if all the important data are preserved
by a backup tool, recreating the server from the backup to the
original state can be difficult, as this task is outside the
capabilities of traditional backup software. This is
especially the case when the server has a complex disk setup. To
automate this task, one needs a slightly different type of
software: a disaster recovery tool.
One very popular tool for disaster recovery on Linux is
Relax-and-Recover (ReaR). It creates bootable rescue media which
can be used to recover the system from backup, preserving the
original configuration faithfully. It can recreate a large variety
of configurations and integrate with many backup solutions. Given
the task, such a tool needs to be very reliable and well tested.
One particular
area which deserves testing is the actual recovery
process. While problems during the creation of the backup and
the bootable media are discovered immediately, problems during
the recovery process are normally discovered only in an
emergency, when the original server has already been destroyed,
and may therefore be impossible to correct. Such a failure can
cause serious problems for the user, who expected to be
able to recover the system after a disaster and now has
only unusable backups instead. The only way to significantly
reduce the likelihood of such an undesirable failure is rigorous
testing of the recovery process. At the same time, testing of
the recovery process is difficult. One needs to create the
rescue image, boot from it, and guide the tool
during the recovery process. This is not easy to do in an
automated way. As a result, it is not performed at all in the
upstream code repository on GitHub, and the package provided by
Red Hat as part of Red Hat Enterprise Linux is tested only in a
limited number of scenarios.
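The manual cycle described above (create the rescue image, boot from
it, run the recovery) could be scripted roughly as follows. This is only
a minimal sketch under stated assumptions: the test machine is a QEMU
virtual machine, the rescue image is an ISO file, and the
`RecoveryTest` class and all file paths are hypothetical names made up
for illustration. How the recovery is started and answered inside the
booted rescue system is deliberately left open, since that is the part
the project has to research.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class RecoveryTest:
    """Describes one backup-and-recover test cycle (hypothetical helper)."""

    rescue_iso: Path    # assumed location of the ISO written by ReaR
    target_disk: Path   # assumed blank qcow2 disk standing in for the destroyed server
    memory_mb: int = 2048

    def backup_command(self) -> List[str]:
        # Run on the original system: creates the rescue image and the backup.
        return ["rear", "-v", "mkbackup"]

    def boot_command(self) -> List[str]:
        # Boot a fresh virtual machine from the rescue ISO with the empty disk attached.
        return [
            "qemu-system-x86_64",
            "-m", str(self.memory_mb),
            "-cdrom", str(self.rescue_iso),
            "-drive", f"file={self.target_disk},format=qcow2",
            "-boot", "d",      # try the CD-ROM drive first
            "-nographic",      # no graphical window; serial port goes to the terminal
        ]

    def recover_command(self) -> List[str]:
        # Run inside the booted rescue system.
        return ["rear", "-v", "recover"]


if __name__ == "__main__":
    test = RecoveryTest(Path("rear-rescue.iso"), Path("target.qcow2"))
    for step in (test.backup_command(), test.boot_command(), test.recover_command()):
        print(" ".join(step))
```

The sketch only builds the command lines instead of executing them, so
the same description can be handed to whichever CI system is eventually
chosen.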
The goal of the project is to fill the gap in testing of the
source repository of the ReaR tool by automating the recovery
process and deploying a Continuous Integration (CI) setup that
will automatically test all the proposed changes to the source
repository. The project will consist of:
- research on automating the backup and recovery process,
  enhancing the capabilities of the ReaR tool if they are not
  currently sufficient
- survey of the existing CI solutions which could be used to
  run this automation
- integrating the automation with the chosen CI solution and deploying
  it to test the GitHub repository (or, alternatively, the package
  provided by Linux distributions, such as Fedora).
Possible extensions of the project include:
- CI testing of the package provided by a Linux distribution
  such as Fedora (if not done in the task above)
- testing of integration with network backup tools (Bacula,
  Bareos)
- CI testing using a static analysis tool like ShellCheck
- writing tests for more scenarios (like more complex storage setups)
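As an illustration of the static analysis extension, a CI job could turn
a ShellCheck report into a pass/fail decision along these lines. This is
a minimal sketch, assuming the report was produced with
`shellcheck --format=json` and is a JSON array of findings with `file`,
`line`, `level`, `code` and `message` fields. The function name, the
choice of blocking levels and the sample report are hypothetical.

```python
import json
from typing import Dict, List, Sequence


def blocking_findings(report_json: str,
                      blocking_levels: Sequence[str] = ("error", "warning")) -> List[Dict]:
    """Return the findings whose severity level should fail the CI job."""
    findings = json.loads(report_json)
    return [f for f in findings if f.get("level") in blocking_levels]


if __name__ == "__main__":
    # Hand-written sample in the assumed report format, not real tool output.
    sample = json.dumps([
        {"file": "example.sh", "line": 3, "level": "warning", "code": 2034,
         "message": "foo appears unused."},
        {"file": "example.sh", "line": 9, "level": "style", "code": 2006,
         "message": "Use $(...) notation instead of legacy backticks."},
    ])
    for finding in blocking_findings(sample):
        print(f'{finding["file"]}:{finding["line"]}: '
              f'SC{finding["code"]} {finding["message"]}')
    # A CI job would exit with a non-zero status when the returned list is not empty.
```

Which levels should block a change is a policy decision; starting with
errors and warnings only keeps the first CI runs from being dominated by
style remarks.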
Literature
W. Preston: Backup & Recovery. O'Reilly, 2009. http://shop.oreilly.com/product/9780596102463.do