Chapter 37. Backup and Recovery Methods

Objectives

  • Identify and prioritize data that needs backup
  • Employ different kinds of backup methods, depending on the situation
  • Establish efficient backup and restore strategies
  • Use different backup utilities, such as 
    • cpio
    • tar
    • gzip
    • bzip2
    • xz
    • dd
    • rsync
    • dump
    • restore
    • mt
  • Describe the two most well known backup programs
    • Amanda
    • Bacula

Why Backups

  • Data is valuable
  • Hardware fails
  • Software fails
  • People make mistakes
  • Malicious people can cause deliberate damage
  • Unexplained events happen
  • Rewinds can be useful

Backup methods

  • Full
    • Back up all files on the system
  • Incremental
    • Back up all files that have changed since the last incremental or full backup
  • Differential
    • Back up all files that have changed since the last full backup
  • Multiple level incremental
    • Back up all files that have changed since the previous backup at the same or a previous level
  • User
    • Back up only files in a specific user's directory
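The difference between full, differential, and incremental backups can be illustrated with timestamps alone. The following is a minimal sketch, not a real backup tool: it uses find -newer against marker files to list which files each method would select. The file names, dates, and marker files (full.stamp, incr1.stamp) are hypothetical.

```shell
#!/bin/sh
# Sketch: which files would each backup method select?
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data

# Jan 1: both files exist and a full backup is taken.
touch -t 202401010000 data/a.txt data/b.txt
touch -t 202401010100 full.stamp      # marks the time of the full backup

# Jan 2: a.txt changes, then an incremental backup is taken.
touch -t 202401020000 data/a.txt
touch -t 202401020100 incr1.stamp     # marks the time of that incremental

# Jan 3: b.txt changes; now compare what each method would pick up.
touch -t 202401030000 data/b.txt

full=$(find data -type f | sort)
differential=$(find data -type f -newer full.stamp | sort)
incremental=$(find data -type f -newer incr1.stamp | sort)

echo "Full:";         echo "$full"
echo "Differential:"; echo "$differential"
echo "Incremental:";  echo "$incremental"
```

Here the differential list contains both files (both changed since the full backup), while the incremental list contains only data/b.txt (the only change since the last incremental).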

Backup Strategies

Sample strategy

  • Use disk 1 for a full backup
  • Use disks 2-5 for incremental backups on Monday - Thursday
  • Use disk 6 for full backup on second Friday
  • Use disks 2-5 for incremental backups on second Monday-Thursday
  • Do not overwrite disk 1 until completion of full backup on disk 6
  • After full backup to disk 6, move disk 1 to external location for disaster recovery
  • For the next full backup (the following Friday), retrieve disk 1 and exchange it for disk 6

A good rule of thumb is to have at least two weeks of backups.
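One possible way to encode the sample rotation is a small shell function that maps a week number and weekday onto a backup type and disk. This is only a sketch of one reading of the schedule: the function name pick_backup, the numbering (weekday 1 = Monday, 5 = Friday), and the assumption that full backups alternate between disk 1 on odd weeks and disk 6 on even weeks are all illustrative choices.

```shell
#!/bin/sh
# Sketch: map a (week, weekday) pair onto the rotation described above.
# week: 1, 2, 3, ...   weekday: 1=Monday ... 5=Friday (hypothetical numbering)
pick_backup() {
    week=$1
    weekday=$2
    case $weekday in
        1|2|3|4)
            # Monday-Thursday: incremental backups reuse disks 2-5
            echo "incremental disk$((weekday + 1))"
            ;;
        5)
            # Friday: full backup, alternating between disk 1 and disk 6
            if [ $((week % 2)) -eq 1 ]; then
                echo "full disk1"
            else
                echo "full disk6"
            fi
            ;;
        *)
            # No backup scheduled on other days in this sample
            echo "none"
            ;;
    esac
}

pick_backup 1 5   # first Friday
pick_backup 2 3   # second Wednesday
pick_backup 2 5   # second Friday
```

The three calls print "full disk1", "incremental disk4", and "full disk6", so the two full backups never share a disk in consecutive weeks.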

Backup Utilities

  • cpio
  • tar
  • gzip
  • bzip2
  • xz
  • dd
    • This powerful utility is often used to transfer raw data between media. It can copy entire partitions or entire disks
  • rsync
    • Can synchronize directory subtrees or entire filesystems across a network, or between different filesystem locations on a local machine
  • dump and restore
    • Old utilities that read from the filesystem directly
  • mt
    • Useful for querying and positioning tapes before performing backups and restores

CPIO

Copy in and out is a general file archiver utility that has been around since the earliest days of UNIX and was originally designed for tape backups.

Samples

Create an archive: use -o or --create

$ ls | cpio --create -O /dev/st0

Extract from an archive: use -i or --extract

$ cpio -i somefile -I /dev/st0

List contents of an archive: use -t or --list

$ cpio -t -I /dev/st0

You can specify the input (-I device) or output (-O device) or use redirection on the command line.

The -o or --create option tells cpio to copy files out to an archive. cpio reads a list of file names (one per line) from standard input and writes the archive to standard output.

The -i or --extract option tells cpio to copy files in from an archive, reading the archive from standard input. If you list file names as patterns (such as *.c) on the command line, only files in the archive that match the patterns are copied from the archive. If no patterns are given, all files are extracted.

The -t or --list option tells cpio to list the archive contents. Adding the -v or --verbose option generates a long listing.
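The same three modes can be tried without a tape drive by redirecting to and from a regular file. The sketch below assumes cpio and find are installed; the directory names, file names, and the archive name backup.cpio are made up for illustration.

```shell
#!/bin/sh
# Sketch: cpio round trip using a regular file instead of a tape device.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src/docs restore
echo "alpha" > src/docs/a.txt
echo "beta"  > src/b.c

# Create: find supplies the file list on stdin; the archive goes to stdout.
(cd src && find . -depth -print | cpio -o > ../backup.cpio)

# List the archive contents (add -v for a long listing).
listing=$(cpio -t < backup.cpio)
echo "$listing"

# Extract only files matching a pattern; -d creates directories as needed.
(cd restore && cpio -id '*.c' < ../backup.cpio)
ls -R restore
```

Because find produces the file list, it is easy to restrict what goes into the archive (for example by name or modification time) before cpio ever sees it.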

Using tar for Backups

Tar is easier to use than cpio

  • When creating a tar archive, for each directory given as an argument, all files and subdirectories will be included in the archive
  • When restoring it reconstitutes directories as necessary
  • It even has a --newer option that lets you do incremental backups
  • The version of tar used in Linux can also handle backups that do not fit on one tape or whatever device you use

Samples

Create an archive using -c or --create

$ tar --create --file /dev/st0 /root
$ tar -cvf /dev/st0 /root

Create a multi-volume archive using -M or --multi-volume if your backup won't fit on one device (you will be prompted to insert the next volume when needed)

$ tar -cMf /dev/st0 /root

Verify an archive using -d or --compare (after completing a backup, this option compares the archive against the files on disk to confirm the backup is correct)

$ tar --compare --verbose --file /dev/st0
$ tar -dvf /dev/st0
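To see what verification looks like in practice, the following sketch writes the archive to a regular file rather than a tape and runs the comparison twice. It assumes GNU tar (other tar implementations may not support --compare); the directory project and the file backup.tar are hypothetical names.

```shell
#!/bin/sh
# Sketch: create an archive, then verify it with --compare (GNU tar).
work=$(mktemp -d)
cd "$work" || exit 1
mkdir project
echo "version 1" > project/notes.txt

tar -cf backup.tar project

# No differences: tar prints nothing and exits with status 0.
tar -df backup.tar
clean_status=$?

# Change a file after the backup; --compare now reports it.
echo "version 2, edited after the backup" > project/notes.txt
tar -df backup.tar
changed_status=$?

echo "before edit: $clean_status, after edit: $changed_status"
```

A nonzero exit status from the comparison means the archive no longer matches the files on disk, which a backup script can test for directly.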

Using tar for Restoring files

The -x or --extract option extracts files from an archive (all of them by default). To extract only specific files or directories, list them as arguments after the archive name

The -p or --same-permissions option ensures files are restored with their original permissions

The -t or --list option lists, but does not extract, the files in the archive

Examples

Extract from an archive

$ tar --extract --same-permissions --verbose --file /dev/st0
$ tar -xpvf /dev/st0
$ tar xpvf /dev/st0

Specifying only specific files to restore

$ tar xvf /dev/st0 somefile

List the contents of a tar backup

$ tar --list --file /dev/st0
$ tar -tf /dev/st0
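When restoring a single file it is often safer to extract into a scratch directory and copy the result into place afterwards. The sketch below shows this with -C, using a regular file as the archive; the names etc-copy, restore, and backup.tar are invented for the example.

```shell
#!/bin/sh
# Sketch: list an archive, then restore a single file into a scratch directory.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p etc-copy/conf.d restore
echo "main"  > etc-copy/main.conf
echo "extra" > etc-copy/conf.d/extra.conf
tar -cf backup.tar etc-copy

# List member names exactly as tar stored them.
tar -tf backup.tar

# Restore one member, preserving permissions, under ./restore.
tar -xpvf backup.tar -C restore etc-copy/conf.d/extra.conf

restored=$(cat restore/etc-copy/conf.d/extra.conf)
echo "restored content: $restored"
```

The member name passed on the command line has to match the name shown by the listing, including any leading directory components.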

Incremental Backups with tar

To back up only the files changed after a given date, tar provides the following options

  • -N or --newer
  • --after-date

Sample

$ tar --create --newer '2011-12-1' -vf backup1.tar /var/tmp
$ tar --create --after-date '2011-12-1' -vf backup1.tar /var/tmp

Either command creates a backup1.tar file containing the files modified after December 1, 2011.
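Besides date-based selection, GNU tar can also track changes itself through a snapshot file given with --listed-incremental. This is a different mechanism from --newer, shown here only as a sketch; the snapshot name snapshot.snar and the archive names level0.tar and level1.tar are arbitrary.

```shell
#!/bin/sh
# Sketch: level 0 and level 1 backups with a GNU tar snapshot file.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data
echo "old" > data/old.txt
sleep 1   # keep the file's timestamp clearly older than the level 0 run

# Level 0: snapshot.snar does not exist yet, so everything is archived.
tar --create --listed-incremental=snapshot.snar --file=level0.tar data

# A new file appears after the full backup.
echo "new" > data/new.txt

# Level 1: only what changed since the snapshot was last updated.
tar --create --listed-incremental=snapshot.snar --file=level1.tar data

level1=$(tar --list --file=level1.tar)
echo "$level1"
```

The level 1 listing shows the directory entry and data/new.txt but not data/old.txt. To restore, the archives are extracted in order, level 0 first.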

Compression: gzip, bzip2, xz and Backups

Several tools are available for file compression

  • gzip
    • Uses Lempel-Ziv Coding (LZ77) and produces .gz files
  • bzip2
    • Uses Burrows-Wheeler block sorting text compression algorithm and Huffman coding to produce .bz2 files
  • xz 
    • Produces .xz files and also supports the legacy .lzma format

Decompression times generally vary much less between these tools than compression times do.

For daily files, it is common to use gzip as it is extremely fast.

For large files and for archiving purposes the other two methods are often used.

The .zip format is rarely used in Linux except to extract files from other operating systems.

The compression utilities are very easily (and often) used in combination with tar

Samples

Compression

$ tar zcvf source.tar.gz source
$ tar jcvf source.tar.bz2 source
$ tar Jcvf source.tar.xz source

Decompression

$ tar xzvf source.tar.gz
$ tar xjvf source.tar.bz2
$ tar xJvf source.tar.xz

Or even simpler

$ tar xvf source.tar.gz

This works because modern versions of tar detect the compression method automatically.
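A quick way to get a feel for the trade-offs is to compress the same archive with each tool and compare the resulting sizes. The sketch below does this for whichever of gzip, bzip2, and xz are installed; the sample log line and file names are invented, and the results depend heavily on the input data.

```shell
#!/bin/sh
# Sketch: compress the same tar archive with each available tool and compare sizes.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir source
# Repetitive text compresses well, which makes the differences easy to see.
yes "2011-12-01 12:00:00 INFO backup completed" | head -n 20000 > source/app.log
tar -cf source.tar source

for tool in gzip bzip2 xz; do
    # Skip tools that are not installed on this machine.
    command -v "$tool" >/dev/null 2>&1 || continue
    case $tool in
        gzip)  ext=gz  ;;
        bzip2) ext=bz2 ;;
        xz)    ext=xz  ;;
    esac
    # -c writes to stdout, leaving source.tar in place.
    "$tool" -c source.tar > "source.tar.$ext"
done

# Show sizes in bytes, uncompressed archive first.
wc -c source.tar source.tar.*
```

Prefixing each compression command with time gives a rough comparison of speed as well as size.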

dd

dd is one of the original UNIX utilities and is extremely versatile. Without options it does a very low level raw copying of files or even whole disks. It can also perform many kinds of data conversions during the copy, with many options to control offsets, number of bytes, block sizes, etc.

dd is often used to read fixed amounts of data from special device nodes such as /dev/zero or /dev/random. The basic syntax is

$ dd if=input-file of=output-file options

If the input or output files are not specified, the default is to use stdin and stdout. To see a summary of the available options, run

$ dd --help

Samples

Create a 10 MB file filled with zeros

$ dd if=/dev/zero of=outfile bs=1M count=10

Backup an entire hard drive to another (raw copy)

$ dd if=/dev/sda of=/dev/sdb

Create an image of a hard disk (which could later be transferred to another hard disk)

$ dd if=/dev/sda of=sdadisk.img

Backup a partition

$ dd if=/dev/sda1 of=partition1.img

Use dd in a pipeline

$ dd if=ndata conv=swab count=1024 | uniq > ofile
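Since the device examples above need root access and real disks, the following sketch practices the same steps on a regular file standing in for a device. The names fake-disk.raw and disk.img and the marker string BOOTDATA are hypothetical; bs=1M assumes GNU dd.

```shell
#!/bin/sh
# Sketch: image a "disk" with dd, using a regular file in place of a device.
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-in for a device such as /dev/sda1: 10 MB of zeros.
dd if=/dev/zero of=fake-disk.raw bs=1M count=10 2>/dev/null
# Put some recognizable data at the start without truncating the file.
printf 'BOOTDATA' | dd of=fake-disk.raw conv=notrunc 2>/dev/null

# Take the image, then confirm it is byte-for-byte identical.
dd if=fake-disk.raw of=disk.img bs=1M 2>/dev/null
cmp fake-disk.raw disk.img && echo "image verified"

# Read back only the first 8 bytes of the image.
header=$(dd if=disk.img bs=8 count=1 2>/dev/null)
echo "header: $header"
```

Checking the copy with cmp (or a checksum) is worthwhile on real devices too, because dd itself does not verify what it wrote. When working with real disks, double-check if= and of= before running the command, since swapping them overwrites the source.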

rsync

rsync (remote synchronize) is a tool used to transfer files across a network (or between different locations on the same machine), as in:

$ rsync [options] source destination

Either the source or the destination (but not both, since rsync does not copy between two remote hosts) can take the form of [target:path] where target can be in the form of [user@host]

$ rsync file.tar someone@backup.mydomain:/usr/local
$ rsync -r a-machine:/usr/local /BACKUP/usr
$ rsync -r --dry-run /usr/local /BACKUP/usr

The --dry-run option shows what would be transferred without actually copying anything, so it is highly recommended to run the command with this option first, before performing the actual synchronization.

The -r option makes rsync walk recursively down the directory tree, copying all files and directories below the one given as the source. Thus a very useful way to back up a project directory might be similar to:

$ rsync -r project-X archive-machine:archives/project-X

A simple and efficient way to back up is to duplicate directories or partitions across a network with the rsync command, and to do so frequently.
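The preview-then-copy workflow can be tried locally before pointing rsync at a remote machine. This sketch assumes rsync is installed and uses -a (archive mode, which implies -r and also preserves times and permissions) rather than plain -r; the directory names are made up, and a remote destination would simply be written as host:path.

```shell
#!/bin/sh
# Sketch: preview and then run a local rsync; a remote target would use host:path.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p project-X/src archives
echo "int main(void) { return 0; }" > project-X/src/main.c

# Preview only: -n is the short form of --dry-run, -v lists what would be sent.
rsync -avn project-X archives/
[ -e archives/project-X ] || echo "dry run copied nothing"

# Real run. There is no trailing slash on the source, so the directory
# itself is created under archives/.
rsync -av project-X archives/

ls -R archives
```

Writing the source as project-X/ (with a trailing slash) would instead copy the contents of the directory directly into archives/, which is a common source of surprises.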

Level 0 Backup with dump

Here's a very simple example of a level 0 complete dump of a partition /dev/sda2 mounted at /boot_master

$ sudo dump -0uf /tmp/boot_backup /boot_master

In order to check the backup information

$ cat /etc/dumpdates

restore

Used to read archives, tapes, or files which were created by dump. For example, to restore all files that were dumped, relative to the current directory

$ sudo restore -rvf /tmp/boot_backup

Useful options to restore:

  • -r
    • Restores everything. The dumped material is read and the complete contents are loaded into the current directory
  • -t
    • Files and directories specified are listed on standard output if they reside on the backup. If no file argument is specified, the root directory on the backup is listed. No actual restore is performed with this option
  • -x
    • Pinpoint the exact folder or file to be extracted from the backup
  • -i
    • Allows interactive restoration of the backup

Note: The -r and -x options cannot be used at the same time.

mt

Used to control magnetic tape devices. It typically requires root privileges, since access to the tape device is normally restricted

Syntax

mt [-h] [-f device] operation [count] [arguments...]

Where

  • -h
    • Is for displaying usage
  • -f device
    • To specify the tape device
  • operation 
    • One of the tape operations
  • count
    • Used for some repeatable operations
  • arguments
    • Used for some operations

Samples

Show status information about the tape device

$ mt status

Rewind the tape

$ mt rewind

Erase the tape

$ mt erase

Move forward to the end of the current archive

$ mt fsf

Backup Programs

  • Amanda
  • Bacula
  • Clonezilla

Laboratories