Objectives
- Identify and prioritize data that needs backup
- Employ different kinds of backup methods, depending on the situation
- Establish efficient backup and restore strategies
- Use different backup utilities, such as
  - cpio
  - tar
  - gzip
  - bzip2
  - xz
  - dd
  - rsync
  - dump
  - restore
  - mt
- Describe the two most well known backup programs
  - Amanda
  - Bacula
Why Backups
- Data is valuable
- Hardware fails
- Software fails
- People make mistakes
- Malicious people can cause deliberate damage
- Unexplained events happen
- Rewinds can be useful
Backup methods
- Full
  - Back up all files on the system
- Incremental
  - Back up all files that have changed since the last incremental or full backup
- Differential
  - Back up all files that have changed since the last full backup
- Multiple level incremental
  - Back up all files that have changed since the previous backup at the same or a previous level
- User
  - Back up only files in a specific user's directory
Backup Strategies
Sample strategy
- Use disk 1 for a full backup
- Use disks 2-5 for incremental backups on Monday-Thursday
- Use disk 6 for a full backup on the second Friday
- Use disks 2-5 for incremental backups on the second Monday-Thursday
- Do not overwrite disk 1 until the full backup on disk 6 has completed
- After the full backup to disk 6, move disk 1 to an external location for disaster recovery
- For the next full backup (the following Friday), retrieve disk 1 and exchange it for disk 6
A good rule of thumb is to have at least two weeks of backups.
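The rotation above can be sketched as a small shell function. This is only an illustration: the function name pick_disk, the weekday numbering (1 = Monday) and the week counter starting at 1 are assumptions made for the example, not part of the original strategy.

```shell
#!/bin/sh
# Hypothetical helper: map a weekday (1 = Monday .. 7 = Sunday) and a
# week counter (1, 2, 3, ...) to the disk used in the sample rotation.
pick_disk() {
    weekday=$1
    week=$2
    case "$weekday" in
        1|2|3|4)
            # Monday-Thursday: incremental backups on disks 2-5
            echo "disk $((weekday + 1)): incremental"
            ;;
        5)
            # Friday: full backup, alternating between disk 1 and disk 6
            if [ $((week % 2)) -eq 1 ]; then
                echo "disk 1: full"
            else
                echo "disk 6: full"
            fi
            ;;
        *)
            echo "no backup scheduled"
            ;;
    esac
}

pick_disk 1 1   # Monday of week 1
pick_disk 5 2   # Friday of week 2
```

Because the two full-backup disks alternate, one of them can always be kept off site while the other is being written.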
Backup Utilities
- cpio
- tar
- gzip
- bzip2
- xz
- dd
  - This powerful utility is often used to transfer raw data between media. It can copy entire partitions or entire disks
- rsync
  - Can synchronize directory subtrees or entire filesystems across a network, or between different filesystem locations on a local machine
- dump and restore
  - Old utilities that read from the filesystem directly
- mt
  - Useful for querying and positioning tapes before performing backups and restores
CPIO
cpio (copy in and out) is a general file archiver utility that has been around since the earliest days of UNIX and was originally designed for tape backups.
Samples
Create an archive: use -o or --create
$ ls | cpio --create -O /dev/st0
Extract from an archive: use -i or --extract
$ cpio -i somefile -I /dev/st0
List the contents of an archive: use -t or --list
$ cpio -t -I /dev/st0
You can specify the input (-I device) or output (-O device) or use redirection on the command line.
The -o or --create option tells cpio to copy files out to an archive. cpio reads a list of file names (one per line) from standard input and writes the archive to standard output.
The -i or --extract option tells cpio to copy files in from an archive, reading the archive from standard input. If you list file names as patterns (such as *.c) on the command line, only files in the archive that match the patterns are copied from the archive. If no patterns are given, all files are extracted.
The -t or --list option tells cpio to list the archive contents. Adding the -v or --verbose option generates a long listing.
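The three modes can be tried without a tape drive by using redirection to a regular file. The following is a minimal sketch, assuming cpio is installed; the scratch directory and the file names (main.c, notes.txt, backup.cpio) are made up for the example.

```shell
#!/bin/sh
# Sketch: cpio round trip against a regular file instead of /dev/st0.
set -eu
command -v cpio >/dev/null 2>&1 || { echo "cpio is not installed"; exit 0; }

workdir=$(mktemp -d)                 # scratch area, removed on exit
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/src" "$workdir/dst"
echo "int main(void) { return 0; }" > "$workdir/src/main.c"
echo "notes" > "$workdir/src/notes.txt"

# Create: file names arrive on stdin, the archive is written to stdout
(cd "$workdir/src" && ls | cpio -o > "$workdir/backup.cpio")

# List the archive contents
cpio -t < "$workdir/backup.cpio"

# Extract only the entries that match a pattern
(cd "$workdir/dst" && cpio -i '*.c' < "$workdir/backup.cpio")
ls "$workdir/dst"
```

Only main.c should appear in the destination directory, since notes.txt does not match the pattern.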
Using tar for Backups
Tar is easier to use than cpio
- When creating a tar archive, for each directory given as an argument, all files and subdirectories will be included in the archive
- When restoring it reconstitutes directories as necessary
- It even has a --newer option that lets you do incremental backups
- The version of tar used in Linux can also handle backups that do not fit on one tape or whatever device you use
Samples
Create an archive using -c or --create
$ tar --create --file /dev/st0 /root
$ tar -cvf /dev/st0 /root
Create a multi-volume archive using -M or --multi-volume if the backup won't fit on one device (you will be prompted to insert the next volume when needed)
$ tar -cMf /dev/st0 /root
Verify an archive using -d or --compare (after completing a backup, this option compares the archive against the files on disk)
$ tar --compare --verbose --file /dev/st0
$ tar -dvf /dev/st0
Using tar for Restoring files
The -x or --extract option extracts files from an archive (all of them by default). To extract only specific files or directories, name them as additional arguments
The -p or --same-permissions option ensures files are restored with their original permissions
The -t or --list option lists, but does not extract, the files in the archive
Examples
Extract from an archive
$ tar --extract --same-permissions --verbose --file /dev/st0
$ tar -xpvf /dev/st0
$ tar xpvf /dev/st0
Restore only specific files
$ tar xvf /dev/st0 somefile
List the contents of a tar backup
$ tar --list --file /dev/st0
$ tar -tf /dev/st0
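The create, compare, list and extract operations can be rehearsed on a regular file before pointing tar at a real device. This is a minimal sketch, assuming GNU tar; the directory layout and the names backup.tar, restore-one and restore-all are made up for the example.

```shell
#!/bin/sh
# Sketch: tar create / verify / list / restore using a regular file
# in place of /dev/st0.
set -eu
workdir=$(mktemp -d)                 # scratch area, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p data/docs restore-one restore-all
echo "alpha" > data/a.txt
echo "beta" > data/docs/b.txt

# Create the archive from the data directory
tar -cf backup.tar data

# Verify: --compare exits non-zero if the archive differs from the disk
tar -df backup.tar

# List the archive without extracting anything
tar -tf backup.tar

# Restore a single file into restore-one
tar -xpf backup.tar -C restore-one data/docs/b.txt

# Restore everything into restore-all and check it against the original
tar -xpf backup.tar -C restore-all
diff -r data restore-all/data && echo "restore matches original"
```

The -C option changes into the given directory before extracting, which avoids overwriting the original files during a test restore.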
Incremental Backups with tar
To create this type of backup with tar, use either of the following equivalent options
- -N or --newer
- --after-date
Sample
$ tar --create --newer '2011-12-1' -vf backup1.tar /var/tmp
$ tar --create --after-date '2011-12-1' -vf backup1.tar /var/tmp
Either command creates a backup1.tar file containing the files modified after December 1, 2011.
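Date-based selection can be checked on a scratch directory. The sketch below assumes GNU tar and GNU touch. It uses --newer-mtime rather than --newer, because GNU tar documents --newer as also considering the status change time, which touch cannot backdate; the file names old.txt and new.txt are made up for the example.

```shell
#!/bin/sh
# Sketch: archive only files modified after a given date.
set -eu
workdir=$(mktemp -d)                 # scratch area, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir data
echo "old" > data/old.txt
echo "new" > data/new.txt

# Backdate the modification times (GNU touch -d)
touch -d '2011-11-01' data/old.txt
touch -d '2011-12-15' data/new.txt

# --newer-mtime compares the modification time only
tar --create --newer-mtime='2011-12-01' -f backup1.tar data

# The listing should contain data/new.txt but not data/old.txt
tar -tf backup1.tar
```

The directory entry itself is still stored, so the archive can recreate the directory structure on restore.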
Compression: gzip, bzip2, xz and Backups
Several tools are available for file compression
- gzip
  - Uses Lempel-Ziv Coding (LZ77) and produces .gz files
- bzip2
  - Uses the Burrows-Wheeler block sorting text compression algorithm and Huffman coding to produce .bz2 files
- xz
  - Produces .xz files and also supports the legacy .lzma format
Decompression is generally faster than compression, and decompression times vary less between these tools than compression times do.
For everyday files, gzip is commonly used because it is very fast.
For large files and for archiving purposes the other two methods are often used.
The .zip format is rarely used in Linux except to extract files from other operating systems.
The compression utilities are very easily (and often) used in combination with tar
Samples
Compression
$ tar zcvf source.tar.gz source
$ tar jcvf source.tar.bz2 source
$ tar Jcvf source.tar.xz source
Decompression
$ tar xzvf source.tar.gz
$ tar xjvf source.tar.bz2
$ tar xJvf source.tar.xz
Or even simpler, since modern versions of tar detect the compression format automatically
$ tar xvf source.tar.gz
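The effect of the different compressors can be compared on a small sample. This is a minimal sketch, assuming GNU tar and gzip are installed; bzip2 and xz are used only if present, and the sample data and file names are made up for the example.

```shell
#!/bin/sh
# Sketch: compress the same directory with several tools, compare the
# archive sizes, then extract without naming the compression format.
set -eu
workdir=$(mktemp -d)                 # scratch area, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir source out
i=0
while [ "$i" -lt 2000 ]; do
    # Repetitive text compresses well
    echo "line $i of some repetitive sample data" >> source/sample.txt
    i=$((i + 1))
done

tar -cf source.tar source            # uncompressed, for reference
tar -czf source.tar.gz source
if command -v bzip2 >/dev/null 2>&1; then tar -cjf source.tar.bz2 source; fi
if command -v xz >/dev/null 2>&1; then tar -cJf source.tar.xz source; fi

# Show the size in bytes of each archive
wc -c source.tar*

# Extraction: tar detects the gzip format on its own
tar -xf source.tar.gz -C out
diff -r source out/source && echo "round trip OK"
```

The exact sizes depend on the data, so results on real backups may rank the tools differently.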
dd
dd is one of the original UNIX utilities and is extremely versatile. Without options it does a very low-level raw copy of files or even whole disks. It can also perform many kinds of data conversion during the copy, with options to control offsets, number of bytes, block sizes, and so on.
dd is often used to read fixed amounts of data from special device nodes such as /dev/zero or /dev/random. The basic syntax is
$ dd if=input-file of=output-file options
If the input or output file is not specified, the default is to use stdin or stdout, respectively. To display the list of options, run
$ dd --help
Samples
Create a 10 MB file filled with zeros
$ dd if=/dev/zero of=outfile bs=1M count=10
Backup an entire hard drive to another (raw copy)
$ dd if=/dev/sda of=/dev/sdb
Create an image of a hard disk (which could later be transferred to another hard disk)
$ dd if=/dev/sda of=sdadisk.img
Backup a partition
$ dd if=/dev/sda1 of=partition1.img
Use dd in a pipeline
$ dd if=ndata conv=swab count=1024 | uniq > ofile
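Since dd will overwrite whatever it is pointed at, it is safer to practice on ordinary files first. The following is a minimal sketch, assuming GNU dd; outfile stands in for a partition and partition1.img for its image.

```shell
#!/bin/sh
# Sketch: dd on regular files instead of real disks.
set -eu
workdir=$(mktemp -d)                 # scratch area, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# 10 blocks of 1 MiB each, filled with zeros
dd if=/dev/zero of=outfile bs=1M count=10

# Raw copy of the stand-in "partition" into an image file
dd if=outfile of=partition1.img bs=1M

# A raw copy should be byte-for-byte identical to its source
cmp outfile partition1.img && echo "image matches source"

# conv=swab swaps each pair of input bytes: prints "badc"
printf 'abcd' | dd conv=swab 2>/dev/null
echo
```

On a real system, double-check the if= and of= arguments before running dd, because swapping them destroys the source.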
rsync
rsync (remote synchronize) is a tool used to transfer files across a network (or between different locations on the same machine), as in:
$ rsync [options] source destination
Either the source or the destination can take the form [target:path], where target can be in the form [user@]host. Only one side can be remote: rsync refuses to run when both the source and the destination are remote hosts.
$ rsync file.tar someone@backup.mydomain:/usr/local
$ rsync -r a-machine:/usr/local /usr
$ rsync -r --dry-run /usr/local /BACKUP/usr
The --dry-run option shows what would be transferred without changing anything, so it is highly recommended to run the command with this option first, before the actual synchronization.
The -r option makes rsync walk recursively down the directory tree, copying all files and directories below the one given as the source. Thus a very useful way to back up a project directory might be similar to:
$ rsync -r project-X archive-machine:archives/project-X
A simple and efficient way to back up is to duplicate directories or partitions across a network with the rsync command, and to do so frequently.
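The same workflow can be tried locally, with a second directory standing in for the archive machine. This is a minimal sketch, assuming rsync is installed; the directories project-X and archives are created only for the example.

```shell
#!/bin/sh
# Sketch: preview with --dry-run, then synchronize, then re-run
# after a change. A local directory stands in for the remote host.
set -eu
command -v rsync >/dev/null 2>&1 || { echo "rsync is not installed"; exit 0; }

workdir=$(mktemp -d)                 # scratch area, removed on exit
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p project-X/src archives
echo "v1" > project-X/src/main.c

# Preview: lists what would be copied, but copies nothing
rsync -r --dry-run -v project-X archives/

# Actual copy: creates archives/project-X
rsync -r project-X archives/

# Change a file and synchronize again; only the difference is sent
echo "version two" > project-X/src/main.c
rsync -r project-X archives/
cat archives/project-X/src/main.c
```

Running the last rsync command from cron or a similar scheduler is one way to make the duplication frequent.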
Level 0 Backup with dump
Here's a very simple example of a level 0 complete dump of a partition /dev/sda2 mounted at /boot_master
$ sudo dump -0uf /tmp/boot_backup /boot_master
To check the recorded backup information (the -u option updates /etc/dumpdates)
$ cat /etc/dumpdates
restore
Used to read archives, tapes or files which were created by dump. For example, to restore all files that were dumped, relative to the current directory
$ sudo restore -rvf /tmp/boot_backup
Useful options for restore:
- -r
  - Restores everything. The dumped material is read and the complete contents are loaded into the current directory
- -t
  - Files and directories specified are listed on standard output if they reside on the backup. If no file argument is specified, the root directory on the backup is listed. No actual restore is performed with this option
- -x
  - Extracts only the named files or directories from the backup
- -i
  - Allows interactive restoration of the backup
Note: -r and -x cannot be used at the same time.
mt
Used to control magnetic tape devices. Normally only the root user can use this command
Syntax
mt [-h] [-f device] operation [count] [arguments...]
Where
- -h
  - Displays usage information
- -f device
  - Specifies the tape device
- operation
  - One of the tape operations
- count
  - Used for some repeatable operations
- arguments
  - Used for some operations
Samples
Show status information about the tape device
$ mt status
Rewind the tape
$ mt rewind
Erase the tape
$ mt erase
Move forward to the end of the current archive
$ mt fsf
Backup Programs
- Amanda
- Bacula
- Clonezilla