Backup data using rsync
Rsync is a program for synchronizing two directory trees across different filesystems, even if they are on different computers. It can run its host-to-host communication over ssh to keep things secure and to provide key-based authentication. Rsync can also do a block-level comparison of two files and transfer only the parts that have changed, which is a huge benefit if you are transferring large files over a slow link.
Suppose you have a directory called source, and you want to back it up into the directory destination. To accomplish that, you’d use:
rsync -a source/ destination/
(Note: I usually also add the -v (verbose) flag so that rsync tells me what it’s doing.) This command is equivalent to:
cp -a source/. destination/
except that it’s much more efficient if there are only a few differences.
Just to whet your appetite, here’s a way to do the same thing as in the example above, but with destination on a remote machine, over a secure shell:
rsync -a -e ssh source/ email@example.com:/path/to/destination/
Trailing Slashes Do Matter…Sometimes
Before going further, I would like to take a momentary detour to clarify one potentially confusing detail about rsync’s use. You may be accustomed to commands that don’t care about trailing slashes. For example, if a and b are two directories, then cp -a a b is equivalent to cp -a a/ b/. However, rsync does care about the trailing slash, but only on the source argument. For example, let a and b be two directories, with the file test initially inside directory a. Then this command:
rsync -a a b
produces b/a/test, whereas this command:
rsync -a a/ b
produces b/test. The presence or absence of a trailing slash on the destination argument (b, in this case) has no effect.
Using the --delete flag
If a file was originally in both source/ and destination/ (from an earlier rsync, for example), and you delete it from source/, you probably want it to be deleted from destination/ on the next rsync. However, the default behavior is to leave the copy at destination/ in place. Assuming you want rsync to delete any file from destination/ that is not in source/, you’ll need to use the --delete flag:
rsync -a --delete source/ destination/
Automate using cron
One of the toughest obstacles to a good backup strategy is human nature; if there’s any work involved, there’s a good chance backups won’t happen. (Witness, for example, how rarely my roommate’s home PC was backed up before I created this system). Fortunately, there’s a way to harness human laziness: make cron do the work.
To run the rsync command with --delete from the previous section every morning at 4:20 AM, for example, edit the root cron table (as root):
crontab -e
Then add the following line:
20 4 * * * rsync -a --delete source/ destination/
Finally, save the file and exit. The backup will happen every morning at precisely 4:20 AM, and root will receive the output by email. Don’t copy that example verbatim, though; you should use full path names (such as /usr/bin/rsync and /home/source/) to remove any ambiguity.
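One way to follow the full-path advice is to put the command in a small wrapper script and let cron call that. What follows is only a sketch: the script location /usr/local/sbin/nightly-backup.sh, the function name run_backup, and the destination path in the sample cron line are all hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper, saved for example as /usr/local/sbin/nightly-backup.sh.
# A matching cron line (the paths are placeholders) could look like:
#   20 4 * * * /usr/local/sbin/nightly-backup.sh /home/source/ /path/to/destination/

# Full path to rsync, since cron runs jobs with a minimal PATH.
RSYNC=${RSYNC:-/usr/bin/rsync}

run_backup() {
    src=$1
    dest=$2
    # Fail early with a clear message if the source directory is missing.
    if [ ! -d "$src" ]; then
        echo "backup: source $src not found" >&2
        return 1
    fi
    "$RSYNC" -a --delete "$src" "$dest"
}

# When called with two arguments (as in the cron line), run the backup.
if [ "$#" -eq 2 ]; then
    run_backup "$1" "$2"
fi
```

Anything the script prints, including the error message, ends up in the email that cron sends to root.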
Why back up with rsync instead of something else?
Disk based: Rsync is a disk-based backup system. It doesn’t use tapes, which are too slow to back up modern systems with large hard drives.
Fast: Rsync only backs up what has changed since the last backup. It NEVER has to repeat the full backup, unlike most other systems that have monthly/weekly/daily configurations.
Less work for the backup client: Most of the work in rsync backups including the rotation process is done on the backup server which is usually dedicated to doing backups. This means that the client system being backed up is not hit with as much load as with some other backup programs. The load can also be tailored to your particular needs through several rsync options.
Fastest restores possible: If you just need to restore a single file or set of files, it is as simple as a cp or scp command. Restoring an entire filesystem is just the reverse of the backup procedure. Restoring an entire system takes a bit longer, but it is less work than with backup systems that require you to reinstall your OS first, and about the same as with other manual backup systems like dump or tar.
Only one restore needed: Even though each backup is an incremental, they are all accessible as full backups. This means you only restore the backup you want instead of restoring a full backup and then an incremental.
Cross Platform: Rsync can back up and recover anything that can run rsync. I have used it to back up Linux, Windows, DOS, OpenBSD, Solaris, and even ancient SunOS 4 systems.
Cheap: It doesn’t seem like it would be cheap to have enough disk space for two copies of everything and then some, but it is. With tape drives you have to choose between a cheap drive with expensive tapes or an expensive drive with cheap tapes. In a hard drive based system you just buy cheap hard drives and use RAID to tie them together. My current backup server uses two 300GB Maxtor drives on an old 3Ware 6200 RAID controller, giving me a total of 600GB for about $380. That is less than I paid for the DDS3 tape drive that I used to use, and that doesn’t even include the tapes, which cost about $10 per 12GB.
Internet: Since rsync can run over ssh and only transfers what has changed (at the block level not the file level) it is perfect for backing up things across the internet if you need to do so. This is perfect for backing up a web site at a web hosting company or even a colocated server.
Do-it-yourself: There are FOSS backup packages out now that use rsync as their back end but the nice thing here is that you are using standard command line tools (rsync, ssh, cp, rm) so you can engineer your own backup system that will do EXACTLY what you want and you don’t need a special tool to restore.
rsync is a great tool for backing up and restoring files. I’ll use some examples to explain how it works:
“comentum” is an example directory
“test.txt” is an example file
“design.comentum.com” is an example of remote host
“bob” is an example user
“/home/bob/” is an example directory (usually a Linux user’s home directory)
“/Users/backup/” is an example backup directory (usually an OS X user’s home directory)
To copy a file from a remote computer to the computer you are working on, type:
rsync bob@design.comentum.com:/home/bob/test.txt /Users/backup/
Or to copy the whole directory “comentum”:
rsync -r bob@design.comentum.com:/home/bob/comentum /Users/backup/
To run the copy in reverse, swap the source and the destination. For example, the line below copies the folder comentum to /home/bob/comentum on the computer design.comentum.com:
rsync -r /Users/backup/comentum bob@design.comentum.com:/home/bob/
Here is a summary of the most useful options, followed by examples of how to use them:
r = recursive – copies directories and their subdirectories
v = verbose – prints on the screen what is being copied
a = archive – preserves symbolic links, devices, attributes, permissions, ownerships, etc. in the transfer (it also implies r)
z = compress – compresses file data during the transfer
# rsync -r /CustomersArt/ /Volumes/Backup/CustomersArt/
The above example will copy the contents of /CustomersArt/ into /Volumes/Backup/CustomersArt/.
# rsync -r /CustomersArt /Volumes/Backup/CustomersArt
The above example will copy the folder CustomersArt itself, along with its contents, into /Volumes/Backup/CustomersArt (it will create the folder /Volumes/Backup/CustomersArt/CustomersArt).
When using “/” at the end of the source, rsync will copy the contents of the last folder.
When not using “/” at the end of the source, rsync will copy the last folder itself along with its contents.
When the source is a folder, “/” at the end of the destination makes no difference: either way, rsync will place the data inside the destination folder, creating that folder if it does not exist.