[Semibug] arse-ink (rsync) --link-dest backup presentation notes

Jeff Marraccini marraccini at acm.org
Mon Sep 19 21:24:43 EDT 2016

Thank you very much, Nick!

On Mon, Sep 19, 2016 at 9:13 PM -0400, "Nick Holland" <nick at holland-consulting.net> wrote:

Been meaning to make the slides I used for my rsync/arse-ink
presentation available in usable form.  Attached is a PDF (ick) of the
presentation slides, and here is the basic text of those slides with
some additional comments.


rsync --link-dest
Local, rotated, quick and useful backups!

* Not providing any complete scripts
* Just providing strategy so that a competent scripter will be able to
build what they need
* I have done what is being presented here on OpenBSD, FreeBSD, Solaris,
AIX and Linux.

A simple RSYNC backup system:
* From backup sources to central disk storage
* Copies over what changed.
* Copies over just the PARTS that change
* (after first backup) VERY fast and efficient.
* Not rotated.  No archive.  BARELY counts as a backup.

Crash review of hard linked files:
* Multiple directory entries on the same file system to ONE file on disk.
* All hard links are equal.
* File exists until last link is removed.
* Again ... All hard links are equal.  Yes, this is important.
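
The points above can be checked in a scratch directory.  A minimal
sketch using only ln, rm and the shell's -ef test; the file names are
made up for illustration.

```shell
# Scratch directory; all names here are illustrative.
demo=$(mktemp -d)
echo "payroll data" > "$demo/original"

# Second directory entry for the same file on disk.
ln "$demo/original" "$demo/second-name"

# -ef is true when both names refer to the same device and inode.
if [ "$demo/original" -ef "$demo/second-name" ]; then
    same_file=yes
else
    same_file=no
fi

# Remove the "original" name.  The data survives, because the other
# link is just as real as the first one was.
rm "$demo/original"
survivor=$(cat "$demo/second-name")
echo "same file: $same_file, contents after rm: $survivor"
```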

Better rsync backup -- with hard links
* Create new directory for new backup.
* Hard link everything in old directory to new directory, duplicating
directory tree structure.
* Rsync from target system to backup system's new directory
* Unchanged files stay a link to the old version; changed files are
replaced by a fresh copy (rsync writes a new file rather than updating
the old one in place), so previous copies remain.
(see "Dirvish" http://www.dirvish.org/)

Better yet --link-dest
* Three-way rsync -- Source, PREVIOUS copy, NEW copy.
* New files: copied over.
* Unchanged files: hard link from --link-dest directory (!!)
* Changed files -- Copied over (but with rsync bandwidth usage)
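
A minimal sketch of the three-way run, assuming a reasonably current
rsync on the backup server.  The function name and argument order are
made up for this example; only -a and --link-dest are real rsync
options.

```shell
# link_dest_backup SOURCE PREVIOUS NEW   (hypothetical helper)
# SOURCE may be local or remote (e.g. root@fs3:/); PREVIOUS and NEW are
# directories on the backup server.
link_dest_backup() {
    src=$1 prev=$2 new=$3
    mkdir -p "$new" || return 1
    # Give --link-dest an absolute path; a relative one is taken
    # relative to the destination directory.
    rsync -a --link-dest="$prev" "$src" "$new/"
}
```

Called as, say, link_dest_backup /home/ /bu/fs3/prev /bu/fs3/new, files
that are identical in PREVIOUS show up in NEW as hard links, and
everything else is transferred.  rsync decides "unchanged" by size and
modification time unless told otherwise.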

Advantages
* Rotated! History!
* Minimal disk use, minimal network traffic
* EVERY backup is "full", but as fast as an incremental!
* WONDERFULLY USEFUL as it sits on the backup server.
* Communications over SSH, automatic key logon.
* Backup client? rsync! ANY version
* Restore client? rsync, scp, vi, whatever.
* Backup systems that your "primary" backup solution doesn't recognize

Disadvantages
(you knew there had to be some)
* No geographic diversity
  (everything on one host)
* File ownership, permissions CAN be munged.
  ('specially a problem when going between OSs)
* Root access needed to systems being backed up.
* Not a "bare metal" restore.
* Best for restoring data and config files
* End up with some complicated file systems.
  (lots of links.  LOTS of links.  You may find "issues" you have never
   seen before)
* MS Windows.
  ('nuff said?  well, maybe with a cygwin ssh & rsync client)
* A failed backup can really balloon your disk needs.
  (you can lose all the link history.  often best to manually delete a
   failed backup and re-run)
* du ... not fast. Not at all fast.
  (and no other good way to find how much space each backup is taking
   on disk.  And some OS's "du" don't recognize multiple links to one
   file)
Not just backups. This is Unix!
Assuming /bu/{system}/{date}/{path}
* Which systems is nholland on?
   # grep nholland /bu/*/20160823/etc/passwd
* When did nholland's account get created?
   # grep nholland /bu/fs3/*/etc/passwd
* How's the database dump growing?
   # ls -l /bu/fs3/*/db/dumpfile.txt
* Any time you have a question about all machines...
* File change detection (IDS)?
* Change ownership/permissions (non-root analysis).
* Systems doing self-backups
  (rsync --link-dest to another disk, or an SD/USB/whatever device)
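
File change detection falls out of the hard links almost for free: a
file that is NOT the same inode in two consecutive backups was new or
changed when the later one was taken.  A rough sketch; the function
name is hypothetical, and it assumes file names without embedded
newlines.

```shell
# changed_between OLD NEW   (hypothetical helper)
# Print files under NEW that are not hard links to the same path under
# OLD, i.e. files that were added or re-copied when NEW was made.
changed_between() {
    old=$1 new=$2
    ( cd "$new" && find . -type f -print ) | while IFS= read -r f; do
        [ "$new/$f" -ef "$old/$f" ] || printf '%s\n' "${f#./}"
    done
}
```

For example: changed_between /bu/fs3/20160822 /bu/fs3/20160823.  Files
whose ownership, permissions or timestamps changed are listed too,
since --link-dest only links files whose preserved attributes match.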

Doing It.
* "Projects" exist. So what.  Roll your own.
  (Basic scripting and unix administration.)
* rsync {-options} --link-dest={prevbu} {source} {newbu}
* Date your backups (yyyy-mm-dd)
  * Maybe most recent as "curr"?
  * Sortable naming
  * M-yyyy-mm-dd for monthly?
* Pre-create backup directories (2000-00-00, 2000-00-01, 2000-00-02,
  ...)
* Create new backup dir.
* Make backup between source, previous, and new
* Delete oldest afterwards -- keeps a constant number of backups.
* Want to "pull" a backup out of rotation? Rename it! Create new
  replacement. (note: "cost" will increase with time)
* Save output of rsync to a file -- backup log!
* Create backup reports from the rsync log files.
* Chunk your data. Even though you don't want to.
  (don't create one monster partition, chunk it up into manageable
   pieces.  You will thank me someday)
  * Symlink from /bu to actual storage spaces
* Watch your free space carefully. Don't run out.
* Know what your `du' command does with hard links.
* Use otherwise "wasted" space -- local disk on VM hosts?
* Test your restores!
* Beware of reversing trust.
  (someone will try to have you let them log into the backup machine
   from their machine that wants to drop off a backup.)
* One script to run the job -- "bu"
* Second script to grab the output from automatic runs -- "bucron"
   * Run just the specified job? -- or --
   * Run all jobs in specified directory?
   * pgrep | wc -l your rsyncs, hold off until there are fewer
     than X running (20 to 30?)
* head, tail, basename, dirname, df, du, grep are your friends.
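
Not a complete script, just the shape of the rotation described
above, as a sketch.  Every name in it (the functions, the per-day log
file, the yyyy-mm-dd pattern, the 30-second sleep) is an assumption
made for this example; adapt to taste.

```shell
# All function names and the log location are placeholders.
DATED='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'

# Newest dated backup directory under $1 (names sort chronologically).
latest_backup() {
    ls -1 "$1" | grep -E "$DATED" | sort | tail -n 1
}

# Delete the oldest dated backups under $1 until only $2 remain.
# Renamed ("pulled") backups don't match the pattern and are left alone.
prune_backups() {
    dir=$1 keep=$2
    total=$(ls -1 "$dir" | grep -cE "$DATED")
    excess=$(( total - keep ))
    if [ "$excess" -gt 0 ]; then
        ls -1 "$dir" | grep -E "$DATED" | sort | head -n "$excess" |
            while IFS= read -r oldest; do
                rm -rf "${dir:?}/$oldest"
            done
    fi
}

# Hold off while $1 or more rsync processes are already running.
wait_for_slot() {
    while [ "$(( $(pgrep -x rsync | wc -l) ))" -ge "$1" ]; do
        sleep 30
    done
}

# run_backup SOURCE BACKUP_DIR KEEP
# Source, previous, new; rsync output saved as the backup log.
run_backup() {
    src=$1 dir=$2 keep=$3
    today=$(date +%Y-%m-%d)
    prev=$(latest_backup "$dir")
    if [ "$prev" = "$today" ]; then
        echo "backup for $today already exists" >&2
        return 1
    fi
    mkdir -p "$dir/$today" || return 1
    # No pruning if rsync fails: look at the log, delete the failed
    # backup by hand and re-run.
    rsync -aH --stats ${prev:+--link-dest="$dir/$prev"} \
        "$src" "$dir/$today/" > "$dir/$today.log" 2>&1 || return 1
    prune_backups "$dir" "$keep"
}
```

A cron wrapper along the lines of "bucron" would call wait_for_slot 20
and then run_backup root@fs3:/ /bu/fs3 14 for each job; the host name
and the counts here are only examples.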

Rsync options
(beyond --link-dest)
* -a (you want this. Covers a lot of things)
* -H (Preserve hard links. Probably)
* --stats (Summary statistics. For report)
* --progress (eh. Maybe not.  No, probably not.)
* --force (can't remember why I started using this)
* -z (Compress -- varies depending on use.  Sometimes it hurts,
    sometimes it helps.  Definitions of "hurt" and "help" vary)

rsync option --exclude-from (the exclude file)
* Some things, you don't want backed up. Ever.
* Syntax is somewhere between tricky, black magic and just broken.
* Start with a default, then add to it as needed.

         + /
         - /mnt
         - /proc
         - /tmp
         - /ramtmp
         - /dev
         - /sys

Hardware requirements
* Very modest, unless you have a lot of local, high-speed systems.
* Lots of cheap but redundant disk storage.
* Slow CPU on backup system may reduce load on machines being backed up.
* Compression may or may not improve overall performance.  You may or
  may not wish to improve performance!
* Memory -- usually determined by file system, not rsync tasks
* 1 core, 1G RAM is often more than sufficient.

* DO NOT run AV on the system.
  (Most Unix antivirus apps are simplistic ports of Windows apps, and
   they don't understand hard links, so they will scan the same file
   dozens of times)
* --link-dest need only be on the BU system, not the host being backed
  up.
  (used this to back up "untouchable" horribly old systems with rsync
  so old that --link-dest didn't exist)
* If backing up the backups to tape, beware massive numbers of hard
  links. And be ready for issues on restore...like having to restore ALL
  copies starting with the oldest!

FreeBSD/ZFS variant
* Each system gets its own ZFS file system (dataset).
* df shows all!
* No --link-dest, use ZFS snapshots
* ZFS SEND snapshots to another machine
   * Destination MUST be "Read Only"
     (yes, you are writing to a "RO" destination.)
   * atime is not your friend.
     (mortal enemy is more accurate.  ANY change in the target FS means
     "zfs send" won't work...and 'atime' changes are as real as ...)
* Good luck. You may need it. Found to be about as stable as a pig on
  stilts (granted...vmware, insufficient RAM, insufficient "tuning".)
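
A rough sketch of the snapshot-and-send step, assuming rsync has
already updated the system's dataset in place.  The dataset, snapshot
and host names are invented for the example, and so is the function
name; the zfs subcommands and properties themselves are standard.

```shell
# Hypothetical names throughout: dataset tank/bu/fs3, replica host bu2,
# snapshots named after the backup date.
#
# One-time setup on the receiving machine, so nothing local (not even
# an access-time update) modifies the copy between receives:
#   zfs set readonly=on tank/bu
#   zfs set atime=off tank/bu

# snapshot_and_send DATASET PREV_SNAP NEW_SNAP REMOTE_HOST
snapshot_and_send() {
    ds=$1 prev=$2 new=$3 remote=$4
    # Freeze today's state of the backup.
    zfs snapshot "$ds@$new" || return 1
    # Ship only what changed between the two snapshots.
    zfs send -i "$ds@$prev" "$ds@$new" | ssh "$remote" zfs receive "$ds"
}
```

For example: snapshot_and_send tank/bu/fs3 2016-08-22 2016-08-23 bu2.
The very first transfer has no previous snapshot and needs a full
"zfs send" without -i.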
