[nycbug-talk] online storage rsync.net, carbonite, dreamhost...or s3?

George Georgalis george at galis.org
Mon Oct 15 23:21:53 EDT 2007


On Mon, Oct 15, 2007 at 04:05:24PM -0700, Peter Wright wrote:
>
>> I'd like to back up various FreeBSD systems I have with rsync...
>>
>> Amazon's S3 seems like a good choice, except that there are no real
>> tools to have it be used with rsync. (There are a lot of "rsync-like"
>> tools, and there are some backup utilties that only run on Windows and
>> possibly MacOS X/Linux.)
>>
>> So my remaining options are Carbonite, rsync.net, and Dreamhost. They
>> all provide Unix shell accounts with gobs of space, more or less.
>>
>> Has anyone any recommendations?
>
>not to nit-pick - but rsync is not really a backup tool.  it is a
>mirroring tool, the difference may seem small but think of it this way. 
>backups allow you to restore your data to a specific point in time, i do
>not think rsync will help you when you make a major snafu and do not
>realize it until *after* your next rsync run.  that's just one thing to
>keep in mind (most backup suites allow you to catalog your data for
>indexing among other important features).

rsync works great for backups. Here are some key elements from a
script that has been taking hardlink (-H) push snapshots for a
while:
now () { date +%Y.%m.%d.%H%M.%S ;}
NOW=$(now)
RECENT=$(ssh $HOST "ls -d $PREFIX/1/* | tail -n1")  # most recent finished snapshot
TARGETD="$PREFIX/0/$NOW"                            # in-progress snapshot
ssh $HOST "mkdir -p $TARGETD/$START"
        { rsync $rsync_opt $exclude --link-dest="$RECENT/$START" /$START/ ${HOST}:/${TARGETD}/$START/ \
                || true # files could disappear mid-run
        } | grep -v /$ || true # clean up output; a false exit is expected here
ssh $HOST "mv $TARGETD $PREFIX/1/"                  # publish the finished snapshot

(Yes, that is a bit hacky; a more atomic version is in the works.)

I can take one-hour snapshots of 1,000,000 files (100GB) 24/7;
files that haven't changed just hard link to the last snapshot,
and changed or new files appear in the new snapshot. It's all very
efficient: a run typically takes 3 to 6 minutes from RAID 5 to
UDMA 6 over a gigabit network. Since our files don't change much
(nor do we delete often), the backup host (with all the snapshots)
hardly uses any more space than the current original. UFS supports
about 32,000 hard links per file (I think).

Every day I purge some old snapshots and move one to a daily (./2)
directory, and I do the same type of rotation on a weekly (./3)
basis.
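That daily rotation might look roughly like this; the directory
names follow the ./1 and ./2 convention above, but the snapshot
names and retention policy here are invented for illustration:

```shell
#!/bin/sh
# Sketch of hourly -> daily promotion (names/retention are hypothetical).
set -e
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/1" "$PREFIX/2"

# Fake a few hourly snapshots named by timestamp.
for t in 2007.10.14.0100.00 2007.10.14.0200.00 2007.10.15.0100.00; do
    mkdir "$PREFIX/1/$t"
done

# Promote the newest hourly snapshot to the daily (./2) directory...
NEWEST=$(ls "$PREFIX/1" | tail -n1)
mv "$PREFIX/1/$NEWEST" "$PREFIX/2/$NEWEST"

# ...and purge the remaining, older hourlies.
for d in "$PREFIX/1"/*; do
    [ -d "$d" ] && rm -rf "$d"
done
ls "$PREFIX/2"
```

Because snapshots are just directories of hard links, purging one is
a plain rm -rf and promoting one is a plain mv.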

It's important to note that even with all these timestamped rsync
snapshots available, there is only one copy of each file, so it's
important to copy from the backup partition before making any
changes to files. We NFS mount the snapshots read-only to the main
host for user access. It's also good to make a tgz from the weekly
every so often; tar preserves the hard links.
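You can check the hard-link preservation yourself. A quick sketch,
with made-up scratch paths standing in for a real snapshot tree:

```shell
#!/bin/sh
# Verify that tar keeps hard links as links, so a tgz of a snapshot
# tree stays small. (Paths here are hypothetical.)
set -e
WORK=$(mktemp -d)
mkdir -p "$WORK/tree/s1" "$WORK/tree/s2"
echo "payload" > "$WORK/tree/s1/f"
ln "$WORK/tree/s1/f" "$WORK/tree/s2/f"   # second "snapshot" hard links the file

# Archive and extract elsewhere; the link relationship should survive.
tar -C "$WORK" -czf "$WORK/tree.tgz" tree
mkdir "$WORK/out"
tar -C "$WORK/out" -xzf "$WORK/tree.tgz"
ls -i "$WORK/out/tree/s1/f" "$WORK/out/tree/s2/f"
```

After extraction the two paths share one inode, so the archive
stores the file's data only once.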

I didn't know the rsync/samba people do data hosting, but I bet
they do a good job.

// George


-- 
George Georgalis, information system scientist <IXOYE><
