[nycbug-talk] Off topic: Best way to mirror large ftp

Marc Spitzer mspitzer at gmail.com
Wed Nov 26 15:49:00 EST 2008


On Wed, Nov 26, 2008 at 11:12 AM, Matt Juszczak <matt at atopia.net> wrote:
> I have about 3 TB of data I need to mirror off of an FTP box.  Using
> traditional methods, it would take me about 16+ days to get all of that
> information.
>
> I've looked at things like lftp, and a few other "scripts" out there, but
> ideally I would love to find something that can:
>
> 1) Index the entire FTP

mtree on server?

> 2) Split the downloads into multiple threads

how much bandwidth do you have to work with?

> 3) Update the index at any time (the FTP server changes) and download the
> differences (yes, this may be an expensive operation I know)

run mtree every so often on server?
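If mtree isn't available on the server, the same keep-an-index-and-diff-it idea can be sketched with portable tools (find, cksum, comm); mtree does this more thoroughly with full specs. The directory and file names below are made up for illustration:

```shell
#!/bin/sh
# Portable sketch of the "index and diff" idea. ./data is a stand-in
# for the FTP tree; the change below simulates an upstream update.
mkdir -p data && echo v1 > data/file.txt

# index: one "checksum size path" line per file, sorted for comm(1)
find data -type f -exec cksum {} + | sort > index.old

echo v2 > data/file.txt        # simulate an upstream change

find data -type f -exec cksum {} + | sort > index.new

# lines only in the new index = new or modified files to re-fetch
comm -13 index.old index.new | awk '{print $3}' > changed_files
cat changed_files
```

Re-running the last three commands at any time picks up whatever changed since the previous index was taken.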

>
> Any suggestions?  Off topic I know, but I've been struggling for some time
> now on this issue and I'm hoping some of you fellow sysadmins have some
> suggestions.

run the following. On the server:

1: run "find . -type d > dir_list"
2: run "find . -type f > file_list"

on the client:
3: download both files
4: cat dir_list | xargs -n 20 mkdir -p
5: split -l "pick a reasonable number" file_list
6: run a bunch of shell scripts to do the fetch, one per output file from step 5
....
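Spelled out as a script, steps 4-6 might look like the sketch below. The file lists are faked locally and the actual transfer is stubbed with echo, since the real command (fetch(1), curl, wget...) depends on what the client has installed; the server name is a placeholder:

```shell
#!/bin/sh
# Hypothetical dir_list/file_list, standing in for the ones downloaded
# from the server in step 3.
printf '%s\n' pub/a pub/b > dir_list
printf '%s\n' pub/a/1.iso pub/a/2.iso pub/b/3.iso pub/b/4.iso > file_list

# step 4: recreate the directory tree locally
cat dir_list | xargs -n 20 mkdir -p

# step 5: split the file list, here two entries per chunk
split -l 2 file_list chunk.

# step 6: one background worker per chunk; replace the echo with e.g.
#   fetch -o "$f" "ftp://server/$f"
for chunk in chunk.*; do
    ( while read -r f; do
          echo "would fetch ftp://server/$f"
      done < "$chunk" ) &
done
wait
```

The chunk count effectively sets the number of parallel connections, so size it to the available bandwidth (and to what the FTP server will tolerate).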

or just run rsync and let it do its job.

marc



-- 
Freedom is nothing but a chance to be better.
Albert Camus


