[talk] Climate Mirror
pete at nomadlogic.org
Wed Dec 14 12:59:34 EST 2016
On 12/14/16 5:52 AM, Brian Cully wrote:
> On 14-Dec-2016, at 00:29, Isaac (.ike) Levy <ike at blackskyresearch.net> wrote:
>>> Maybe torrents, IPFS, ...? Or a collaborative
>>> distributed file system. Perhaps using QFS, MFS or LFS?
>> While I just got pretty excited about NYC*BUG’s ability to take this on whole hog, I ABSOLUTELY would love to see this explored further.
>> Could you propose something we could get involved in as a group from NYC*BUG, perhaps something people can run to donate a small chunk of their own smaller servers?
> I like the idea of using torrents. There are lots of upsides: it’s easy to get involved by sharing a smallish chunk of the set, the relatively small tracker file can be separately copied around to ensure there’s no single point of attack, perhaps even via git or something similar to ensure it’s not tampered with (and made trivially available via github), and it’s pretty fire-and-forget (just leave it running on a routable server).
> The major downside I see is that unless the data has already been made available via torrent, someone’s gotta seed the thing, which still means you need at least one server with a lot of disk space to get the project started. That’s something that we may want anyway, just to ensure the thing can always be seeded (at least until the feds come knocking, but hopefully by then there are many redundant copies of the data sitting around the world).
> I know I’d certainly be willing to donate a few TB on my server to hosting a portion of the data set, but there’s no way I could host the whole thing, and I’d also be willing to throw some money into the hat to get the seed up.
I was thinking about using the torrent protocol last night, and I think
there are two issues that would prevent this:
- we'd have to generate checksums for every dataset that is stored,
then generate URIs for each of them. I am pretty confident the data
here is not BT-friendly... which leads to my second point
- the academic/professional consumers of this data are probably not going
to use BT to download these files for research. Unfortunately, FTP and
HTTP are probably used very frequently in these arenas.
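
To make the first point concrete, here is a rough Python sketch of what
"checksum plus URI" could look like for a single file. It assumes a
BitTorrent v1 single-file torrent, a 256 KiB piece size, and a SHA-256
digest as the published checksum; the names describe(), bencode() and
PIECE_LENGTH are made up for illustration and are not from any existing
tool. A real mirror would more likely use an established torrent client
or library instead.

```python
import hashlib
import io

PIECE_LENGTH = 256 * 1024  # assumed piece size; a real torrent picks this per data set


def bencode(value):
    """Minimal bencoder covering the types a torrent info dict needs."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def describe(name, stream, piece_length=PIECE_LENGTH):
    """Read a binary stream once; return its SHA-256, info dict and magnet URI.

    Assumes stream.read(n) returns n bytes until end of file, which holds
    for regular files opened in binary mode and for io.BytesIO.
    """
    sha256 = hashlib.sha256()
    pieces = []
    length = 0
    while True:
        chunk = stream.read(piece_length)
        if not chunk:
            break
        sha256.update(chunk)                         # whole-file checksum
        pieces.append(hashlib.sha1(chunk).digest())  # per-piece SHA-1 (v1)
        length += len(chunk)
    info = {
        "length": length,
        "name": name,
        "piece length": piece_length,
        "pieces": b"".join(pieces),
    }
    # The v1 info-hash is the SHA-1 of the bencoded info dictionary.
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return {
        "sha256": sha256.hexdigest(),
        "info": info,
        "magnet": "magnet:?xt=urn:btih:" + info_hash,
    }


if __name__ == "__main__":
    # Stand-in data; a real run would pass open(path, "rb") instead.
    result = describe("example.dat", io.BytesIO(b"\0" * 1000000))
    print(result["sha256"])
    print(result["magnet"])
```

Every file has to be read in full once to produce these values, which is
the up-front cost the first point is getting at.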
Having said this, I definitely feel that BT would be a *much* better
method to distribute and share the cost of hosting data... but I'm not
sure if they are ready for this or not :)
pete at nomadlogic.org