[talk] Climate Mirror

Wed Dec 14 14:15:43 EST 2016

The data don't need to be online; save them to a redundant bunch of
cheap hard drives (or maybe tapes), and distribute them among lots of
bookshelves. They can even be slow and small hard drives pulled from old
computers; we need to write to each one only once, we might need to read
from each one once, and we otherwise only need to turn them on once
every couple of years to make sure that they're still intact. Maintain a
website with a list of the datasets, the datasets' checksums, and the
contact information for the people with the hard drives on their
bookshelves.
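
The checksum part of that could be sketched roughly as follows. This
assumes Python 3 and SHA-256, neither of which has been decided
anywhere in this thread, and the function names (sha256_of_file,
build_manifest, verify_drive) are made up for illustration. The idea is
to record one digest per file when a drive is written, publish that
list, and re-run the check whenever a drive is powered on.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large datasets need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_drive(root, manifest):
    """Return the relative paths that are missing or whose digest changed."""
    root = Path(root)
    problems = []
    for relative_path, expected in sorted(manifest.items()):
        candidate = root / relative_path
        if not candidate.is_file() or sha256_of_file(candidate) != expected:
            problems.append(relative_path)
    return problems
```

The manifest from build_manifest is what would go on the website;
verify_drive is what a volunteer would run during the periodic
power-on, and an empty list means the copy still matches.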

Note that this is only my opinion on how this project could be
implemented. I don't know enough about the datasets, or about the
likely effects of geopolitics on them, to comment on whether the
project should be implemented.

On Wed, Dec 14, 2016, at 06:40 PM, Isaac (.ike) Levy wrote:
> 
> > On Dec 14, 2016, at 12:59 PM, Pete Wright <pete at nomadlogic.org> wrote:
> > 
> > 
> > 
> > On 12/14/16 5:52 AM, Brian Cully wrote:
> >> On 14-Dec-2016, at 00:29, Isaac (.ike) Levy <ike at blackskyresearch.net> wrote:
> >> 
> >>> 
> >>>> Maybe torrents, IPFS, ...? Or a collaborative
> >>>> distributed file system. Perhaps using QFS, MFS or LFS?
> >>> 
> >>> While I just got pretty excited about NYC*BUG’s ability to take this on whole hog, I ABSOLUTELY would love to see this explored further.
> >>> 
> >>> Could you propose something we could get involved in as a group from NYC*BUG, perhaps something people can run to donate a small chunk of their own smaller servers?
> >> 
> >> 	I like the idea of using torrents. There are lots of upsides: it’s easy to get involved by sharing a smallish chunk of the set, the relatively small tracker file can be separately copied around to ensure there’s no single point of attack, perhaps even via git or something similar to ensure it’s not tampered with (and made trivially available via github), and it’s pretty fire-and-forget (just leave it running on a routable server).
> >> 
> >> 	The major downside I see is that unless the data has already been made available via torrent, someone’s gotta seed the thing, which still means you need at least one server with a lot of disk space to get the project started. That’s something that we may want anyway, just to ensure the thing can always be seeded (at least until the feds come knocking, but hopefully by then there are many redundant copies of the data sitting around the world).
> >> 
> >> 	I know I’d certainly be willing to donate a few TB on my server to hosting a portion of the data set, but there’s no way I could host the whole thing, and I’d also be willing to throw some money into the hat to get the seed up.
> >> 
> > 
> > 
> > I was thinking about using the torrent protocol last night and I think there are two issues that would prevent this:
> > 
> > 
> > - we'd have to generate checksums for every dataset that is stored, then generate URIs for each of them.  I am pretty confident the data here is not bt friendly... which leads to my second point
> > 
> > - the academic/professional consumers of this data are probably not going to use bt to download these files for research.  unfortunately ftp and http are probably used very frequently in these arenas.
> > 
> > having said this - I definitely feel that bt would be a *much* better method to distribute and share the cost of hosting data... but I'm not sure if they are ready for this or not :)
> > 
> > -pete
> 
> From my view, everything points back to needing some simple big disk
> online to have complete sets- even as a base to seed torrents/other.
> 
> I’m personally going to focus on that end, but I’d really love to see
> more ideas for distributed data hit this list- particularly if someone
> has actionable ways to get involved, (e.g. how-to use this pkg, use this
> torrent, configure like so, etc…)
> 
> Well worth the discussion and collaboration here, even if this gets messy
> or incomplete at first!
> 
> Rocket-
> .ike
> 
> 
> 
> _______________________________________________
> talk mailing list
> talk at lists.nycbug.org
> http://lists.nycbug.org/mailman/listinfo/talk