[talk] Climate Mirror

Isaac (.ike) Levy ike at blackskyresearch.net
Wed Dec 14 13:40:09 EST 2016


> On Dec 14, 2016, at 12:59 PM, Pete Wright <pete at nomadlogic.org> wrote:
> 
> 
> 
> On 12/14/16 5:52 AM, Brian Cully wrote:
>> On 14-Dec-2016, at 00:29, Isaac (.ike) Levy <ike at blackskyresearch.net> wrote:
>> 
>>> 
>>>> Maybe torrents, IPFS, ...? Or a collaborative
>>>> distributed file system. Perhaps using QFS, MFS or LFS?
>>> 
>>> While I just got pretty excited about NYC*BUG’s ability to take this on whole hog, I ABSOLUTELY would love to see this explored further.
>>> 
>>> Could you propose something we could get involved in as a group from NYC*BUG, perhaps something people can run to donate a small chunk of their own smaller servers?
>> 
>> 	I like the idea of using torrents. There are lots of upsides: it’s easy to get involved by sharing a smallish chunk of the set, the relatively small tracker file can be separately copied around to ensure there’s no single point of attack, perhaps even via git or something similar to ensure it’s not tampered with (and made trivially available via github), and it’s pretty fire-and-forget (just leave it running on a routable server).
>> 
>> 	The major downside I see is that unless the data has already been made available via torrent, someone’s gotta seed the thing, which still means you need at least one server with a lot of disk space to get the project started. That’s something that we may want anyway, just to ensure the thing can always be seeded (at least until the feds come knocking, but hopefully by then there are many redundant copies of the data sitting around the world).
>> 
>> 	I know I’d certainly be willing to donate a few TB on my server to hosting a portion of the data set, but there’s no way I could host the whole thing, and I’d also be willing to throw some money into the hat to get the seed up.
>> 
> 
> 
> I was thinking about using the torrent protocol last night and I think there are two issues that would prevent this:
> 
> 
> - we'd have to generate checksums for every dataset that is stored, then generate URIs for each of them.  I am pretty confident the data here is not bt-friendly... which leads to my second point
> 
> - the academic/prof consumers of this data are probably not going to use bt to download these files for research.  Unfortunately ftp and http are probably used far more frequently in these arenas.
> 
> having said this - I definitely feel that bt would be a *much* better method to distribute and share the cost of hosting this data... but I'm not sure if they are ready for this or not :)
> 
> -pete

From my view, everything points back to needing some simple big disk online to hold complete sets, even as a base to seed torrents or anything else.
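For example, once a complete set is sitting on that one big disk, something small like the sketch below could walk the tree and write out a sha256 manifest. That gets at Pete’s checksum point, and gives consumers a way to verify files no matter how they fetched them (http, ftp, or bt). Rough sketch only, plain Python stdlib, the paths are made up:

    #!/usr/bin/env python
    # Walk a dataset tree, write one "sha256  relative/path" line per file.
    import hashlib
    import os
    import sys

    def sha256_file(path, blocksize=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(blocksize), b""):
                h.update(chunk)
        return h.hexdigest()

    def write_manifest(root, out_path):
        with open(out_path, "w") as out:
            for dirpath, _, filenames in os.walk(root):
                for name in sorted(filenames):
                    full = os.path.join(dirpath, name)
                    rel = os.path.relpath(full, root)
                    out.write("%s  %s\n" % (sha256_file(full), rel))

    if __name__ == "__main__":
        # e.g. python manifest.py /bigdisk/climate-data MANIFEST.sha256
        write_manifest(sys.argv[1], sys.argv[2])

The manifest itself is tiny, so it could ride along in git the same way Brian suggested for the torrent metadata.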

I’m personally going to focus on that end, but I’d really love to see more ideas for distributed data hit this list, particularly if someone has actionable ways to get involved (e.g. how to use this pkg, use this torrent, configure like so, etc…)
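To make "use this torrent" a little more concrete: with the complete set on the big disk, building a .torrent for it might look roughly like this. This assumes the python libtorrent bindings are installed; the dataset path and tracker URL are placeholders, not anything that exists yet:

    #!/usr/bin/env python
    # Build a .torrent describing a complete dataset directory (sketch, untested).
    import os
    import libtorrent as lt

    dataset = "/bigdisk/climate-data"   # hypothetical path to the full set
    fs = lt.file_storage()
    lt.add_files(fs, dataset)           # index every file under the dataset
    t = lt.create_torrent(fs)
    t.add_tracker("http://tracker.example.org:6969/announce")  # placeholder tracker
    lt.set_piece_hashes(t, os.path.dirname(dataset))  # hash pieces, relative to parent dir
    with open("climate-data.torrent", "wb") as f:
        f.write(lt.bencode(t.generate()))

The resulting climate-data.torrent is small enough to stash in git (and mirror on github), which covers Brian’s point about keeping the metadata itself hard to tamper with or lose.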

Well worth the discussion and collaboration here, even if this gets messy or incomplete at first!
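And for anyone who can only donate a few TB rather than host the whole thing, the same bindings can seed just a subset by zeroing out priorities on the files you don’t have room for. Again only a sketch, with made-up paths and a naive "first 100 files" selection standing in for a real assignment scheme:

    #!/usr/bin/env python
    # Seed only part of a large torrent by skipping most of its files (sketch, untested).
    import time
    import libtorrent as lt

    info = lt.torrent_info("climate-data.torrent")
    ses = lt.session()
    h = ses.add_torrent({"ti": info, "save_path": "/smalldisk/mirror"})

    # priority 0 = skip, 4 = normal; keep only the first 100 files (arbitrary choice)
    h.prioritize_files([4 if i < 100 else 0 for i in range(info.num_files())])

    while True:
        s = h.status()
        print("%.1f%% of selected data, %d peers" % (s.progress * 100, s.num_peers))
        time.sleep(60)

Enough people doing that on routable boxes and the full set stays reachable even if any single host goes away.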

Rocket-
.ike




