[talk] Climate Mirror

Wed Dec 14 17:23:12 EST 2016

On 12/14/2016 11:40 AM, Isaac (.ike) Levy wrote:
> 
>> On Dec 14, 2016, at 12:59 PM, Pete Wright <pete at nomadlogic.org>
>> wrote:
>> 
>> 
>> 
>> On 12/14/16 5:52 AM, Brian Cully wrote:
>>> On 14-Dec-2016, at 00:29, Isaac (.ike) Levy
>>> <ike at blackskyresearch.net> wrote:
>>> 
>>>> 
>>>>> Maybe torrents, IPFS, ...? Or a collaborative distributed
>>>>> file system. Perhaps using QFS, MFS or LFS?
>>>> 
>>>> While I just got pretty excited about NYC*BUG’s ability to take
>>>> this on whole hog, I ABSOLUTELY would love to see this explored
>>>> further.
>>>> 
>>>> Could you propose something we could get involved in as a group
>>>> from NYC*BUG, perhaps something people can run to donate a
>>>> small chunk of their own smaller servers?
>>> 
>>> I like the idea of using torrents. There are lots of upsides:
>>> it’s easy to get involved by sharing a smallish chunk of the set,
>>> the relatively small tracker file can be separately copied around
>>> to ensure there’s no single point of attack, perhaps even via git
>>> or something similar to ensure it’s not tampered with (and made
>>> trivially available via github), and it’s pretty fire-and-forget
>>> (just leave it running on a routable server).
>>> 
>>> The major downside I see is that unless the data has already been
>>> made available via torrent, someone’s gotta seed the thing, which
>>> still means you need at least one server with a lot of disk space
>>> to get the project started. That’s something that we may want
>>> anyway, just to ensure the thing can always be seeded (at least
>>> until the feds come knocking, but hopefully by then there are
>>> many redundant copies of the data sitting around the world).

Right. But how many servers, and how much storage? Taking a quick look
at that spreadsheet, I saw one 100TB dataset. And I wouldn't be
surprised if there were several PB overall.

But the initial box wouldn't need to seed 100TB or whatever in one go.
Maybe 1TB chunks. That's pretty common for HD video. Once there were
other seeders, the initial box could start seeding another 1TB chunk.
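
As a rough sketch of that staged approach, the file list for a big dataset could be grouped into batches of about 1TB, each of which becomes its own torrent. The function name, the (name, size) input format and the decimal-terabyte limit are all assumptions made for illustration, not anything taken from the spreadsheet.

```python
from typing import Iterable, List, Tuple

TB = 10**12  # decimal terabyte (assumption); use 2**40 for TiB


def plan_batches(files: Iterable[Tuple[str, int]], limit: int = TB) -> List[List[str]]:
    """Group (name, size_in_bytes) pairs into batches of at most `limit` bytes.

    Files are taken greedily in input order. A single file larger than
    `limit` ends up in a batch of its own rather than being split.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in files:
        # Close the current batch if this file would push it over the limit.
        if current and used + size > limit:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches
```

The initial box would seed the first batch, wait until other seeders hold it, and then move on to the next one.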

>>> I know I’d certainly be willing to donate a few TB on my server
>>> to hosting a portion of the data set, but there’s no way I could
>>> host the whole thing, and I’d also be willing to throw some money
>>> into the hat to get the seed up.
>>> 
>> 
>> 
>> I was thinking about using the torrent protocol last night and i
>> think there are two issues that would prevent this:
>> 
>> 
>> - we'd have to generate checksums for every dataset that is
>> stored, then generate URIs for each of them.  I am pretty
>> confident the data here is not bt friendly...which leads to my
>> second point

Yes, seedboxes need decent CPU and RAM for checksums.
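
To make the checksum cost concrete: a BitTorrent v1 metainfo file stores one SHA-1 digest per fixed-size piece of the payload. A minimal sketch of that hashing step for a single file, reading one piece at a time so RAM use stays bounded, might look like the following. The 4 MiB piece size and the function name are assumptions, and real torrent tools also handle multi-file payloads, which this does not.

```python
import hashlib
from typing import BinaryIO, Iterator

PIECE_LENGTH = 4 * 1024 * 1024  # 4 MiB; an assumed piece size


def piece_hashes(stream: BinaryIO, piece_length: int = PIECE_LENGTH) -> Iterator[bytes]:
    """Yield the SHA-1 digest of each `piece_length`-sized piece of `stream`.

    Assumes a buffered binary stream (e.g. open(path, "rb")), where read()
    only returns fewer bytes than requested at end of file.
    """
    while True:
        piece = stream.read(piece_length)
        if not piece:
            break
        yield hashlib.sha1(piece).digest()
```

Every byte of the dataset has to be read and hashed once, so for a 1TB chunk the job is bounded by disk throughput and CPU rather than memory.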

>> - the academic/prof consumers of this data are probably not going
>> to use bt to download these files for research.  unfortunately ftp
>> and http are probably used very frequently in these arenas.

Well, just about every Linux distro comes with a BT client. And there's
a web GUI for Transmission, which simplifies running remote servers. I'm
sure that there's comparable stuff on *BSD, Mac and Windows.

>> having said this - i def feel that bt would be a *much* better
>> method to distribute and share the cost of hosting data...but i'm
>> not sure if they are ready for this or not :)
>> 
>> -pete
> 
> From my view, everything points back to needing some simple big disk
> online to have complete sets- even as a base to seed torrents/other.

Yes. Plural, I think. There's a _lot_ of data. And it needs replication.

> I’m personally going to focus on that end, but I’d really love to see
> more ideas for distributed data hit this list- particularly if
> someone has actionable ways to get involved, (e.g. how-to use this
> pkg, use this torrent, configure like so, etc…)

I've been playing with distributed file systems. Mainly LizardFS and
Quantcast QFS. My primary focus has been running nodes as Tor onion
services, with IPv6 OnionCat links.

I've also been playing with MPTCP. With six IPv6 OnionCat links, I get
30-50 Mbps between onions. With real multihomed servers, sans Tor, I
suspect that Tbps is doable. Arguably, Tor-level privacy is unnecessary
for this effort. I'll shift focus to the non-Tor setup, for now.
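
For a sense of scale, here is a back-of-the-envelope calculation of how long the 100TB dataset would take to move at those rates. The 1 Gbps line is a hypothetical comparison point, not a measurement, and the figures ignore protocol overhead.

```python
def transfer_days(size_bytes: float, rate_bits_per_s: float) -> float:
    """Days needed to move `size_bytes` at a sustained `rate_bits_per_s`."""
    seconds = size_bytes * 8 / rate_bits_per_s
    return seconds / 86400


DATASET = 100e12  # the 100TB dataset, taken as decimal terabytes

# 30 and 50 Mbps are the OnionCat figures above; 1 Gbps is an assumed
# comparison point for a plain, non-Tor link.
for label, rate in [("30 Mbps", 30e6), ("50 Mbps", 50e6), ("1 Gbps", 1e9)]:
    print(f"{label}: {transfer_days(DATASET, rate):.0f} days")
```

At 30-50 Mbps that works out to roughly 185 to 309 days for a single copy, versus about nine days at 1 Gbps.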

> Well worth the discussion and collaboration here, even if this gets
> messy or incomplete at first!
> 
> Rocket- .ike