[nycbug-talk] mapreduce, hadoop, and UNIX

Isaac Levy ike at lesmuug.org
Sat Aug 25 11:47:11 EDT 2007


On Aug 25, 2007, at 11:22 AM, Alex Pilosov wrote:

> On Sat, 25 Aug 2007, Isaac Levy wrote:
>
>> can anyone shed some light on similar prior works in distributed
>> computing and RPC systems which are 'old classics' in UNIX?  These
>> distributed computing problems simply can't be new.
>>
>> To be really straight, what I'm getting at, is why is this more or less
>> useful than intelligently piping commands through ssh?  What about older
>> UNIX rpc mechanisms?  Aren't there patterns in even kernel source code
>> which match this work, or are even computationally more sophisticated
>> and advanced?
> mapreduce is, most of all, an API. Unix is contrary to the idea of APIs
> (everything is a stream of bytes).

Damn good observation.

Guess that's why Pike wrote the 'Sawzall' utility on top of it :)

>
> mapreduce isn't really rocket science by any means, see below.
>
>> From kernel to userland to network, I'm dying to find similar works,
>> any help is much appreciated!
> Similar things to look at: PVM and MPI -

AWESOME, exactly what I was wanting to grok-  Thanks Alex!

--
Links for this thread, for the record:

PVM (created 1989, currently actively maintained):
http://www.csm.ornl.gov/pvm/
http://en.wikipedia.org/wiki/Parallel_Virtual_Machine

MPI (created 1990s, many implementations in various contexts/languages):
http://en.wikipedia.org/wiki/Message_Passing_Interface
http://www.mpi-forum.org/


> these are APIs for non-shared
> memory, message passing, distributed computation. They are an order of
> magnitude more involved than mapreduce - they are much more generic.
> mapreduce can be easily implemented using PVM but not vice versa.
>
> mapreduce is optimal for 'embarrassingly parallel' jobs - ones that are
> very easy to parallelize. There hasn't been much research into that - it's
> been a solved problem for 40 years.

Not surprised.  :)
However powerful the simple idea of MapReduce is, there seems to be
far too much hype over it all IMHO, and lots of confusion about
applying it in discussions online (when all you have is a hammer,
everything is a nail...)

Looking at it in historical context is very useful here.
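
For the record, here is a rough sketch of the map/shuffle/reduce pattern
being discussed, as a word count. It is only an illustration, not how
Google's MapReduce or Hadoop is implemented: the function names
(map_words, shuffle, reduce_counts, word_count) are made up for this
example, and a local thread pool stands in for the worker machines a
real system would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(item):
    """Reduce step: collapse one key's values into a single total."""
    key, values = item
    return key, sum(values)


def word_count(lines, workers=2):
    # Each line is mapped on its own and each key is reduced on its own.
    # No worker has to talk to another one, which is what makes the job
    # "embarrassingly parallel". Threads are an assumption made here to
    # keep the sketch self-contained; they stand in for remote workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, lines))
        reduced = list(pool.map(reduce_counts, shuffle(mapped).items()))
    return dict(reduced)


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

The only coordination point is the shuffle between the two phases, which
fits Alex's point that the general message passing of PVM or MPI covers
far more ground than this pattern needs.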

Rocket- and thanks Alex!
.ike




