[nycbug-talk] mapreduce, hadoop, and UNIX

Sat Aug 25 11:22:20 EDT 2007

On Sat, 25 Aug 2007, Isaac Levy wrote:

> can anyone shed some light on similar prior works in distributed
> computing and RPC systems which are 'old classics' in UNIX?  These
> distributed computing problems simply can't be new.
> 
> To be really straight, what I'm getting at, is why is this more or less
> useful than intelligently piping commands through ssh?  What about older
> UNIX rpc mechanisms?  Aren't there patterns in even kernel source code
> which match this work, or are even computationally more sophisticated
> and advanced?

mapreduce is, above all, an API. Unix runs contrary to the idea of APIs
(everything is a stream of bytes).

mapreduce isn't rocket science by any means; see below.

>  From kernel to userland to network, I'm dying to find similar works,
> any help is much appreciated!

Similar things to look at: PVM and MPI - these are APIs for distributed
computation by message passing, without shared memory. They are an order
of magnitude more involved than mapreduce because they are much more
general. mapreduce can easily be implemented on top of PVM, but not vice
versa.

mapreduce is best suited to 'embarrassingly parallel' jobs - ones that are
very easy to parallelize. There hasn't been much research into that - it's
been a solved problem for 40 years.
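
To make the 'embarrassingly parallel' point concrete, here is a rough
sketch of the map/reduce pattern as a word count in plain Python. It is
not Hadoop's or Google's API; the names map_words, reduce_counts and
map_reduce are made up for illustration, and a thread pool stands in for
a cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def map_reduce(lines, workers=2):
    """Hypothetical driver: map each line independently, then reduce."""
    # No line depends on any other, so the map step splits trivially
    # across workers (threads here, separate machines in a real system).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, lines))
    # Flatten the per-line results and reduce them in one place.
    pairs = [pair for chunk in mapped for pair in chunk]
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"]))
```

The map calls share no state, which is what makes the job easy to
spread out; the only coordination is collecting the pairs before the
reduce step.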

-alex