[nycbug-talk] mapreduce, hadoop, and UNIX
alex at pilosoft.com
Sat Aug 25 11:22:20 EDT 2007
On Sat, 25 Aug 2007, Isaac Levy wrote:
> can anyone shed some light on similar prior works in distributed
> computing and RPC systems which are 'old classics' in UNIX? These
> distributed computing problems simply can't be new.
> To be really straight, what I'm getting at, is why is this more or less
> useful than intelligently piping commands through ssh? What about older
> UNIX rpc mechanisms? Aren't there patterns in even kernel source code
> which match this work, or are even computationally more sophisticated
> and advanced?
mapreduce is, above all, an API. Unix runs contrary to the idea of APIs
(everything is a stream of bytes).
mapreduce isn't rocket science by any means; see below.
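The "not rocket science" point can be made concrete with a minimal single-process sketch of the map, shuffle, and reduce steps. The function names (`map_reduce`, `word_count_mapper`, `word_count_reducer`) are hypothetical illustrations. They are not Google's or Hadoop's actual API, and the sketch leaves out distribution, fault tolerance, and on-disk data entirely.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect every value emitted under the same key.
            groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Hypothetical example mapper: emit (word, 1) for every word on the line.
    for word in line.split():
        yield word, 1


def word_count_reducer(word, counts):
    # Hypothetical example reducer: total occurrences of one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "the end"]
    result = map_reduce(lines, word_count_mapper, word_count_reducer)
    print(sorted(result.items()))
    # prints [('brown', 1), ('dog', 1), ('end', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The Unix analogue is roughly the pipeline `tr -s ' ' '\n' < input | sort | uniq -c`, with `sort` playing the role of the shuffle. The difference is that mapreduce fixes the interface as key/value pairs passed to user functions, rather than an unstructured byte stream between processes.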
> From kernel to userland to network, I'm dying to find similar works,
> any help is much appreciated!
Similar things to look at: PVM and MPI. These are APIs for message-passing
distributed computation on machines that do not share memory. They are an
order of magnitude more involved than mapreduce because they are much more
generic: mapreduce can easily be implemented on top of PVM, but not vice versa.
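The claim that mapreduce sits easily on top of a message-passing layer can be sketched as follows. This does not use PVM itself. As an assumption for illustration, Python threads stand in for PVM tasks and `queue.Queue` objects stand in for message send/receive, with no other shared state between them. A master scatters input chunks as work messages, workers reply with (word, 1) pairs, and the master groups and sums the replies. All names here are hypothetical.

```python
import queue
import threading
from collections import defaultdict


def map_worker(inbox, outbox):
    # A worker communicates only through messages: receive a chunk, send back pairs.
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel message: no more work for this worker
            return
        chunk_id, lines = msg
        pairs = [(word, 1) for line in lines for word in line.split()]
        outbox.put((chunk_id, pairs))


def message_passing_word_count(lines, n_workers=2, chunk_size=2):
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=map_worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for w in workers:
        w.start()

    # Master: scatter the input as independent work messages.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    for chunk_id, chunk in enumerate(chunks):
        inbox.put((chunk_id, chunk))
    for _ in workers:
        inbox.put(None)  # one sentinel per worker

    # Master: gather exactly one reply per chunk, grouping values by key (the shuffle).
    groups = defaultdict(list)
    for _ in chunks:
        _chunk_id, pairs = outbox.get()
        for word, count in pairs:
            groups[word].append(count)
    for w in workers:
        w.join()

    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}
```

Going the other way is the hard part. A fixed map, shuffle, and reduce pipeline offers no way to express arbitrary point-to-point messages between running tasks, which is what general PVM and MPI programs rely on.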
mapreduce is optimal for 'embarrassingly parallel' jobs, ones that are
very easy to parallelize. There hasn't been much research into that; it was
already a solved problem 40 years ago.
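"Embarrassingly parallel" means the input splits into pieces that never need to talk to each other, so the only coordination is handing out the pieces and combining the answers. A minimal sketch, assuming a hypothetical workload (counting primes by trial division over independent ranges):

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(bounds):
    # Count primes in [lo, hi) by trial division; reads and writes no shared state.
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_prime_count(limit, n_chunks=4, max_workers=4):
    # Split [0, limit) into independent ranges (ceiling division for the step size).
    step = max(1, -(-limit // n_chunks))
    ranges = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    # Each range is processed with no communication; the only "reduce" is a sum.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(count_primes, ranges))


if __name__ == "__main__":
    print(parallel_prime_count(100))  # prints 25
```

In CPython, threads give no real speedup for CPU-bound pure-Python code because of the global interpreter lock. `concurrent.futures.ProcessPoolExecutor` exposes the same `map` interface and runs the pieces in separate processes; it needs the `if __name__ == "__main__":` guard on platforms that start workers by spawning a fresh interpreter.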