[nycbug-talk] mapreduce, hadoop, and UNIX

Sat Aug 25 11:34:20 EDT 2007

On Sat, 25 Aug 2007, Isaac Levy wrote:

> Afterthought addition,
> 
> On Aug 25, 2007, at 10:48 AM, Isaac Levy wrote:
> 
> >  From kernel to userland to network, I'm dying to find similar works,
> > any help is much appreciated!
> 
> E.G.:
> 
> Distributed computing implementations:
> - Plan 9?
> - DragonflyBSD Clustering?
We are all still hoping for clusters like the ones VMS had 25 years
ago - fully transparent, non-shared-memory clustering, aka "single system
image". You don't know, and you don't care, which node in the cluster a
job is running on, and jobs can be migrated between nodes depending on
the load.

For proper clustering, you need a distributed filesystem, distributed lock 
manager, and job distribution engine.
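
To make the job distribution piece concrete, here is a toy sketch in
Python. The node names, load numbers and function names are all made up
for illustration, and this is not how VMS, MOSIX or OpenSSI actually do
it - it only shows the "run it wherever the load is lowest" idea:

```python
# Toy sketch of load-based job placement.
# Node names and load figures are hypothetical.

def pick_node(loads):
    """Return the name of the node with the lowest current load."""
    return min(loads, key=loads.get)

def submit(job, loads, placement, cost=1.0):
    """Place a job on the least loaded node and account for its cost."""
    node = pick_node(loads)
    loads[node] += cost
    placement[job] = node
    return node

loads = {"node1": 0.9, "node2": 0.2, "node3": 0.5}
placement = {}
submit("job-a", loads, placement)
submit("job-b", loads, placement)
print(placement)
```

A real engine would also have to migrate running jobs and coordinate
with the lock manager and the filesystem, which is where it gets hard.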

On the Linux front, the closest thing would be MOSIX, which is *almost* that.
Unfortunately, MOSIX is first and foremost a research project, with
restrictive licensing and a fragmented community (see openmosix). Today,
the project aiming at properly working clusters is openssi.org - I believe
it is based on openmosix and opengfs.

Clustering is hard compared to writing an OS - even Linus can do that
one.

> Data implementations:
> - Sun ZFS?
> - AFS and the like?
> - RH GFS and the like?
If you are talking about proper distributed filesystems, they are few and 
far between. 

gfs/opengfs
oracle ocfs
intermezzo/lustre
pvfs
veritas dfs 
sgi cxfs (distributed xfs)

Distributed filesystems are hard, compared to writing an OS.

-alex