[nycbug-talk] hadoop - sharing a server?

Charles Sprickman spork at bway.net
Tue May 11 23:55:24 EDT 2010

Hi all,

I just recently went back and listened to the hadoop presentation from a 
few months ago.  The timing was great, as I've been tasked with setting up 
a basic hadoop environment for pulling some stats out of a ton of mail 
logs.  We'll likely be using HBase, but will be looking at Pig as well.

I have a 3-node test setup running on FreeBSD 8.0 in VMWare.  I was 
pleasantly surprised that Java was not a real pain to get going.  In 
short, this all looks good, and it looks like it would be easy enough to 
copy one of these nodes to a jail, archive that jail, and then deploy a 
bunch of these things all over the place.
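The archive/deploy step would be nothing fancy, roughly something like this (paths are from my test setup, so adjust to taste):

```shell
# Stop the hadoop daemons in the jail first, then tar up the whole
# jail root.  (Paths here are from my VMWare test rig, not gospel.)
tar -czf hadoop-node.tgz -C /usr/jails hadoop-node1

# On each target host: unpack, fix up rc.conf and the hadoop
# conf/{masters,slaves} files for the new hostname, then start the jail.
tar -xzf hadoop-node.tgz -C /usr/jails
```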

So my question...  What we're looking to do with Hadoop does not yet 
justify going out and buying a half dozen or so servers.  I'd like to jail 
it on a bunch of our existing servers.  Those boxes have widely varying 
workloads with many lulls during the day, and the jobs we want to run on 
the hadoop cluster can basically wait as long as they need to for now.  
So is anyone running hadoop nodes on servers not dedicated to this task?  
Does it respond well to being niced down?  Are there some resource 
utilization knobs I've missed in all the quick howtos I've read?
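For reference, the only knobs I've spotted so far are in conf/hadoop-env.sh (the names below are from the 0.20-era start scripts, so verify them against whatever release you're on):

```shell
# conf/hadoop-env.sh -- sourced by the hadoop-daemon.sh start scripts

# hadoop-daemon.sh wraps each daemon in `nice -n $HADOOP_NICENESS`,
# so this should keep a TaskTracker out of the way of the real workload
export HADOOP_NICENESS=19

# per-daemon JVM heap in MB; the default (1000) is a lot to give up
# on a shared box
export HADOOP_HEAPSIZE=512
```

plus the per-node concurrency caps in conf/mapred-site.xml (mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, which default to 2 each).  Is that the extent of it?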
