[nycbug-talk] hadoop - sharing a server?

Charles Sprickman spork at bway.net
Mon May 17 22:37:17 EDT 2010

On Mon, 17 May 2010, Edward Capriolo wrote:

> You can jail anything of course. The issue with jailing hadoop is that it is
> very IO heavy because data is constantly being spilled to disk. Even if your
> jail can limit memory or processor ticks, the real problem is jails do not
> protect your disk. Now if your system is only being used for background batch
> processing that is fine. However, if you are trying to run a "real time" ish
> mysql instance and hadoop on the same machine, they may not play together well
> if they fight for the disk. Same is true with any jail/vm solution, but hadoop
> batching likes to saturate things with load.

Thanks for the excellent feedback...  Right now I just need to get 
something up for various reasons:

-Evaluate Hadoop/HBase/Pig running on multiple hosts
-Get myself up to speed on Hadoop and, to some extent, Java from a sysadmin 
perspective
-Give the folks who will be using this an environment to evaluate whether 
this is the proper set of tools for the type of data analysis they want 
to do
-Shake out any BSD-specific issues

If this all goes well, we'd likely just bring up a few cheap servers as a 
standalone cluster.

Until then, the idea of jailing it on servers that have very sporadic 
usage patterns and don't really have to do anything in "real time" seems 
like it might be a good compromise.  I'll be throwing this onto a few 
boxes in the next few days, so I'll report back with any interesting 
issues.

I'm going to do two things to try to keep hadoop from being a total pig: 
its jail will be on its own ZFS dataset with a quota to prevent it from 
chewing up too much space, and when I put together an rc.d script for it, 
I'll nice down hadoop.
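For anyone trying the same thing, here's a rough sketch of what I mean. 
The pool/dataset names, mount point, quota size, user name, and Hadoop 
paths below are all made up for illustration -- adjust for your setup. 
The rc.d part is just the usual rc.subr boilerplate with nice(1) wrapped 
around the start command:

```shell
# Cap the jail's dataset so Hadoop spills can't eat the whole pool.
# "tank" and the 200G quota are placeholders.
zfs create -o quota=200G -o mountpoint=/jails/hadoop tank/jails/hadoop

# --- /usr/local/etc/rc.d/hadoop (inside the jail) ---
#!/bin/sh
#
# PROVIDE: hadoop
# REQUIRE: NETWORKING
# KEYWORD: shutdown

. /etc/rc.subr

name="hadoop"
rcvar="hadoop_enable"
start_cmd="hadoop_start"
stop_cmd="hadoop_stop"

# Assumed install location and unprivileged user.
hadoop_home="/usr/local/hadoop"
hadoop_user="hadoop"

hadoop_start()
{
    # nice -n 10 so batch jobs lose to interactive work on the CPU;
    # note this does nothing for disk contention, per Edward's point.
    su -m ${hadoop_user} -c \
        "/usr/bin/nice -n 10 ${hadoop_home}/bin/start-all.sh"
}

hadoop_stop()
{
    su -m ${hadoop_user} -c "${hadoop_home}/bin/stop-all.sh"
}

load_rc_config $name
: ${hadoop_enable:="NO"}

run_rc_command "$1"
```

Then `hadoop_enable="YES"` in the jail's rc.conf and `service hadoop 
start` as usual.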

For the future, there's some disk scheduling stuff coming into 8.1:


