[nycbug-talk] hadoop - sharing a server?
Charles Sprickman
spork at bway.net
Mon May 17 22:37:17 EDT 2010
On Mon, 17 May 2010, Edward Capriolo wrote:
> You can jail anything, of course. The issue with jailing hadoop is that it
> is very IO heavy because data is constantly being spilled to disk. Even if
> your jail can limit memory or processor ticks, the real problem is that
> jails do not protect your disk. If your system is only being used for
> background batch processing, that is fine. However, if you are trying to
> run a "real time"-ish mysql instance and hadoop on the same box, they may
> not play well together if they fight for the disk. The same is true with
> any jail/VM solution, but hadoop batching likes to saturate things with
> load.
Thanks for the excellent feedback... Right now I just need to get
something up for various reasons:
-Evaluate Hadoop/HBase/Pig running on multiple hosts
-Get myself up to speed on Hadoop and, to some extent, Java from a sysadmin
perspective
-Give the folks who will be using this an environment in which to evaluate
it and decide whether it's the right set of tools for the kind of data
analysis they want to do
-Shake out any BSD-specific issues
If this all goes well, we'd likely just bring up a few cheap servers as a
standalone cluster.
Until then, jailing it on servers that have very sporadic usage patterns
and don't really need to do anything in "real time" seems like a good
compromise. I'll be throwing this onto a few boxes in the next few days,
so I'll report back with any interesting issues.
I'm going to do two things to try to keep hadoop from being a total pig:
its jail will live on its own ZFS dataset with a quota to prevent it from
chewing up too much space, and when I put together an rc.d script for it,
I'll nice down the hadoop processes.
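Roughly what I have in mind (the pool/dataset names, quota size, and
hadoop paths below are just placeholders, none of this is tested yet):

  # dedicated dataset for the jail, capped so hadoop can't fill the pool
  zfs create tank/jails/hadoop
  zfs set quota=250G tank/jails/hadoop

And a skeleton rc.d script inside the jail; rc.subr's ${name}_nice and
${name}_user knobs should take care of the nice(1) and user bits:

  #!/bin/sh
  #
  # PROVIDE: hadoop
  # REQUIRE: LOGIN
  # KEYWORD: shutdown

  . /etc/rc.subr

  name="hadoop"
  rcvar=hadoop_enable

  load_rc_config $name

  : ${hadoop_enable:="NO"}
  : ${hadoop_user:="hadoop"}
  : ${hadoop_nice:="10"}    # keep it out of the way of everything else

  # placeholder path and daemon; hadoop-daemon.sh forks off the java
  # process itself, so stop/status will need more work than this
  command="/usr/local/hadoop/bin/hadoop-daemon.sh"
  command_args="start datanode"

  run_rc_command "$1"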
For the future, there's some disk I/O scheduling work (geom_sched) coming
into 8.1:
http://wiki.freebsd.org/Releng/8.1TODO
http://info.iet.unipi.it/~luigi/papers/20090508-geom_sched-slides.pdf
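I haven't tried it, but going by the slides and gsched(8), using it once
it lands should look roughly like this (disk name is just an example):

  # load the round-robin scheduler and interpose it on a disk provider
  kldload gsched_rr
  geom sched insert -a rr ad4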
Charles