[nycbug-talk] Extremely large SAN/NAS
pete at nomadlogic.org
Fri Jan 20 16:34:51 EST 2012
On Fri, Jan 20, 2012 at 08:11:25AM -0500, Christopher Olsen wrote:
> Hello Everyone,
> Anyone here have any experience or thoughts on how to put together a large Data store?
> What I would like to accomplish would be to have something with the capacity in the area of 5,000 terabytes and also have the ability to take snapshots...
> It wouldn't necessarily need to appear as a single node but I definitely want to get the highest possible storage density per node. Also performance need not be considered as long as it's within reason.
I think there are a couple of things to keep in mind when building any
large storage architecture. The first is: what is your application or
use case? That will help you figure out whether you need a SAN or
whether a NAS will suffice. For example, if you are building out a huge
SAN: what filesystem will eventually be overlaid on your LUNs, and do
you really need a single 5PB SAN, or can it be broken down into more
manageable pools?
Let's assume you are building a NAS infrastructure though, as I imagine
that would be the more common use case for a 5PB storage architecture.
My opinion is that if you are building out something this big you
really would benefit from working with an appliance vendor, especially
if this is a tier-1 system (interestingly enough, you'll find that
vendors like Isilon and NetApp are actually based on FreeBSD). Aside
from hardware integration and support, appliances will also generally
take care of HA clustering and other difficult problems. You wouldn't
want your 5PB datastore to have a single point of failure (SPOF), would
you :)
A final thought: check out clustered filesystems like Gluster, Ceph
(http://ceph.newdream.net/) or something similar. These solutions will
allow you to leverage off-the-shelf hardware without sacrificing HA
capabilities. They should also scale if designed correctly from the
start.
Although, like I said in the beginning, you really need to figure out
your use case when building something at this scale. Once you figure
out how the data is being accessed (block level via a SAN, file level
over IP via a NAS, or via an API from a clustered filesystem), that'll
help you figure out what your system will look like at the end of the
day. Each one has its benefits and drawbacks.
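To put rough numbers on the 5,000 terabyte target from the original
question, here is a back-of-the-envelope sizing sketch in Python. The
chassis size, drive size and redundancy overhead below are purely
illustrative assumptions on my part, not figures from this thread or
from any vendor, so plug in your own.

```python
import math


def nodes_needed(usable_tb, drives_per_node, drive_tb, usable_fraction):
    """Estimate how many storage nodes are needed to reach usable_tb.

    usable_fraction is the share of raw capacity left after redundancy
    (for example 0.5 for two-way replication). All inputs are
    hypothetical planning numbers, not measurements.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    raw_per_node_tb = drives_per_node * drive_tb
    usable_per_node_tb = raw_per_node_tb * usable_fraction
    # Round up: a partially filled node is still a whole node to buy.
    return math.ceil(usable_tb / usable_per_node_tb)


# Assumed hardware: a 45-bay chassis filled with 3 TB drives,
# with half the raw space lost to two-way replication.
print(nodes_needed(5000, 45, 3, 0.5))  # prints 75
```

Even a crude estimate like this shows how strongly the node count
depends on the redundancy scheme, which is one more reason to settle
the use case before picking hardware.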
pete at nomadlogic.org