[nycbug-talk] shared disks for heavy I/O
Pete Wright
pete at nomadlogic.org
Mon May 4 12:37:25 EDT 2009
On 3-May-09, at 1:54 PM, Marco Scoffier wrote:
> Hello all,
>
> I am looking for recommendations for a shared disk solution in the 3
> to
> 5TB range that can support heavy reading and writing. Standalone ?
> Dedicated server ? Fiber channel ? Budget is around $4000. Has
> someone
> looked into this recently? How does Gig-ethernet performance
> compare to
> other solutions in real world situations?
>
hey marco - on that budget i'd say you should be able to get pretty
fast storage for ~5TB. it may not be reliable though (i.e. not
something like a netapp or isilon where you can suffer nfs server
failures w/ no downtime) - which may or may not be a big deal to you.
at a previous employer we were building high-resolution video playback
systems (capable of playing 2048x1536 at 60fps, as well as systems
capable of playing dual stream 1080p 3D video streams) for around this
much. our setup was pretty simple - since we needed to stream ~300MB/s we used hardware SATA RAID controllers attached directly to our video playback systems. for what you need 1Gb nics would probably be fine...if you
start saturating a single gig-nic you can always bond them for more
bandwidth. i think your disk subsystem will get saturated before your
network interfaces.
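if you ever do need to bond nics on FreeBSD, lagg(4) handles it - a
rough sketch of the rc.conf side (em0/em1 and the address are just
placeholders, and the switch ports have to speak LACP):

    ifconfig_em0="up"
    ifconfig_em1="up"
    cloned_interfaces="lagg0"
    ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 192.168.1.10 netmask 255.255.255.0"

on linux the bonding driver in 802.3ad mode gets you the same thing.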
the setup looked like this:
1 dual quad-core workstation with 32GB ram
1 3ware 9000 series sata raid controller (no BBU - one would probably
help with your use case, but it'd also drive up the cost)
1 external sata JBOD
(something similar to this: http://rackmountmart.stores.yahoo.net/sa3urastch10.html)
a bunch of large sata drives.
since we were a linux shop with a bunch of former SGI'ers we used the
XFS filesystem, which has *very* good streaming I/O performance. for
your workload ZFS would probably suit you fine - and you'd get to use
FreeBSD :)
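if you do go the ZFS route, a raidz pool straight over the JBOD's
disks would be a reasonable starting point - a rough sketch (da1-da5
are placeholder device names, pick your own redundancy level):

    zpool create tank raidz da1 da2 da3 da4 da5
    zfs set atime=off tank       # access-time updates just add extra writes
    zfs create tank/scratch      # a separate filesystem per workload keeps things tidy

then export tank/scratch over NFS like you do now.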
The only real hack we did on our setup was to format the disks so that
we didn't use any of the inner tracks of the individual drives. that
ensured we'd be writing and reading blocks contiguously on the outer
tracks of the disk, where sequential throughput is highest. it
actually had a significant impact on performance for us (at a slight
storage penalty).
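if you wanted to do the same trick, the simple way is to only
partition the first chunk of each drive and leave the rest empty -
the low block numbers live on the outer (faster) edge of the
platters. a rough FreeBSD sketch, with da1 and the ~70% size as
placeholders:

    gpart create -s gpt da1
    gpart add -t freebsd-zfs -s 350G da1    # only use the first ~350G of a 500G drive

on linux you'd get the same effect with fdisk/parted by only
allocating the first part of each disk.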
> I am working with large data sets (presently 100G but eventually to
> become larger). I have to store intermediate steps of the
> computation to disk. Often to speed things up I run parallel
> computations which each read and write 1 to 10G of data at the
> beginning
> and the end of the computation. I have 2 16cpu servers with SATA
> disks
> which each exports its disks to the other using NFS. Very often all
> cpus on both systems are maxed at 100%. Lately when ramping up the
> computations NFS has been locking up (even when only reading remotely
> and writing locally). I/O has been slow (e.g. ls takes forever to
> return). I think that we are probably asking too much of the current
> setup
> where we are both running computations and exporting NFS from the same
> machines.
>
I reckon the above setup would be good for your environment. loading
up your server with tons of ram for caching should help with I/O
thrashing situations like you describe above. spreading the load
across a bunch of disks will help here as well, since more spindles
means more parallel I/O. a battery backup unit on the RAID controller
would further help with caching - and give you a little security in
case of power failures etc.
also - don't forget about tuning your NFS client options. use large
read and write block sizes; think about using async writes if your
data isn't *that* important <grin>. and if you can use jumbo frames,
use them - that'll help both the client and server.
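something along these lines, as a sketch - hostname, paths and the
interface name are placeholders, and the exact option spellings
differ a bit between the linux and freebsd clients:

    # client /etc/fstab - big read/write blocks over tcp, skip atime updates
    server:/export/scratch  /mnt/scratch  nfs  rw,tcp,rsize=32768,wsize=32768,noatime  0  0

    # jumbo frames - both hosts and the switch need to handle 9000-byte frames
    ifconfig em0 mtu 9000

(the async knob is a server-side thing on linux - the "async" flag in
/etc/exports acks writes before they hit disk, hence the "data isn't
*that* important" caveat.)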
sorry for the rambling post - i've been neck-deep in designing some
new storage systems and have been kicking around a lot of ideas lately :)
HTH,
-pete