[nycbug-talk] shared disks for heavy I/O

Pete Wright pete at nomadlogic.org
Mon May 4 14:34:08 EDT 2009


On 4-May-09, at 11:10 AM, marco scoffier wrote:

> Thanks a lot for the details Pete.  I actually had you in mind when
> I posed the question :)
>
> <snip>
>
> > on that budget i'd say you should be able to get pretty fast
> > storage for ~5TB.  it may not be reliable though (i.e. not
> > something like a netapp or isilon where you can suffer nfs server
> > failures w/ no downtime)
>
> Sorry but too many double negatives in the opener... I think I  
> understood, netapp and isilon are good but more expensive?  But I
> think I am more interested in the system you describe below...
>

gah - that's awful typing on my part, sorry about that, man.  basically
i was trying to say that storage vendors like NetApp or Isilon can
provide you with high-performance, reliable storage, but it is quite
expensive - well over your 4k budget.

> Pete Wright wrote:
>>
>> our setup was pretty simple:
>>
>> 1 dual quad-core workstation with 32GB ram
>> 1 3ware 9000 series sata raid controller (no BBU - although that'd  
>> probably help with your use case, but it'd drive up the cost).
>> 1 external sata JBOD
>> (something similar to this: http://rackmountmart.stores.yahoo.net/sa3urastch10.html)
>> a bunch of large sata drives.
>>
> Forgive me for being a bit clueless here.  I haven't done one of
> these external disk setups before.  Are there 10 cables running
> between the workstation and the external JBOD?  Is the RAID
> controller in the workstation or in the external enclosure?  And the
> idea is that the workstation exports NFS shares over gigabit
> ethernet but uses all its memory and CPU for disk access?

so in our setup you have the external drive bay with, let's say, 10
SATA drives in it.  The drives connect to a backplane which
concentrates several (up to 4, I believe) SATA interfaces into one
external multi-lane SATA cable.  The cable(s) then connect to external
ports on our 3ware cards.  The cards still see the 10 individual
drives though - so you can do hardware RAID on the 3ware card, or pass
them through to your OS.  If I have time today I can google up the
parts we were using to do this...but here's a link from 3ware that may
help get you started:

http://www.3ware.com/products/cables.asp

look under Cables for the 9590SE and the 3ware Sidecar.  we were using
the 19" SATA "Multilane" CBL-IB-05M cable.  Another configuration
we've used is the 3ware Sidecar (check out the Drive Cages menu on the
left-hand side) - but this limits you to 4 drives.

In our case we had one workstation that had the storage directly  
attached to it for video playback.  In your case I would recommend  
setting up a dedicated NFS server if possible.  Then you can tune your  
systems accordingly.
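
to give you a very rough idea, here's a minimal sketch of what the
server and client side might look like (FreeBSD-flavored syntax; the
paths, hostnames, network and interface name are just placeholders,
so adjust them to your setup):

  # on the NFS server - /etc/exports, restricted to your LAN
  /tank/data  -network 192.168.1.0 -mask 255.255.255.0

  # on the client - mount with large read/write block sizes over TCP
  mount -t nfs -o rsize=65536,wsize=65536,tcp,nfsv3 \
      fileserver:/tank/data /mnt/data

if you can enable jumbo frames on both ends (ifconfig em0 mtu 9000,
plus a switch that supports it), the larger block sizes pay off even
more.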



>>
>> The only hack we did was to format the disks in such a way that we  
>> did not use any of the inside tracks of the individual disks.  this  
>> ensured that we'd be laying down and reading blocks in a
>> contiguous manner on the outer tracks of the disk.  it
>> actually had a significant impact on the performance for us (at a  
>> slight storage penalty).
> I didn't know one had access to where the tracks are on the disk.
> I would have thought the drive manufacturer could lay down tracks
> randomly distributed across the disk if that helped them get the
> performance specs they required.
>

yea this was achieved in the fdisk/parted phase of preparing the disks
for a filesystem.  drives generally number their sectors starting at
the outer tracks, so a partition that only covers the first part of
the disk ends up on the faster outer tracks.  it took a little math,
hard drive knowledge and testing to get the correct values here :)
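
for example, with parted the idea would be something along these lines
(the device name and the 80% cutoff are just illustrations - the real
value came out of the math and benchmarking mentioned above):

  # create a single partition covering only the first ~80% of the
  # disk, leaving the slower inner tracks unused
  parted /dev/da1 mklabel gpt
  parted /dev/da1 mkpart primary 0% 80%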

>> a Battery Backup Unit on our RAID controller will further help with  
>> caching - and give you a little security in case of power failures  
>> etc.
>>
> Why does a BBU help with caching?  I understand that it allows a  
> write to finish from cache in the event of a power failure, but I  
> didn't know it could help with performance, or did I misunderstand.

sorry, I should have been more clear.  The cache should help with
write performance: the disk subsystem can acknowledge a write as soon
as the data lands in the battery-backed cache, rather than waiting for
the bits to hit the disk itself.
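
if i remember the 3ware CLI correctly, toggling the unit's write cache
looks something like this (controller and unit numbers are just
examples - and only leave the cache on if you actually have the BBU):

  # show the controller, its units and BBU status
  tw_cli /c0 show
  # enable the write cache on unit 0
  tw_cli /c0/u0 set cache=on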

>> also - don't forget about tuning your NFS client options.  use  
>> large read and write block sizes; think about using async writes if  
>> your data isn't *that* important <grin>.  and if you can use jumbo  
>> frames use them - that'll help both the client and server.
>>
> Thanks for the tips.  We could do some async writes but then would  
> need some integrity checks.  This is financial data so someone cares  
> about every number :)

oh yea - i'd stay away from async writes then :)

-p

