[nycbug-talk] FreeBSD software RAID?

Thu Mar 8 00:28:10 EST 2007

>>>>> "sk" == Steven Kreuzer <skreuzer at f2o.org> writes:
>>>>> "gr" == George Rosamond <george at ceetonetechnology.com> writes:

    sk> RAID 3 is that it generally cannot service multiple requests
    sk> simultaneously. [...]  I would stay the hell away from RAID 3,
    sk> both hardware and software implementations

well, I'd stay the hell away from something because it can lose data
or cause filesystem corruption during a power loss that wouldn't
have happened to a filesystem on a single disk.

RAID5 can do this when there is no NVRAM because of the ``RAID5 write
hole''.  RAID3 doesn't have this hole in FreeBSD (AIUI), because the
sector size the array presents to UFS is increased from 512 bytes to
cover an entire stripe, so every write is a full-stripe write.
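
A toy model of the write hole, in case the description above is too
hand-wavy.  This is only a sketch: the two-data-plus-parity stripe, the
block contents, and the helper name xor_blocks are all made up for
illustration, and it models no real RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# A toy stripe: two data blocks plus one parity block.
d0_old, d1 = b"AAAA", b"BBBB"
stripe = {"d0": d0_old, "d1": d1, "parity": xor_blocks(d0_old, d1)}

# Update d0.  The array has to write d0 and parity as two separate
# disk writes, and nothing makes the pair atomic without an NVRAM.
d0_new = b"CCCC"
stripe["d0"] = d0_new
# -- power is lost here: the matching parity write never happens --

# After reboot the disk holding d1 dies, so d1 is rebuilt from what is
# left: the new d0 and the stale parity.
rebuilt_d1 = xor_blocks(stripe["d0"], stripe["parity"])

# d1 was never being written, yet it comes back wrong.
write_hole_hit = rebuilt_d1 != d1
```

The damage lands on d1, a block nobody was writing at the time, which
is the part a filesystem on a single disk would never see.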

The write hole would be the sort of reason that would trigger ``stay
the hell away from'' in my mind, not so much ``it's slower for
seek-heavy workloads.''  That's my reason for saying RAID3, and I
think also the reason FreeBSD bothered to implement it (and to make
UFS work on top of its larger-than-512 sector size).  But I'm not
sure.  I hope I'm not leading you wrong.  It makes sense to me, but
the manual pages are so short, and there isn't an easy way to just
test the ideas rather than reading all my ranty speculations.
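
Here is the arithmetic as I understand it, sketched rather than taken
from the geom source.  The assumptions: 512-byte physical sectors, one
component holding parity, and a logical sector of 512 * (n - 1) bytes.
The function names are hypothetical.

```python
from typing import List

SECTOR = 512  # bytes per physical disk sector (assumed)


def raid3_sector_size(ncomponents: int) -> int:
    """Logical sector size when one component holds parity."""
    return SECTOR * (ncomponents - 1)


def split_logical_sector(data: bytes, ncomponents: int) -> List[bytes]:
    """Split one logical sector into per-disk chunks plus a parity chunk."""
    ndata = ncomponents - 1
    if len(data) != SECTOR * ndata:
        raise ValueError("a write must cover exactly one full stripe")
    chunks = [data[i * SECTOR:(i + 1) * SECTOR] for i in range(ndata)]
    # Parity comes only from the data being written: no old block is
    # read back, so there is no read-modify-write step to interrupt.
    parity = bytes(SECTOR)
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return chunks + [parity]
```

With three components the logical sector is 1024 bytes, with five it
is 2048.  A power loss can still tear the one sector being written,
but a single disk can do that too; it shouldn't damage neighbouring
data that wasn't part of the write.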

    gr> That would be my position Andy. . . I would rather just stick
    gr> to hardware.

yeah maybe.  again, I'm heavy on ranting and short on experience, but
at least going from my _friends'_ experience with hardware RAID, I
intend to stay the hell away from any RAID-on-a-card, period.

First, many of them don't have an NVRAM.  Some have something they
call an NVRAM, but they use it to store metadata, not for a write
cache to plug the RAID5 write hole.  That write cache is the whole
reason for doing hardware RAID: to get an NVRAM that fixes the RAID5
write hole.

Second, there are too many horror stories of RAID cards losing entire
arrays.  The card goes bad or gets confused.  It's part of Dell's
card-of-the-month club, and a replacement card is unobtainable, and
new cards won't work with the array.  Or the array's metadata was
stored on the old card's so-called-but-not-really NVRAM, so the new
card understands the old array format but won't recognize your
particular array.  Or the configurator tool is clunky and buggy and
won't give back your array, or there's more than one configurator,
like one in BIOS and one in DOS and one in Windows, and only one tool
works and the others are decoys, or whatever.

With software RAID, you can back up your metadata on _paper_ if you
want to, and type it in by hand---the array will still work.  If
you're concerned about your method of paper backup, you can test it on
a non-live filesystem.  Deliberately delete/confuse your metadata, and
force-recreate the array, see if it passes fsck and 'pax -w . >
/dev/null' (which reads back every file).  Keep trying until you have
a written procedure that works.  Label the physical disks with their
names on the sheet of paper (so you've recorded their stripe
ordering).  So software RAID is less likely than card-RAID to refuse
to see your array because some little pointer block got mangled.  And
you don't have to worry about multiple opaque configurator
tools---there's just one, and it's native to the OS, and it's
available on the LiveCD/installCD/whatever.
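
If it helps, this is roughly what I mean by a sheet of paper.  It's a
sketch only: the ArraySheet class, its fields, and the sample values
(array name, disk names, OS version) are placeholders I made up, not
what geom actually stores on disk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArraySheet:
    """Hypothetical record of what you'd write down for one array."""
    name: str
    level: str
    os_version: str
    disks: List[str]  # in stripe order

    def to_paper(self) -> str:
        """Render the sheet as plain lines you could print or copy by hand."""
        lines = [
            f"array: {self.name}",
            f"level: {self.level}",
            f"os: {self.os_version}",
        ]
        lines += [f"disk {i}: {d}" for i, d in enumerate(self.disks)]
        return "\n".join(lines)

    @classmethod
    def from_paper(cls, text: str) -> "ArraySheet":
        """Type the sheet back in; disk order comes from the numbers."""
        fields, disks = {}, {}
        for line in text.splitlines():
            key, _, value = line.partition(": ")
            if key.startswith("disk "):
                disks[int(key.split()[1])] = value
            else:
                fields[key] = value
        ordered = [disks[i] for i in sorted(disks)]
        return cls(fields["array"], fields["level"], fields["os"], ordered)


# Placeholder values, purely for illustration.
sheet = ArraySheet("data0", "raid3", "FreeBSD 6.2", ["ad4", "ad6", "ad8"])
round_trip_ok = ArraySheet.from_paper(sheet.to_paper()) == sheet
```

The disk lines carry their position explicitly, so the stripe ordering
survives even if the lines get typed back in a different order.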

With software RAID, there's no concern about not being able to obtain
a card that matches the array structure.  Even if geom changes its
structure, you can more easily document which version of FreeBSD you
used than which Dell card-of-the-month they shipped.  And you can
always obtain that old version of FreeBSD at any time in the future.

Software RAID thus solves all the ``second'' problems with
RAID-on-a-card if you are a good sysadmin, or has them worse than
ever if you're a bad one.  And RAID3 instead of RAID5 solves the
``first'' problem with RAID-on-a-card, as I understand it.

I'm sure a bunch of people can chime in and say ``I've used
RAID-on-a-card, and I can't stress enough how close to zero is the
number of problems I've had with it.  It is really close to zero.
It's so unbelievably close to zero, it IS zero, so I think it must be
very trustworthy.''  Well, that's great, I'm just saying I've heard
more than one story from someone who HAS had some stupid problem with
some expensive RAID-on-a-card that they really shouldn't be having.

so basically it all sucks. :)

Honestly if what you want is a ``backup'' I would do nightly rsync,
maybe with some kind of sanity-check.  Mirroring is more for
continuity, when you don't want to lose availability when a disk
fails.  Even then it's a little hard to make it live up to its
promise, because a slowly failing disk will start taking 30 seconds
instead of 30 milliseconds to answer requests---it stays in the array
but slows your machine to 1/1000th speed, so you call it ``crashed''.
The bad disk ``crashed'' my machine.  Or mirroring is for speed, if
you want the seek bandwidth of an extra spindle for reads.  Not so
much for backup, IMHO, but definitely not worthless for that purpose
I guess, and used successfully by a few friends who saw disk failures.
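
One possible shape for ``rsync with some kind of sanity-check'', as a
sketch.  The check itself (refuse to sync when the source has lost
most of its files compared with the last backup) and the names
count_files, source_looks_sane and rsync_command are my own
assumptions; the only rsync options used are -a and --delete.

```python
import os
from typing import List


def count_files(root: str) -> int:
    """Number of regular directory entries under root."""
    return sum(len(files) for _, _, files in os.walk(root))


def source_looks_sane(src: str, dst: str, min_ratio: float = 0.5) -> bool:
    """Refuse to sync if the source shrank badly versus the last backup."""
    previous = count_files(dst)
    if previous == 0:
        return True  # first run, nothing to compare with
    return count_files(src) >= previous * min_ratio


def rsync_command(src: str, dst: str) -> List[str]:
    """Build the argv for a mirror-style copy of src's contents into dst."""
    # -a preserves permissions, times and links; --delete removes files
    # from dst that no longer exist in src, which is exactly why the
    # sanity check should run first.
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dst]
```

From cron you would call source_looks_sane() first and only then hand
rsync_command() to subprocess.run(..., check=True), so that a dying or
freshly wiped source disk doesn't get faithfully mirrored over last
night's good copy.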

A mirror is also very nice for snapshots.  You can break the mirror,
do something dangerous, and then resync it only if you succeed.
Sometimes either side of the mirror is bootable, so that's extremely
nice.