[nycbug-talk] FreeBSD software RAID?

Peter Wright pete at nomadlogic.org
Thu Mar 8 16:33:38 EST 2007

>>>>>> "pw" == Peter Wright <pete at nomadlogic.org> writes:
>     pw> i think you may be thinking about your software raid
>     pw> configuration data here not metadata...
> yeah, I am.  The meaning of the word ``meta'' is very flexible.  The
> way I've used it _is_ appropriate---in Solaris SVM for example:
>      metadb - create and delete replicas of the metadevice  state
>      database
> (which is where array geometry and state is stored)


>     pw> most decent controllers offer BBU's (Battery Backup Units)
>     pw> which not only allow better I/O rates but also help prevent
>     pw> loss of data during a catastrophic
> well the BBU's plug the RAID5 write hole, so long as they're not
> separated from the disks that make up the array, as they would be if
> for example the hardware RAID controller card failed.

well that's why any hardware raid controller worth purchasing supports
BBU's and a write-back cache.  otherwise you would need to use synchronous
writes when mounting the filesystem, which for some people may be

> They are perhaps sold for speeding up databases and mailservers and
> (in the old days) NFSv2 servers that do a lot of fsync(), but for the
> purposes of this ``i wouldn't touch'' thread, again, it's the RAID5
> write hole that I care about, not performance.

pretty much any "enterprise" grade hardware RAID controller will use a BBU
and write-back cache not only for data integrity reasons but also for the
performance gains.

> The need for them to achieve the illusion of the correct behavior of a
> single disk is the reason I think software RAID5 is, AIUI, a bad idea.
>     pw> You think vendors like NetApp/EMC/IBM/etc. use software to
>     pw> implement low level RAID functionality?
> That's a funny statement, but I know what you mean.

sorta, i was referring to the fact that most hardware raid controllers
will calculate parity etc. on the ASIC which is independent from the OS. 
hope that clarifies my intent.

this is an important distinction: if a RAID implementation is tied to the
OS, then any interruption to the OS increases the risk of data corruption.  by
offloading this to an ASIC with a BBU one mitigates this risk by allowing
data in caches to be sync'd to disk regardless of the state of the OS.
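
to make the risk concrete, here is a minimal sketch in python of XOR
parity on a single three-disk stripe, and of what happens when an update
is interrupted after the data block is written but before the parity
block is.  the block contents and the names (xor_blocks, d0, d1, parity)
are made up for illustration and are not taken from any real controller
or driver.

```python
from functools import reduce

def xor_blocks(*blocks):
    """XOR equal-length byte blocks together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# a three-disk stripe: two data blocks plus one parity block
d0 = b"AAAA"
d1 = b"BBBB"
parity = xor_blocks(d0, d1)

# normal case: disk 1 dies, its block is rebuilt from d0 and parity
assert xor_blocks(d0, parity) == d1

# interrupted update: the new d0 reaches the disk, the matching parity does not
d0_new = b"CCCC"
stale_parity = parity

# disk 1 now dies; rebuilding from d0_new and the stale parity gives garbage
rebuilt_d1 = xor_blocks(d0_new, stale_parity)
consistent = rebuilt_d1 == d1
print(consistent)  # False
```

a battery-backed cache helps here because the pending parity write
survives the interruption and can be completed afterwards, so the stripe
ends up consistent again.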

> In any case I think we agree on this so far as: my criticisms apply to
> RAID-on-a-card only.  The SAN vendors do all have NVRAM that fixes the
> RAID5 write hole, but the cards often don't.  Even cards that say ``we
> have NVRAM!'' often don't have what the SAN vendors call NVRAM, and
> that bugs me a lot because they are basing their business on trying to
> confuse people rather than on building trust, which I think is quite
> wrong in this space.

hmm...i guess i'm just not sure what you mean by "RAID-on-a-card".  i'll
also have to look into the "RAID5 write hole" as that's a new term for me
as well.

>     pw> i'd be willing to bet any problems people have had with
>     pw> hardware RAID may have been due to misconfiguration of the
>     pw> array itself, or a misunderstanding about the fundamentals of
>     pw> configuring RAID.
> ...well...I think there's a misunderstanding about the fundamental
> problem of losing your array because you are not able to order the
> model of hardware RAID controller that matches your metadata, or not
> being able to safely backup this metadata or move it from one card to
> another without a lot of hesitant, ominous key-pecks in some clunky
> BIOS Blue Screen of Setup.  (RAID metadata, not filesystem metadata)

call it what it is, configuration data...not data-about-data.  i still
don't see your point, although i reckon at this point we are beyond
splitting hairs.



Peter Wright
pete at nomadlogic.org
