[nycbug-talk] ZFS and firewire - conditions for a perfect storm
Isaac Levy
ike at lesmuug.org
Mon Jun 30 11:24:44 EDT 2008
Hi Miles,
Thanks for your input!
On Jun 30, 2008, at 3:23 AM, Miles Nordin wrote:
>>>>>> "il" == Isaac Levy <ike at lesmuug.org> writes:
>
> il> 1) The firewire bus could possibly be losing track of which
> il> device is which- and confusing ZFS. In my daisy-chain setup,
> il> when one drive in the chain dies, (say, da2), and it's removed
> il> from the chain, it seems to become the previous drive
> il> (e.g. da1).
>
> zpool export ; zpool import
>
> I think that will ``just work.''
Ah-ha- good thinking, but I did indeed try that; again, I believe
those commands suffer from the same firewire-induced effects:
- When all firewire disks are online and obviously healthy, 'zpool
export' and 'zpool import' work as expected.
- When one (any) device is offline and obviously dead, 'zpool export'
gives me:
[root at blackowl /usr/home/ike]# zpool export Z
cannot unmount '/Z/shared': Device busy
[root at blackowl /usr/home/ike]#
- Or, sometimes it just hangs like I described previously.
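For anyone who wants to poke at this, the rough sequence I've been
trying when the export wedges looks like this (just a sketch- pool
name Z and mountpoint /Z/shared as above):

# see what still has files open on the stuck mountpoint
fstat -f /Z/shared

# force the export, then bring the pool back in
zpool export -f Z
zpool import Z

# sanity-check the pool afterward
zpool status -x Z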
I haven't been able to reliably isolate different causes for the
failed behaviors- drive order or failure order doesn't seem to give me
one or the other, it just truly seems random which failure I get.
+ Which leads me to believe further that this is an issue with
firewire driver event notifications in the kernel.
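One thing I'm tempted to try on the next rebuild, to take the daN
renumbering out of the picture entirely (sketch only- the fw0/fw1/fw2
label names are just placeholders I made up):

# see what the firewire stack currently thinks is on the bus,
# and force a bus reset so it re-enumerates
fwcontrol
fwcontrol -r

# on a fresh setup, give each disk a stable GEOM label *before*
# creating the pool (glabel writes its metadata to the last sector)
glabel label fw0 /dev/da0
glabel label fw1 /dev/da1
glabel label fw2 /dev/da2

# then build the pool on the labels instead of the raw daN devices,
# so it no longer matters which device node a disk lands on
zpool create Z raidz /dev/label/fw0 /dev/label/fw1 /dev/label/fw2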
>
>
> il> (Anyone know about OpenSolaris/Firewire/ZFS? How's that for
> il> esoteric :)
>
> yeah, I used this. I've used mirrors only, no raidz2.
No kidding! Cool- I'm not surprised with you Miles :)
>
>
> * I haven't fooled around with any of that OpenSolaris or Nexenta
> stuff. I've used only Solaris 10 U<n> and various SXCE builds.
>
> * non-Oxford-911 case that I had, the case would crash. The case had
> to be rebooted. This was confusing because for a while I thought
> the driver/OS was messed up.
>
> * ZFS could handle a case crashing during use, but ZFS had problems
> if a case crashed during a scrub.
>
> * error reporting through the firewire bridge is not always
> fantastic, and smartctl would not pass through, so diagnosing
> failing disks is significantly harder when they're inside firewire
> cases.
Gah- I have the same frustrating problem when using smartctl from
FreeBSD on the firewire drives.
+ Again, after digging around lists online, this one leads me to
believe that the only people who've done a great job implementing
firewire are Apple, (it's theirs to begin with).
It makes me somewhat sad- firewire has been SO RELIABLE and flexible
on OSX systems for years... and the gear is cheaper than ever now.
>
>
> * for mirrors, ZFS wasn't great about remembering that the mirror was
> dirty and needed resyncing. If I rebooted during a resync, it
> wouldn't continue where it left off, and wouldn't start over---it
> would just quit trying to resync and accumulate checksum errors.
> The resync, when it did complete, often wasn't adequate to stop a
> stream of ``checksum errors'' over the next few weeks---I had to
> manually request a zpool scrub if half the mirror ever bounced.
Yikes. That's kind of unacceptable behavior for a system one wishes to
trust.
I think most people would agree, filesystems simply *must* be the most
refined, reliable, and unchanging part of any system.
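For anyone following along at home, the manual babysitting Miles
describes boils down to something like this (sketch- pool name Z
again):

# kick off a full scrub by hand after half the mirror bounces
zpool scrub Z

# watch whether checksum errors keep accumulating on the devices
zpool status -v Z

# once the flaky device has been dealt with, reset the error counters
zpool clear Z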
>
>
> Because of some of these problems and cost, I've moved to
> ZFS-over-iSCSI. It's very slow and has problems still, but works
> better than the firewire did for me.
Digit. For now, since I'm starting from scratch, I've split my
firewire disks up into 3 machines, so I have a 3rd backup for the
future.
Against my best wishes, I'll keep using the Apple machines for
storage- the journaling of HFS+ (case-sensitive!) is a well-trusted
and easy path for disks which *I'll never have to fsck* - my most
necessary feature on multi-TB systems (especially at home, where my
time for hacking other stuff and using my data is precious).
For production/work systems, the Apple gear doesn't meet most needs on
more critical levels- and I can gladly accept fsck and use UFS there
in most applications.
--
I believe for any future growth at home, I'll simply start thinking
towards using SATA and known-good controllers (Areca, 3ware, Adaptec,
etc.). The sad part here is that this means I won't be able to use
old laptops or mini-PCs as (slow but silent) file servers, which have
worked out very nicely in my tiny apartment. I wince at the thought
of having to drop cash on silent PC gear- yuck.
>
>
> I think ZFS is the Future, but the more I use it the less confidence I
> have in it.
Yeah, I think ZFS is the future too- it's simply a matter of time
and maturing.
I think its biggest enemy right now is complexity- it's a very
feature-packed filesystem for users (why it's so cool!!!!!), but I
don't see this as any different from the history of UFS or my history
with HFS/+: all the filesystems I've trusted over the years have had
their features boiled down to extremely simple and reliable defaults-
from a user perspective.
ZFS still seems to have a foot in the zone between developers and users.
For UFS, ACLs, heck- softupdates (1999), and all the tunable features
seem to have taken years to work out and become the trusted media we
know and love now.
/me sighs and goes back to other hacking...
Rocket-
.ike