[nycbug-talk] ZFS and firewire - conditions for a perfect storm

Isaac Levy ike at lesmuug.org
Mon Jun 30 19:42:54 EDT 2008

So, I think I'm coming to a modified marketing slogan for ZFS:

"ZFS Likes Cheap Disks, especially SATA/PATA; not so hot for firewire,
and who knows about USB."

On Jun 30, 2008, at 4:25 PM, Miles Nordin wrote:
>>>>>> "il" == Isaac Levy <ike at lesmuug.org> writes:
>    il> [root at blackowl /usr/home/ike]# zpool export Z
>    il> cannot unmount '/Z/shared': Device busy
> maybe this is the freebsd version of 'no valid replicas', the generic
> banging-head-against-wall message Solaris gives you when it's trying
> to ``protect'' you from doing something ``dumb'' like actually fixing
> your fucked-up array.
> you can try erasing zpool.cache and then 'import -f'.
>    il> - Or, sometimes it just hangs like I described previously.

Cool- thx for the heads-up on this approach, I'm learning a lot more  
about ZFS...  (stuff I didn't necessarily want to know :)

However, for the record here, I just tried unplugging a drive as
before (to bring on a disk I/O hang), deleted the zpool.cache, and
tried 'zpool import -f' - and it all just hangs.
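
For the archives, the exact sequence was something like this (the
cache path is FreeBSD's default, /boot/zfs/zpool.cache - adjust if
yours lives elsewhere):

# with the firewire disk unplugged, to induce the I/O hang:
rm /boot/zfs/zpool.cache    # forget the stale pool config
zpool import                # scan for importable pools (hangs)
zpool import -f Z           # force-import by name (also hangs)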

The OS keeps chugging along nicely, though (UFS2 on an internal disk).

/me sighs, reboots, and starts fresh again...

> I find 'zpool status' hangs a lot.  A status command should never
> never never cause disk I/O or touch anything that could
> uninterruptible-sleep.  Especially, a system-wide status command needs
> to not hang because one pool is messed up, any more than it's
> acceptable for failures in one pool to impact availability of the
> whole ZFS subsystem (which AFAIK they correctly don't spill over, in
> terms of stable/fast filesystem access to pools other than the one
> with problems.  but for 'zpool status', they do, so if you consider
> the zpool command part of the ZFS subsystem then they do spill over.)
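
FWIW, the best I've managed there is to background the status call so
a hung pool doesn't eat the terminal.  Just a workaround sketch in
plain sh - and the kill may well not land if zpool is stuck in disk
wait, which is exactly your point:

zpool status Z &        # keep the prompt while status runs
pid=$!
sleep 10                # arbitrary grace period
kill $pid 2>/dev/null   # futile if it's in uninterruptible sleep
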
>    il> + Again, and after digging around lists online, this one leads
>    il> me to believe that the only people who've done a great job
>    il> implementing firewire are Apple (it's theirs to begin with).

Oy- you are correct here, Miles!

On an Apple machine, using a firewire disk, after installing
smartmontools, I can't get even a lick of info out of the firewire
disk:

plumb:~ ike$ smartctl -a disk8
smartctl version 5.38 [i386-apple-darwin9.3.0] Copyright (C) 2002-8  
Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Smartctl open device: disk8 failed: Operation not supported by device
plumb:~ ike$
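
One thing that might be worth a try is smartctl's SAT passthrough
device type - it exists in the 5.3x series, though I don't know if
the Darwin build wires it up, and I'd bet most firewire bridges just
eat the ATA passthrough anyway (per Miles, below):

smartctl -a -d sat /dev/disk8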

And using Apple's diskutil, good stuff like SMART isn't supported:

plumb:~ ike$ diskutil info disk8
    Device Identifier:        disk8
    Device Node:              /dev/disk8
    Part Of Whole:            disk8
    Device / Media Name:      WiebeTech

    Volume Name:
    Mount Point:

    Partition Type:           GUID_partition_scheme
    Bootable:                 Not bootable
    Media Type:               Generic
    Protocol:                 FireWire
    SMART Status:             Not Supported

    Total Size:               931.5 Gi (1000204886016 B) (1953525168 512-byte blocks)
    Free Space:               0.0 B (0 B) (0 512-byte blocks)

    Read Only:                No
    Ejectable:                Yes
    Whole:                    Yes
    Internal:                 No
    OS 9 Drivers:             No
    Low Level Format:         Not Supported

plumb:~ ike$

Wow.  Firewire is kind of making me sad.

> I just tried it, and smartctl doesn't work for me over firewire on
> Apple either.  I'm using the smartctl in NetBSD pkgsrc and Mac OS
> 10.5.3.  I think it's a limitation of the firewire bridge chip, not
> the OS's driver stack.  well...it is a limitation of the OS stack in
> that there's no defined way to pass the commands through the bridge,
> so the OS doesn't implement them, but the real limitation is in the
> bridge chip and the standards that define how they should work.
> I think.  It's odd that DVD burners ``just work'' I guess.  but...I
> bet, for example, those special commands one can send to Lite-On
> drives to make them rpc1 so dvdbackup works better, would not pass
> through a firewire bridge.  untested though.
> of course the error reporting stuff may be a different story, may
> actually be firewire stack problems, but again I would expect the
> enclosure to interfere with error reporting, and some enclosures to
> handle disks going bad better than others.
>    il> -- I believe for any future growth at home, I'll simply start
>    il> thinking towards using SATA and known good controllers,
>    il> (Areca, 3ware, Adaptec, etc...).
> from what I've heard/understood, be sure to get a battery because it's
> necessary for correctness, not just for speed.  Otherwise you need to
> do RAID3, which means you need a filesystem that supports large
> sector sizes, which you don't have.

Ah- well, it depends on the controller- that's a whole other thing.
I meant that I'd snag some fairly inexpensive and well-supported SATA
cards with lots of ports, use them for ZFS volumes, and ditch
firewire.  ZFS doesn't seem to have any of these gross problems with
the SATA stuff I've used (Areca, Adaptec, 3Ware).
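
The plan there is plain disks handed straight to ZFS, something like
this (device names hypothetical - whatever the driver exposes on a
given box, ad* or da*):

zpool create Z raidz da0 da1 da2 da3    # one raidz vdev, no HW RAID
zfs create Z/shared                     # filesystems are cheap after that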

And yeah I agree- don't skimp on the batteries for a given controller  
if you use it for hardware RAID :)

> Another thing to worry about with this RAID-on-a-card crap is
> controllers going bad.  If I were using such a controller rather than
> ZFS, I'd buy a spare controller and put it on the shelf (in case the
> model which understands my RAID metadata goes out of production), and
> I'd test the procedure for moving disks from one controller to another
> BEFORE the controller breaks, and BEFORE putting any data on the
> raidset.

Buying cards to put on the shelf is actually a plan I've put into
action several times in recent years (after getting stuck with
ancient and irreplaceable Compaq cards going bad...)
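
With ZFS, at least, the controller-swap rehearsal is cheap to do
before the data matters - roughly (pool name from my setup here):

zpool export Z    # quiesce and release the pool
# move the disks over to the spare controller, then:
zpool import      # see what's visible under the new device names
zpool import Z    # pull it back in (-f if it complains about hostid)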

A trend I like seeing recently, which changes this game, is that
Supermicro and Tyan server motherboards are coming with 8 SATA ports
onboard, with something like an LSI controller built in.  For the 1U
high-density boxes I tend to deploy for jobs, they get deployed in
pairs or triples- and usually some component failure happens either
immediately (warranty replacement) or well after the working life of
the machines is past (3-4 yrs).  I've rarely seen the
machines/cards/etc. fail in the middle space, but that's just my
experience...

>    il> Yeah, I think ZFS is the future too- and is simply a matter of
>    il> time and maturing.
> yeah, but it's really not maturing very quickly at all compared to
> SVM, LVM2, ext3, HFS+, netapp/emc/vendorware storage stuff, or
> basically anything at all that's not dead-in-the-water abandonware
> like FFS/LFS/RAIDframe.  It seems to be maturing at about the same
> speed as Lustre, which is too fucking slow.  I don't know what the
> hell they _are_ working on, besides this stability stuff.  If I had a
> Sun support contract I'd have opened at least five big fat bugs and
> would be pestering them monthly for patches.  There are known
> annoying/unacceptable problems they are not fixing after over two
> years.  When Solaris 11 ships it is still going to be rickety flakey
> bullshit.  It's not exactly a disappointment, but it IS flakey
> bullshit.

Hrmph.  Yeah, I do worry about things maturing fast enough to stay  
alive long term.  With disks, buggy crap like this has to go away
really FAST or else users will...

