[nycbug-talk] ZFS and firewire - conditions for a perfect storm

Isaac Levy ike at lesmuug.org
Mon Jun 30 19:42:54 EDT 2008

So, I think I'm coming to a modified marketing slogan for ZFS:

"ZFS Likes Cheap Disks, especially SATA/PATA; not so hot for firewire,
and who knows about USB."

On Jun 30, 2008, at 4:25 PM, Miles Nordin wrote:
>>>>>> "il" == Isaac Levy <ike at lesmuug.org> writes:
>    il> [root at blackowl /usr/home/ike]# zpool export Z
>    il> cannot unmount '/Z/shared': Device busy
> maybe this is the freebsd version of 'no valid replicas', the generic
> banging-head-against-wall message Solaris gives you when it's trying
> to ``protect'' you from doing something ``dumb'' like actually fixing
> your fucked-up array.
> you can try erasing zpool.cache and then 'import -f'.
>    il> - Or, sometimes it just hangs like I described previously.

Cool- thx for the heads-up on this approach, I'm learning a lot more  
about ZFS...  (stuff I didn't necessarily want to know :)

However, for the record here, I just tried unplugging a drive as
before (to bring on a disk I/O hang), deleted the zpool.cache, and
tried 'zpool import -f' - and it all just hangs.
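
For the archives, the exact sequence was something like this (the
cache path is FreeBSD's default, /boot/zfs/zpool.cache - adjust if
yours lives elsewhere):

# with the firewire disk unplugged, to induce the I/O hang:
rm /boot/zfs/zpool.cache    # forget the stale pool config
zpool import                # scan for importable pools (hangs)
zpool import -f Z           # force-import by name (also hangs)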

The OS keeps chugging along nicely, though (UFS2 on an internal disk).

/me sighs, reboots, and starts fresh again...

> I find 'zpool status' hangs a lot.  A status command should never
> never never cause disk I/O or touch anything that could
> uninterruptible-sleep.  Especially, a system-wide status command needs
> to not hang because one pool is messed up, any more than it's
> acceptable for failures in one pool to impact availability of the
> whole ZFS subsystem (which AFAIK they correctly don't spill over, in
> terms of stable/fast filesystem access to pools other than the one
> with problems.  but for 'zpool status', they do, so if you consider
> the zpool command part of the ZFS subsystem then they do spill over.)
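
FWIW, the best I've managed there is to background the status call so
a hung pool doesn't eat the terminal.  Just a workaround sketch in
plain sh - and the kill may well not land if zpool is stuck in disk
wait, which is exactly your point:

zpool status Z &        # keep the prompt while status runs
pid=$!
sleep 10                # arbitrary grace period
kill $pid 2>/dev/null   # futile if it's in uninterruptible sleep
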
>    il> + Again, and after digging around lists online, this one leads
>    il> me to believe that the only people who've done a great job
>    il> implementing firewire are Apple (it's theirs to begin with).

Oy- you are correct here, Miles!

On an Apple machine, using a firewire disk, after installing
smartmontools, I can't get even a lick of info out of the firewire
disk:

plumb:~ ike$ smartctl -a disk8
smartctl version 5.38 [i386-apple-darwin9.3.0] Copyright (C) 2002-8  
Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Smartctl open device: disk8 failed: Operation not supported by device
plumb:~ ike$
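
One thing that might be worth a try is smartctl's SAT passthrough
device type - it exists in the 5.3x series, though I don't know if
the Darwin build wires it up, and I'd bet most firewire bridges just
eat the ATA passthrough anyway (per Miles, below):

smartctl -a -d sat /dev/disk8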

And using Apple's diskutil, good stuff like SMART isn't supported:

plumb:~ ike$ diskutil info disk8
    Device Identifier:        disk8
    Device Node:              /dev/disk8
    Part Of Whole:            disk8
    Device / Media Name:      WiebeTech

    Volume Name:
    Mount Point:

    Partition Type:           GUID_partition_scheme
    Bootable:                 Not bootable
    Media Type:               Generic
    Protocol:                 FireWire
    SMART Status:             Not Supported

    Total Size:               931.5 Gi (1000204886016 B) (1953525168 512-byte blocks)
    Free Space:               0.0 B (0 B) (0 512-byte blocks)

    Read Only:                No
    Ejectable:                Yes
    Whole:                    Yes
    Internal:                 No
    OS 9 Drivers:             No
    Low Level Format:         Not Supported

plumb:~ ike$

Wow.  Firewire is kind of making me sad.

> I just tried it, and smartctl doesn't work for me over firewire on
> Apple either.  I'm using the smartctl in NetBSD pkgsrc and Mac OS
> 10.5.3.  I think it's a limitation of the firewire bridge chip, not
> the OS's driver stack.  well...it is a limitation of the OS stack in
> that there's no defined way to pass the commands through the bridge,
> so the OS doesn't implement them, but the real limitation is in the
> bridge chip and the standards that define how they should work.
> I think.  It's odd that DVD burners ``just work'' I guess.  but...I
> bet, for example, those special commands one can send to Lite-On
> drives to make them rpc1 so dvdbackup works better, would not pass
> through a firewire bridge.  untested though.
> of course the error reporting stuff may be a different story, may
> actually be firewire stack problems, but again I would expect the
> enclosure to interfere with error reporting, and some enclosures to
> handle disks going bad better than others.
>    il> -- I believe for any future growth at home, I'll simply start
>    il> thinking towards using SATA and known good controllers,
>    il> (Areca, 3ware, Adaptec, etc...).
> from what I've heard/understood, be sure to get a battery because it's
> necessary for correctness, not just for speed.  Otherwise you need to
> do RAID3, which means you need a filesystem that supports large
> sector sizes, which you don't have.

Ah- well, it depends on the controller- that's a whole other thing.
I meant that I'd snag some fairly inexpensive and well-supported SATA
cards with lots of ports, use them for ZFS volumes, and ditch
firewire.  ZFS doesn't seem to have any of these gross problems with
the SATA stuff I've used (Areca, Adaptec, 3Ware).
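
The plan there is plain disks handed straight to ZFS, something like
this (device names hypothetical - whatever the driver exposes on a
given box, ad* or da*):

zpool create Z raidz da0 da1 da2 da3    # one raidz vdev, no HW RAID
zfs create Z/shared                     # filesystems are cheap after that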

And yeah I agree- don't skimp on the batteries for a given controller  
if you use it for hardware RAID :)

> Another thing to worry about with this RAID-on-a-card crap is
> controllers going bad.  If I were using such a controller rather than
> ZFS, I'd buy a spare controller and put it on the shelf (in case the
> model which understands my RAID metadata goes out of production), and
> I'd test the procedure for moving disks from one controller to another
> BEFORE the controller breaks, and BEFORE putting any data on the
> raidset.

Buying cards to put on the shelf is actually a plan I've put into
action several times in recent years (after getting stuck with
ancient and irreplaceable Compaq cards going bad...)
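
With ZFS, at least, the controller-swap rehearsal is cheap to do
before the data matters - roughly (pool name from my setup here):

zpool export Z    # quiesce and release the pool
# move the disks over to the spare controller, then:
zpool import      # see what's visible under the new device names
zpool import Z    # pull it back in (-f if it complains about hostid)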

A trend I like seeing recently, which changes this game, is that
Supermicro and Tyan server motherboards are coming with 8 SATA ports
onboard, with something like an LSI controller built in.  For the 1U
high-density boxes I tend to deploy for jobs, they get deployed in
pairs or triples- and usually some component failure happens either
immediately (warranty replacement) or well after the working life of
the machines is past (3-4 yrs).  I've rarely seen the
machines/cards/etc. fail in the middle space, but that's just my
experience...

>    il> Yeah, I think ZFS is the future too- and is simply a matter of
>    il> time and maturing.
> yeah, but it's really not maturing very quickly at all compared to
> SVM, LVM2, ext3, HFS+, netapp/emc/vendorware storage stuff, or
> basically anything at all that's not dead-in-the-water abandonware
> like FFS/LFS/RAIDframe.  It seems to be maturing at about the same
> speed as Lustre, which is too fucking slow.  I don't know what the
> hell they _are_ working on, besides this stability stuff.  If I had a
> Sun support contract I'd have opened at least five big fat bugs and
> would be pestering them monthly for patches.  There are known
> annoying/unacceptable problems they are not fixing after over two
> years.  When Solaris 11 ships it is still going to be rickety flakey
> bullshit.  It's not exactly a disappointment, but it IS flakey
> bullshit.

Hrmph.  Yeah, I do worry about things maturing fast enough to stay  
alive long term.  With disks, buggy crap like this has to go away
really FAST or else users will...

