[nycbug-talk] ZFS and firewire - conditions for a perfect storm
ike at lesmuug.org
Mon Jun 30 19:42:54 EDT 2008
So, I think I'm coming to a modified marketing slogan for ZFS:
"ZFS likes cheap disks, especially SATA/PATA; not so hot for firewire,
and who knows about USB."
On Jun 30, 2008, at 4:25 PM, Miles Nordin wrote:
>>>>>> "il" == Isaac Levy <ike at lesmuug.org> writes:
> il> [root at blackowl /usr/home/ike]# zpool export Z cannot unmount
> il> '/Z/shared': Device busy
> maybe this is the freebsd version of 'no valid replicas', the generic
> banging-head-against-wall message Solaris gives you when it's trying
> to ``protect'' you from doing something ``dumb'' like actually fixing
> your fucked-up array.
> you can try erasing zpool.cache and then 'import -f'.
> il> - Or, sometimes it just hangs like I described previously.
Cool- thx for the heads-up on this approach, I'm learning a lot more
about ZFS... (stuff I didn't necessarily want to know :)
However, for the record here, I just tried unplugging a drive as
before (to bring on a disk I/O hang), deleted the zpool.cache, and
tried 'import -f' - and it's all just hung.
The OS keeps chugging along nicely though, (UFS2 on an internal disk).
/me sighs, reboots, and starts fresh again...
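For anyone finding this in the archives, the recovery sequence I was
attempting boils down to the following (FreeBSD paths; the pool name
'Z' is from my setup, so substitute your own):

```shell
# Remove the cache file so ZFS forgets its stale record of the pool.
# On FreeBSD 7 the cache lives here; on Solaris it's /etc/zfs/zpool.cache.
rm /boot/zfs/zpool.cache

# Force-import the pool despite the "potentially active" complaint.
zpool import -f Z

# Sanity check afterwards:
zpool status Z
```

In my case the import just hung along with everything else touching
the dead firewire device, so your mileage may vary.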
> I find 'zpool status' hangs a lot. A status command should never
> never never cause disk I/O or touch anything that could
> uninterruptible-sleep. Especially, a system-wide status command needs
> to not hang because one pool is messed up, any more than it's
> acceptable for failures in one pool to impact availability of the
> whole ZFS subsystem (which AFAIK they correctly don't spill over, in
> terms of stable/fast filesystem access to pools other than the one
> with problems. but for 'zpool status', they do, so if you consider
> the zpool command part of the ZFS subsystem then they do spillover.)
> il> + Again, and after digging around lists online, this one leads
> il> me to believe that the only people who've done a great job
> il> implementing firewire are Apple (it's theirs to begin with).
Oy- you are correct here, Miles!
On an Apple machine, using a firewire disk, after installing
smartmontools, I can't get even a lick of info out of the firewire drive:
plumb:~ ike$ smartctl -a disk8
smartctl version 5.38 [i386-apple-darwin9.3.0] Copyright (C) 2002-8
Home page is http://smartmontools.sourceforge.net/
Smartctl open device: disk8 failed: Operation not supported by device
And using Apple's diskutil, good stuff like SMART isn't supported:
plumb:~ ike$ diskutil info disk8
Device Identifier: disk8
Device Node: /dev/disk8
Part Of Whole: disk8
Device / Media Name: WiebeTech
Partition Type: GUID_partition_scheme
Bootable: Not bootable
Media Type: Generic
SMART Status: Not Supported
Total Size: 931.5 Gi (1000204886016 B) (1953525168
Free Space: 0.0 B (0 B) (0 512-byte blocks)
Read Only: No
OS 9 Drivers: No
Low Level Format: Not Supported
Wow. Firewire is kind of making me sad.
> I just tried it, and smartctl doesn't work for me over firewire on
> Apple either. I'm using the smartctl in NetBSD pkgsrc and Mac OS
> 10.5.3. I think it's a limitation of the firewire bridge chip, not
> the OS's driver stack. well...it is a limitation of the OS stack in
> that there's no defined way to pass the commands through the bridge,
> so the OS doesn't implement them, but the real limitation is in the
> bridge chip and the standards that define how they should work.
> i think. It's odd that DVD burners ``just work'' i guess. but...i
> bet, for example, those special commands one can send to Lite-On
> drives to make them rpc1 so dvdbackup works better, would not pass
> through a firewire bridge. untested though.
> of course the error reporting stuff may be a different story, may
> actually be firewire stack problems, but again I would expect the case
> to interfere with error reporting and some cases to handle disks going
> bad better than others.
> il> -- I believe for any future growth at home, I'll simply start
> il> thinking towards using SATA and known good controllers,
> il> (Areca, 3ware, Adaptec, etc...).
> from what I've heard/understood, be sure to get a battery because it's
> necessary for correctness, not just for speed. Otherwise you need to
> do RAID3 which means you need a filesystem that supports large sector
> sizes which you don't have.
Ah- well, it depends on the controller- that's a whole other thing.
I meant that I'd snag some fairly inexpensive and well supported SATA
cards with lots of ports, and use them for ZFS volumes- and ditch
firewire. ZFS doesn't seem to have these gross problems at all with
the SATA stuff I've used- (Areca, Adaptec, 3Ware).
And yeah I agree- don't skimp on the batteries for a given controller
if you use it for hardware RAID :)
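To be concrete about the direction I mean, here's a sketch (device
names are hypothetical, FreeBSD-style; adjust for your controller):

```shell
# Build a raidz pool straight on the SATA disks -- no hardware RAID,
# let ZFS own the redundancy and checksumming end to end.
zpool create tank raidz ad4 ad6 ad8 ad10

# A filesystem per use, with a quota, instead of slicing partitions:
zfs create tank/shared
zfs set quota=200g tank/shared

zpool status tank
```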
> Another thing to worry about with this RAID-on-a-card crap is
> controllers going bad. If I were using such a controller rather than
> ZFS, I'd buy a spare controller and put it on the shelf (in case the
> model which understands my RAID metadata goes out of production), and
> I'd test the procedure for moving disks from one controller to another
> BEFORE the controller breaks, and BEFORE putting any data on the
Buying cards to put on the shelf is actually a plan I've put into action
several times in recent years- (after getting stuck with ancient and
irreplaceable Compaq cards going bad...)
A trend I like seeing recently, which changes this game, is that
Supermicro and Tyan server motherboards are coming with 8 SATA ports
onboard, with something like an LSI card built-in. For the 1u high-
density boxes I tend to deploy for jobs, they get deployed in pairs or
triples- and usually some component failure happens either immediately
(warranty replacement) or well after the working life of the machines
is past (3-4 yrs). I've rarely seen the machines/cards/etc. fail in
the middle space, but that's just my experience...
> il> Yeah, I think ZFS is the future too- and is simply a matter of
> il> time and maturing.
> yeah, but it's really not maturing very quickly at all compared to
> SVM, LVM2, ext3, HFS+, netapp/emc/vendorware storage stuff, or
> basically anything at all that's not dead-in-the-water abandonware
> like FFS/LFS/RAIDframe. It seems to be maturing at about the same
> speed as Lustre, which is too fucking slow. I don't know what the
> hell they _are_ working on, besides this stability stuff. If I had a
> Sun support contract I'd have opened at least five big fat bugs and
> would be pestering them monthly for patches. There are known
> annoying/unacceptable problems they are not fixing after over two
> years. When Solaris 11 ships it is still going to be rickety flakey
> bullshit. It's not exactly a disappointment, but it IS flakey
Hrmph. Yeah, I do worry about things maturing fast enough to stay
alive long term. With disks, buggy crap like this has to go away
really FAST or else users will...