[nycbug-talk] FreeBSD PseudoRAID RAID0 array broken on atapci1: <Intel ICH5 SATA150 controller>

Yarema yds at CoolRat.org
Tue Sep 25 18:16:40 EDT 2007


--On Tuesday, September 25, 2007 8:49 AM +0200 Søren Schmidt 
<sos at deepcore.dk> wrote:

> Yarema wrote:
>> Hi, I need some help recovering from this.  First some back story.
>> Running 6.2-STABLE i386 from Sep 17, 2007.  My /home slice is mounted
>> from /dev/ar0s1e where the relevant kernel messages look like so when
>> all is good:
>>
>> atapci1: <Intel ICH5 SATA150 controller>
>> ata2: <ATA channel 0> on atapci1
>> ata3: <ATA channel 1> on atapci1
>> ad4: 381554MB <WDC WD4000YR-01PLB0 01.06A01> at ata2-master SATA150
>> ad6: 381554MB <WDC WD4000YR-01PLB0 01.06A01> at ata3-master SATA150
>> ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: READY
>> ar0: disk0 READY using ad4 at ata2-master
>> ar0: disk1 READY using ad6 at ata3-master
>>
>> Today this server crashed with the following logged:
>>
>> ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=144888320
>> ad4: TIMEOUT - READ_DMA retrying (1 retry left) LBA=143390319
>> ad4: FAILURE - device detached
>> ar0: FAILURE - RAID0 array broken
>> subdisk4: detached
>> ad4: detached
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=65536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6144000, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6160384, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6176768, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6193152, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=6209536, length=2048)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=146831867904, length=16384)]error = 5
>> g_vfs_done():ar0s1e[WRITE(offset=147024330752, length=16384)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=146002964480, length=2048)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=147801325568, length=12288)]error = 5
>> initiate_write_filepage: already started
>> g_vfs_done():ar0s1e[WRITE(offset=147142686720, length=2048)]error = 5
>>
>> Now the kernel messages read:
>>
>> ar0: FAILURE - RAID0 array broken
>> ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: BROKEN
>> ar0: disk0 READY using ad4 at ata2-master
>> ar0: disk1 DOWN no device found for this subdisk
>> ar1: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: BROKEN
>> ar1: disk0 DOWN no device found for this subdisk
>> ar1: disk1 READY using ad6 at ata3-master
>>
>> For some reason the second disk in the array shows up as ar1 instead
>> of being part of ar0.  I suspect there's gotta be some way to force
>> the two drives to show up as part of the same array by perhaps editing
>> the PseudoRAID metadata on disk without putting any of the UFS2 data
>> in "jeopardy".  Any pointers on where to start poking around for the
>> relevant metadata structures on disk or what to search for?  I figure
>> if I can dd the metadata off the disks, tweak a field or two and then
>> dd the whole mess back I stand a chance of either hosing the array
>> irrevocably or getting it all back. ;)  Or maybe atacontrol could be
>> used to re-create the metadata without destroying the UFS2 on the
>> array?  I have a coredump of the kernel from this crash if that helps
>> analyze things any.
>>
>
> The solution to getting the array back is to "atacontrol delete ar0"
> "atacontrol delete ar1" "atacontrol create stripe 512 ad4 ad6" and
> the array is reborn.
> However, your filesystems might be just a bunch of bits, depending
> on how much of the failed write made it in there; you get the
> (missing) protection you asked for by using RAID0....

Søren,

Thank you for your prompt and helpful reply.  I'm running into a new 
situation with atacontrol:

% atacontrol create RAID0 512 ad4 ad6
ar0: 763108MB <Intel MatrixRAID RAID0 (stripe 128 KB)> status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master

Note that the original RAID0 which broke was
ar0: 763108MB <FreeBSD PseudoRAID RAID0 (stripe 256 KB)> status: READY
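
As a rough sanity check on the numbers: as far as I can tell from 
atacontrol(8), the interleave argument is given in sectors, so assuming 
512-byte sectors, 512 should correspond to the original 256 KB stripe and 
not to the 128 KB reported above.  A minimal Python sketch of that 
arithmetic (the helper name is made up for illustration):

```python
# Assumption: 512-byte sectors, and that atacontrol's interleave
# argument counts sectors (per my reading of atacontrol(8)).
SECTOR_BYTES = 512

def interleave_kb(sectors, sector_bytes=SECTOR_BYTES):
    """Convert an interleave given in sectors to a stripe size in KB."""
    return sectors * sector_bytes // 1024

print(interleave_kb(512))  # stripe size I asked for, in KB
print(interleave_kb(256))  # interleave that would yield a 128 KB stripe
```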

Now atacontrol will not create FreeBSD PseudoRAID metadata with a 256KB 
stripe, but insists on creating Intel MatrixRAID metadata with a 128KB 
stripe.  This is on a non-R version of the ICH5 southbridge, so there's no 
way to enable or disable Intel MatrixRAID from the BIOS.  Nor is there any 
way to change the stripe size in the BIOS since there is no Intel 
MatrixRAID BIOS on this motherboard.  The computer in question is a Dell 
SC400 with an Intel OEM motherboard which has a very limited BIOS Setup 
interface typical of Intel/Dell.
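
The stripe size matters because a RAID0 layout decides which disk each 
block lives on.  Here is a hedged Python sketch of a plain two-disk stripe 
mapping; it assumes simple round-robin striping from offset 0 and ignores 
wherever the metadata actually sits, so it illustrates the principle 
rather than the real ataraid on-disk layout.  The function name and the 
sample offset are hypothetical:

```python
def raid0_map(offset, stripe_bytes, ndisks=2):
    """Map an array byte offset to (disk index, byte offset on that disk).

    Assumes plain round-robin striping starting at offset 0; this is an
    illustration, not the actual ataraid layout.
    """
    stripe_no, within = divmod(offset, stripe_bytes)
    disk = stripe_no % ndisks
    disk_offset = (stripe_no // ndisks) * stripe_bytes + within
    return disk, disk_offset

offset = 300 * 1024  # hypothetical array offset, 300 KB in
print(raid0_map(offset, 256 * 1024))  # where the old 256 KB layout put it
print(raid0_map(offset, 128 * 1024))  # where a 128 KB layout would look
```

With these assumptions the same array offset resolves to a different disk 
under the two stripe sizes, which is why the data would read back as 
garbage through an array recreated with the wrong stripe.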

Is there any way to force atacontrol to create FreeBSD PseudoRAID metadata? 
Perhaps by using an older FreeSBIE release based on FreeBSD 6.0?  IIRC, I 
created this RAID0 back when 6.0 was CURRENT.

-- 
Yarema


