[nycbug-talk] Reproducible data corruption on 6.1-Stable [Long but please read]

Jonathan Stewart jonathan at kc8onw.net
Wed Sep 13 19:00:52 EDT 2006


Hello all,

I set up a new server recently and transferred all the information from
my old server over.  I tried to use unison to synchronize the backup of
pictures I have taken and noticed that a large number of pictures were
marked as changed on the server.  After checking the pictures by hand I
confirmed that many of the pictures on the server were corrupted.  I
attempted to use unison to update the files on the server with the
correct local copies but it would fail on almost all the files with the
message "destination updated during synchronization."

It appears the corruption happens during the read process, because when I
re-compare the files in a graphical diff tool between cache flushes, the
differences move around!  The differences also appear to be very
small for the most part: single bytes scattered throughout the file.  I
really have no idea what is causing the problem and would like to pin it
down so I can either replace hardware if it's bad or fix whatever the
bug is.
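
A minimal sketch of the comparison described above, assuming a known-good
local copy and the suspect server copy are both reachable as ordinary files.
The function name diff_offsets and the paths in the usage note are
hypothetical.  Running it several times against the same pair of files
(with a cache flush in between) would show whether the differing offsets
really move from one read to the next.

```python
def diff_offsets(path_a, path_b, chunk_size=64 * 1024):
    """Return the byte offsets at which two files differ.

    Bytes past the end of the shorter file are reported as differences.
    """
    offsets = []
    base = 0  # file offset of the start of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                common = min(len(a), len(b))
                # Compare byte by byte only inside chunks that differ.
                offsets.extend(base + i for i in range(common) if a[i] != b[i])
                # Length mismatch: every trailing byte counts as a difference.
                offsets.extend(range(base + common, base + max(len(a), len(b))))
            base += max(len(a), len(b))
    return offsets
```

For example, diff_offsets("/backup/img_0001.jpg", "/raid/img_0001.jpg")
(both paths are placeholders) returns a list of offsets that can be saved
and compared between runs.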

The problem appears no matter how I read the file (unison, md5, etc.).
Maybe 1 time out of 100 it will read correctly.  I have another drive
that I use for the OS and I have done many buildworlds/kernels without
problems on that drive as well as compiling some very large software
packages.  I'm wondering if a possible cause is the controller ignoring
read errors from the hard drive but I would think more than the
occasional single byte would be changed?
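
One way to put a number on the "1 out of maybe 100" estimate is to hash the
same file repeatedly and count how many distinct digests come back.  This is
only a sketch: the function name digest_counts is hypothetical, and it
assumes each pass actually goes back to the disk.  If the operating system
serves the file from its cache, every pass will agree no matter what the
hardware is doing, so the cache has to be flushed between passes by some
other means.

```python
import hashlib
from collections import Counter


def digest_counts(path, passes=10, chunk_size=1024 * 1024):
    """Read `path` `passes` times and count each MD5 digest seen."""
    counts = Counter()
    for _ in range(passes):
        h = hashlib.md5()
        with open(path, "rb") as f:
            # Read in chunks so large files do not need to fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        counts[h.hexdigest()] += 1
    return counts
```

A healthy read path yields exactly one digest with a count equal to passes.
More than one key in the result means at least one read returned different
bytes.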

I cvsup-ed and rebuilt world and kernel recently hoping that it had been
fixed but with no luck. I have not seen any error messages on the
console at all either. I have a pair of 320GB SATA hard drives set up as
RAID0 on a HighPoint RocketRAID 1520 card.  The card BIOS is the latest
revision as is the motherboard BIOS.

This being a data corruption issue, I can afford any amount of downtime
needed for troubleshooting, as it's not very useful to have the server
up if everything is going to get corrupted.

I'm thinking about maybe trying to dd the file from the raw device in an
attempt to see if the problem is occurring in the filesystem code or is
lower level yet.  Any suggestions on how to locate the file on the disk
or how to isolate the problem better are welcome.  I don't mind doing
the work; I just have no idea where to look or what to try next.
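
If reads from the raw array device do show corruption, one possible next
step is to check whether the bad offsets cluster on one member disk.  The
sketch below maps a byte offset on the array to a member disk and an offset
on that disk, using the 16 KB stripe size and the ad4/ad6 members reported
in the dmesg output.  It assumes plain round-robin striping that starts on
disk0 with no metadata area in front of the data, which is a guess about
the HighPoint layout and not something confirmed here.  The function name
stripe_member is hypothetical.  Note that an offset inside a file is not an
offset on the array; the filesystem's block mapping sits in between.

```python
def stripe_member(array_offset, stripe_size=16 * 1024, disks=("ad4", "ad6")):
    """Map a byte offset on a RAID0 array to (member disk, offset on it).

    Assumes simple round-robin striping beginning on disks[0].
    """
    stripe_index = array_offset // stripe_size
    disk = disks[stripe_index % len(disks)]
    # Each member holds every len(disks)-th stripe, packed back to back.
    member_offset = (stripe_index // len(disks)) * stripe_size \
        + array_offset % stripe_size
    return disk, member_offset
```

If nearly all corrupted offsets map to the same member, that points at one
drive, cable, or channel; an even split points more toward the controller,
driver, or memory.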

Thank you if you actually read all this :),
Jonathan

uname -a:
FreeBSD XXXXX 6.1-STABLE FreeBSD 6.1-STABLE #0: Sun Sep 10 22:54:17 EDT 2006     root at XXXXX:/usr/obj/usr/src/sys/SERVER  i386

dmesg:
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #0: Sun Sep 10 22:54:17 EDT 2006
    root at XXXXX:/usr/obj/usr/src/sys/SERVER
mptable_probe: MP Config Table has bad signature: 4\^C\^_
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) XP 3200+ (2090.16-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x6a0  Stepping = 0
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0400800<SYSCALL,MMX+,3DNow+,3DNow>
real memory  = 1073676288 (1023 MB)
avail memory = 1041698816 (993 MB)
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <Nvidia AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
Correcting nForce2 C1 CPU disconnect hangs
agp0: <NVIDIA nForce2 AGP Controller> mem 0xd8000000-0xdbffffff at device 0.0 on pci0
pci0: <memory, RAM> at device 0.1 (no driver attached)
pci0: <memory, RAM> at device 0.2 (no driver attached)
pci0: <memory, RAM> at device 0.3 (no driver attached)
pci0: <memory, RAM> at device 0.4 (no driver attached)
pci0: <memory, RAM> at device 0.5 (no driver attached)
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
ohci0: <OHCI (generic) USB controller> mem 0xe1085000-0xe1085fff irq 5 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xe1082000-0xe1082fff irq 5 at device 2.1 on pci0
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
ehci0: <NVIDIA nForce2 USB 2.0 controller> mem 0xe1083000-0xe10830ff irq 12 at device 2.2 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 4 ports each: usb0 usb1
usb2: <NVIDIA nForce2 USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 6 ports with 6 removable, self powered
nve0: <NVIDIA nForce MCP2 Networking Adapter> port 0xe400-0xe407 mem 0xe1084000-0xe1084fff irq 12 at device 4.0 on pci0
nve0: Ethernet address 00:0c:6e:7d:e0:79
miibus0: <MII bus> on nve0
rlphy0: <RTL8201L 10/100 media interface> on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
nve0: Ethernet address: 00:0c:6e:7d:e0:79
pci0: <multimedia, audio> at device 5.0 (no driver attached)
pci0: <multimedia, audio> at device 6.0 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 8.0 on pci0
pci1: <ACPI PCI bus> on pcib1
atapci0: <HighPoint HPT372N UDMA133 controller> port 0xa000-0xa007,0xa400-0xa403,0xa800-0xa807,0xac00-0xac03,0xb000-0xb0ff irq 11 at device 6.0 on pci1
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
pci1: <multimedia, audio> at device 9.0 (no driver attached)
pci1: <input device> at device 9.1 (no driver attached)
atapci1: <nVidia nForce2 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 9.0 on pci0
ata0: <ATA channel 0> on atapci1
ata1: <ATA channel 1> on atapci1
pcib2: <ACPI PCI-PCI bridge> at device 12.0 on pci0
pci2: <ACPI PCI bus> on pcib2
xl0: <3Com 3c920B-EMB Integrated Fast Etherlink XL> port 0xc000-0xc07f mem 0xdd000000-0xdd00007f irq 5 at device 1.0 on pci2
miibus1: <MII bus> on xl0
acphy0: <AC101L 10/100 media interface> on miibus1
acphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:26:54:10:8c:0f
pcib3: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pci3: <display, VGA> at device 0.0 (no driver attached)
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xd0000-0xd17ff,0xd6000-0xd67ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2090164914 Hz quality 800
Timecounters tick every 1.000 msec
ad0: 194481MB <Maxtor 6L200R0 BAH41G10> at ata0-master UDMA133
acd0: DVDROM <CREATIVE DVD-ROM DVD1241E/VER 0.44> at ata0-slave UDMA33
ad4: 305245MB <Seagate ST3320620AS 3.AAC> at ata2-master UDMA133
ad6: 305245MB <Seagate ST3320620AS 3.AAC> at ata3-master UDMA133
ar0: 610490MB <HighPoint v2 RocketRAID RAID0 (stripe 16 KB)> status: READY
ar0: disk0 READY using ad4 at ata2-master
ar0: disk1 READY using ad6 at ata3-master
Trying to mount root from ufs:/dev/ad0s1a
_______________________________________________
freebsd-stable at freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"


