[nycbug-talk] Reproducible data corruption on 6.1-Stable [Long but please read]
Isaac Levy
ike at lesmuug.org
Wed Sep 13 20:50:25 EDT 2006
Hi Jonathan,
What timing. I've just run into the same reproducible problem (I'm
having flashbacks to 5.x and sweating and gnashing my teeth).
On Sep 13, 2006, at 7:00 PM, Jonathan Stewart wrote:
> Hello all,
>
> I set up a new server recently and transferred all the information from
> my old server over. I tried to use unison to synchronize the backup of
> pictures I have taken and noticed that a large number of pictures were
> marked as changed on the server. After checking the pictures by hand I
> confirmed that many of the pictures on the server were corrupted. I
> attempted to use unison to update the files on the server with the
> correct local copies but it would fail on almost all the files with the
> message "destination updated during synchronization."
>
> It appears the corruption happens during the read process because when I
> recompare the files in a graphical diff tool between cache flushes the
> differences move around!?!?!? The differences also appear to be very
> small for the most part, single bytes scattered throughout the file. I
> really have no idea what is causing the problem and would like to pin it
> down so I can either replace hardware if it's bad or fix whatever the
> bug is.
>
> The problem appears no matter how I read the file, unison, md5, etc. 1
> out of maybe 100 times it will read correctly. I have another drive
> that I use for the OS and I have done many buildworlds/kernels without
> problems on that drive as well as compiling some very large software
> packages. I'm wondering if a possible cause is the controller ignoring
> read errors from the hard drive but I would think more than the
> occasional single byte would be changed?
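For anyone who wants to check their own box for this kind of read-side
corruption, here is a minimal sketch in Python that hashes the same file
several times and counts how many distinct MD5 digests come back. The
function names (file_digest, repeated_digests, is_stable) are made up for
illustration. One big assumption: each pass has to actually hit the disk.
If the file fits in the buffer cache, repeated reads may all be served from
memory and look identical, so test with a file larger than RAM or flush the
cache between passes (for example by unmounting and remounting).

```python
import hashlib
from collections import Counter


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repeated_digests(path, passes=10):
    """Hash the same file `passes` times; count how often each digest appears."""
    return Counter(file_digest(path) for _ in range(passes))


def is_stable(counts):
    """True if every pass produced the same digest."""
    return len(counts) == 1
```

On a healthy read path, repeated_digests() should return a single digest
with a count equal to the number of passes. More than one distinct digest
for a file nobody is writing to would be consistent with the "differences
move around" symptom described above, though it does not say whether the
drive, cable, controller, or driver is at fault.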
I've narrowed it down to the SATA drivers; something has changed
since I burned the 6.1 bootonly media (tried installing from both
freebsd.nycbug.org and ftp.freebsd.org to the same effect).
When I cvsup the STABLE branch, the kernel seems to totally freak out
on me. I originally thought it was the buildworld process, but after
simply installing the new kernel and rebooting, (su mode or not), I
get screens full of disk read errors; mostly:
"error issuing READ_DMA command"
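To compare boxes, it can help to tally these errors per device from saved
console or dmesg text. The following is only a rough sketch: the function
name count_read_dma_errors is hypothetical, and it assumes each error line
starts with a device name followed by a colon (like the "ad4:" line in the
dmesg quoted below), which may not match what every kernel prints. The
"ad6" entry in the sample is invented purely for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "<device>: <message>", e.g. "ad4: ...".
DEVICE_RE = re.compile(r"^(\w+):")


def count_read_dma_errors(lines):
    """Count lines mentioning READ_DMA, grouped by leading device name."""
    counts = Counter()
    for line in lines:
        if "READ_DMA" not in line:
            continue
        m = DEVICE_RE.match(line.strip())
        # Lines without a recognizable device prefix are grouped as "unknown".
        counts[m.group(1) if m else "unknown"] += 1
    return counts


# Made-up sample input; only the message text and the ad4 probe line
# come from this thread.
sample = [
    "ad4: error issuing READ_DMA command",
    "ad4: 305245MB <Seagate ST3320620AS 3.AAC> at ata2-master UDMA133",
    "ad6: error issuing READ_DMA command",
    "ad4: error issuing READ_DMA command",
    "error issuing READ_DMA command",
]
```

Feeding it the lines of a saved log (for example, open(path) on a file you
captured yourself) gives a per-device count, which makes it easier to see
whether the errors follow one drive or show up across the controller.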
From your dmesg:
> ad4: 305245MB <Seagate ST3320620AS 3.AAC> at ata2-master UDMA133
Ok- as soon as I get this box back up again, I'll post my dmesg as
well for comparison. This box was running fine for about a week
with the 6.1 install media, so for me I'm just going to wipe it and
install from ftp binaries once more.
If this gets messy, I'll try to snag a spare SATA drive and replicate
the problem again, but for now I have to get this box back to work...
Best,
.ike