[nycbug-talk] Reproducible data corruption on 6.1-Stable [Long but please read]

Isaac Levy ike at lesmuug.org
Wed Sep 13 20:50:25 EDT 2006

Hi Jonathan,

What timing.  I've just run into the same reproducible problem (I'm  
having flashbacks to 5.x and sweating and gnashing my teeth).

On Sep 13, 2006, at 7:00 PM, Jonathan Stewart wrote:

> Hello all,
> I set up a new server recently and transferred all the information
> from my old server over.  I tried to use unison to synchronize the
> backup of pictures I have taken and noticed that a large number of
> pictures were marked as changed on the server.  After checking the
> pictures by hand I confirmed that many of the pictures on the server
> were corrupted.  I attempted to use unison to update the files on the
> server with the correct local copies but it would fail on almost all
> the files with the message "destination updated during
> synchronization."
> It appears the corruption happens during the read process because
> when I recompare the files in a graphical diff tool between cache
> flushes the differences move around!?!?!?  The differences also appear
> to be very small for the most part, single bytes scattered throughout
> the file.  I really have no idea what is causing the problem and would
> like to pin it down so I can either replace hardware if it's bad or
> fix whatever the bug is.
> The problem appears no matter how I read the file: unison, md5, etc.
> 1 out of maybe 100 times it will read correctly.  I have another drive
> that I use for the OS and I have done many buildworlds/kernels without
> problems on that drive as well as compiling some very large software
> packages.  I'm wondering if a possible cause is the controller
> ignoring read errors from the hard drive but I would think more than
> the occasional single byte would be changed?

I've narrowed it down to the SATA drivers: something has changed
since I burned the 6.1 bootonly media (I tried installing from both
freebsd.nycbug.org and ftp.freebsd.org, with the same result).

When I cvsup the STABLE branch, the kernel seems to totally freak out
on me.  I originally thought it was the buildworld process, but after
simply installing the new kernel and rebooting (single-user mode or
not), I get screens full of disk read errors, mostly:

"error issuing READ_DMA command"

From your dmesg:
> ad4: 305245MB <Seagate ST3320620AS 3.AAC> at ata2-master UDMA133

OK, as soon as I get this box back up again, I'll post my dmesg as
well for comparison.  This box was running fine for about a week
with the 6.1 install media, so I'm just going to wipe it and
install from the FTP binaries once more.

If this gets messy, I'll try to snag a spare SATA drive and replicate  
the problem again, but for now I have to get this box back to work...

