[talk] zfs disk outage and aftermath
Jonathan
jonathan at kc8onw.net
Wed Feb 6 18:06:04 EST 2019
I've had a home setup with consumer drives in a horrible high-vibration
environment for almost 10 years now, with multiple reconfigurations and
migrations of the array. I've probably lost 6 or 7 drives over that
time, and I have yet to lose data because of it. I did once have a file
that had bitrotted before I lost a drive, and I had to restore it from
the original media -- which is exactly why scrubs are so important.
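
When a scrub does turn up unrecoverable damage, "zpool status -v" lists
the affected file paths, which is how you know exactly what to restore
from original media. A minimal sketch, with an illustrative pool name:

    # list any files with permanent (unrecoverable) errors
    zpool status -v tank
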
Jonathan
On 2019-02-04 16:56, John C. Vernaleo wrote:
> I read the subject of this and had that sinking feeling in my stomach,
> as I'm becoming more and more reliant on ZFS and no email with 'disk'
> in the subject line is ever good news. I was afraid there would be a
> horror story that would make me rethink that reliance.
>
> I was pleasantly surprised to see I was wrong. And that health check
> script looks like a really good idea.
>
> John
>
> -------------------------------------------------------
> John C. Vernaleo, Ph.D.
> www.netpurgatory.com
> john at netpurgatory.com
> -------------------------------------------------------
>
> On Mon, 4 Feb 2019, N.J. Thomas wrote:
>
>> Well, yesterday morning I suffered my first drive failure on one of my
>> ZFS boxes (running FreeBSD 12.0); it actually happened on my primary
>> backup server.
>>
>> "zpool status" showed that my regularly scheduled scrub had found (and
>> fixed) some errors on one of the disks in a mirrored pair.
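>>
>> A rough sketch of that scrub routine, assuming a pool named "tank"
>> (the name is illustrative, not from the original report):
>>
>>     # start a scrub; it runs in the background on the live pool
>>     zpool scrub tank
>>
>>     # check scrub progress and per-device read/write/checksum error counts
>>     zpool status tank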
>>
>> I made sure that my replica box had up-to-date snapshots transferred
>> over, shut down the machine, and asked the datacenter team to check
>> the drive. They indeed found that it was faulty and replaced it.
>>
>> It took about 4 hours for the drive to be resilvered, and that was it.
>> Back to normal with almost no issues -- apart from the few minutes the
>> machine was down while its drive was being replaced.
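>>
>> For reference, once the new disk is physically in place, the rebuild
>> is a single command; the device name "da1" and the pool name "tank"
>> here are illustrative:
>>
>>     # rebuild the mirror onto the replacement disk
>>     zpool replace tank da1
>>
>>     # zpool status reports resilver progress and an estimated finish time
>>     zpool status tank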
>>
>> My takeaways:
>>
>> - use ZFS
>>
>> - take regular snapshots
>>
>> - replicate your snapshots to another machine
>>
>> - scrub your disks regularly (unlike fsck, this can be run while
>>   the drive is mounted and active)
>>
>> - monitor zfs health (I use this script from Calomel.org:
>> https://calomel.org/zfs_health_check_script.html)
>>
>> The first three points are kinda obvious; the last two I picked up
>> from other, more experienced, ZFS users.
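>>
>> As a rough sketch of how the snapshot and replication points look in
>> practice -- the dataset, the snapshot names, and the "replica"
>> hostname are all illustrative:
>>
>>     # take a snapshot, then send it incrementally to the replica box
>>     # (the replica must already hold the earlier snapshot)
>>     zfs snapshot tank/data@2019-02-04
>>     zfs send -i tank/data@2019-02-03 tank/data@2019-02-04 | \
>>         ssh replica zfs receive tank/data
>>
>>     # for monitoring, "zpool status -x" prints "all pools are healthy"
>>     # when nothing is wrong, so it is easy to check from cron
>>     zpool status -x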
>>
>> I had been waiting for this day since I first started using ZFS years
>> ago, and I am very happy with my decision to use this filesystem.
>>
>> Thomas
>>