[nycbug-talk] Hardware annoyance

Marco Scoffier marco at metm.org
Fri Jun 23 18:45:21 EDT 2006

I have a server which has been solid for years (yes, years).

I put it into a colo and it has started randomly powering off,
yes, completely off.

When I reboot the machine after one of these power-offs, the BIOS gets
to where it initializes the Mylex RAID card (dac1100?) and then the
machine powers off again.

If I unplug the RAID card, the BIOS completes its diagnostics up to the
point where it finds no system disk, no PXE boot, etc.  So I unplug the
machine, put the RAID card back in its slot, reboot, and everything is
ok.  The RAID is fine, synchronized and everything.

I finish the boot and everything comes up ok; mlxcontrol (FreeBSD 6.1)
shows everything is hunky-dory.  Basically, I went through all the
card-pulling and rebooting stuff 5 days ago, ran memtest, and
crossed my fingers that I had fixed the hardware voodoo (perhaps
something had wiggled and gotten unseated when I moved the server
to the colo).

Today the server is off again.  So I plan on going to the colo,
removing the RAID card and reinstalling with a geom RAID straight on
one of the SCSI channels.

Is there something else I should check?  I wiggled the power cord and
power connections; this did not cause the server to power off.

Could there be another reason for the mysterious power-offs?



