[nycbug-talk] Hardware annoyance
Yusuke Shinyama
yusuke at cs.nyu.edu
Fri Jun 23 23:33:31 EDT 2006
Marco Scoffier <marco at metm.org> wrote:
> I have a server, which has been solid for years (yes years)
>
> I put it into a colo and it has started randomly powering off,
> yes completely off.
I had a similar case. The machine powered off a while after
booting. At first I thought this was a memory problem, so I ran
memtest and found that it always shut down at a certain point in
the test, which tempted me to believe the memory really was bad.
But after I replaced the memory modules, the problem persisted.
Actually, I even asked about this on the nylug list:
http://www.nylug.org/pipermail/nylug-talk/2006-February/029279.html
Then I almost gave up on the machine. But a week or so later, I
found that the buckle of the heatsink had come loose, so the
heatsink was not tightly attached to the processor. This caused
the processor to overheat whenever its load exceeded a certain
amount, which led to the sudden death.
Once I pressed the buckle lever down firmly, the machine ran
perfectly fine.
Yusuke