[nycbug-talk] regular hardware troubleshooting/monitoring

George Georgalis george
Thu Jun 23 10:42:45 EDT 2005


On Thu, Jun 23, 2005 at 10:13:41AM -0400, Ray wrote:
>On Wed, Jun 22, 2005 at 10:50:45PM -0700, David Rio Deiros wrote:
>> I cannot see how to test the memory without rebooting the machine.

recompile a kernel 20 times, piping stdout/stderr to a separate file
each run, then compare the file sizes... pretty darn effective for a
running machine
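
A minimal sketch of the repeat-and-compare idea above, in Python. The
function names and the idea of recording a hash next to the size are
assumptions added for illustration; the test described is only "compare
file sizes". A real kernel build would also need a clean step between
runs so each run does the full work, and build output that embeds
timestamps can make the hashes differ on healthy hardware.

```python
import hashlib
import subprocess


def run_once(cmd):
    """Run cmd and return (size_in_bytes, sha256_hex) of its combined stdout/stderr."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return len(result.stdout), hashlib.sha256(result.stdout).hexdigest()


def compare_runs(cmd, runs=20):
    """Repeat cmd `runs` times; report whether every run produced the same output size."""
    results = [run_once(cmd) for _ in range(runs)]
    sizes = {size for size, _ in results}
    # A single distinct size means all runs agreed, as in the size comparison above.
    return len(sizes) == 1, results
```

Hypothetical usage: `ok, results = compare_runs(["make", "-s"], runs=20)`,
with whatever build command applies on your system; a run whose size
differs from the rest is the one to look at.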


>> Regarding the CPU, pretty much the same.... Well... you can actually
>> run programs like cpuburn but those are going to put your CPU to 0%
>> idle. Something you don't want in a production server.
>
>nice(1).

and you may well learn the burden continuous context switching puts on
your machine...
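
For reference, a rough sketch of the nice(1) suggestion done from
Python: start a CPU-heavy test with its niceness raised so other
processes get the CPU first. The helper name `run_niced` is made up for
illustration, and this assumes a POSIX system. Niceness only affects
scheduling priority; it does not reduce heat or memory traffic.

```python
import os
import subprocess


def run_niced(cmd, increment=19):
    """Run cmd with its niceness raised by `increment`; return the completed process."""
    return subprocess.run(
        cmd,
        # Called in the child process just before exec, so only the child is reniced.
        preexec_fn=lambda: os.nice(increment),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
```

Hypothetical usage: `run_niced(["make", "-s"])`, or whichever burn-in
program you would otherwise start directly.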


How about a real world problem? I've got a box that I cannot identify
the problem with. The cause is probably from pushing the bus and clock
rates, a few years ago, to a point where it still remained stable. The
cpu is an AMD 750 clocked to 950 (they are good at that), and it's a VIA
chipset with a little heatsink required on the 'south bridge' chip (disk
controller, sound, etc). Well, that heatsink fell off a while ago and
didn't cause any immediate problems (and I mean in a tight, warm
installation, for many months after I discovered it)...

Now this tower, running open, fails (locks up) after anywhere from 3 to
48 hours, during no unusual activity (disk, cpu, etc). The south bridge
chip won't be hot to the touch, and I'll often use a SATA disk and no
ATA disk. Memtest86 doesn't fail, even after long runs.

So how can I test/salvage? My guess is it's the south bridge, but short
of investing $30 in Arctic Silver glue to see if the problem goes away
(which I doubt, because that chip doesn't really get hot), I'm not sure
how to tell. Ditto for the cpu: I don't want to replace it if it's not
broke, and it's got a nice new fan on it... So what is broke? And how
can I tell?

(I don't expect a useful answer here...)

// George


-- 
George Georgalis, systems architect, administrator Linux BSD IXOYE
http://galis.org/george/ cell:646-331-2027 mailto:george at galis.org



