[nycbug-talk] regular hardware troubleshooting/monitoring

David Rio Deiros driodeiros
Thu Jun 23 01:50:45 EDT 2005

On Thu, Jun 23, 2005 at 01:33:15AM -0400, George R. wrote:
> Anyway, Ike and I were discussing this evening, and maybe it would make 
> sense to figure out some daily tests on production hardware that would 
> notify the results when an error occurs, say, with memtest or fsck. 
> Outside of what already comes out in the daily/weekly/monthly emails.
> Any thoughts on this?  Does it make sense to have regular tests running 
> on production hardware?  Does anyone do this in their own environment, 
> outside of whatever the various full-scale open and closed source 
> products already do on hardware monitoring?

This is what I think about it:

I cannot see how to test the memory without rebooting the machine. Even
if you could, you would have to modify the memtest code to send you the
results (via SMTP, HTTP, etc.), which would be tough considering that
memtest is (I am not sure) written at least partly in assembly.

Regarding the CPU, it is pretty much the same. Well, you can actually
run programs like cpuburn, but those are going to drive your CPU to 0%
idle, which is something you don't want on a production server. Besides,
cpuburn has to run for at least a couple of days in order to verify that
the CPU will keep working under extreme load.
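
If you did want to try a short burn test on a live box anyway, one way to
limit the damage is to start it only when the machine is nearly idle, at
the lowest priority, and with a hard time limit. The Python below is only
a rough sketch of that idea. The names safe_to_test and run_low_priority,
the 0.25 load threshold, and the ./burn-test command are all made up for
illustration; they are not part of cpuburn or any other tool.

```python
import os
import subprocess


def safe_to_test(load1: float, ncpus: int, max_load_per_cpu: float = 0.25) -> bool:
    """Return True when the 1-minute load per CPU is at or below the threshold."""
    return ncpus > 0 and (load1 / ncpus) <= max_load_per_cpu


def run_low_priority(cmd, timeout_s):
    """Run cmd at nice 19 (POSIX only) and stop it after timeout_s seconds.

    Returns the exit code, or None if the time limit was reached.
    """
    try:
        proc = subprocess.run(cmd, preexec_fn=lambda: os.nice(19), timeout=timeout_s)
        return proc.returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    load1, _, _ = os.getloadavg()
    if safe_to_test(load1, os.cpu_count() or 1):
        # "./burn-test" is a hypothetical placeholder for the stress program.
        run_low_priority(["./burn-test"], timeout_s=600)
    else:
        print("machine is busy, skipping the burn test")
```

A ten-minute run like this is nowhere near the couple of days mentioned
above, so it can only catch gross failures.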

Regarding the hard drives, you have the SMART feature that comes with
all hard drives nowadays. Again, making your drives run self-tests is
going to reduce performance. If you know when you can tolerate that,
then you can safely run them.
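
For the SMART part, reading the drive's overall health verdict is cheap,
unlike a full self-test, so it could go in a daily cron job. Here is a
minimal Python sketch, assuming smartmontools is installed and that
"smartctl -H" prints either an "overall-health self-assessment" line
(ATA) or a "SMART Health Status" line (SCSI). The function names
parse_smart_health and check_drive are mine, just for illustration.

```python
import subprocess
import sys
from typing import Optional


def parse_smart_health(output: str) -> Optional[bool]:
    """Return True if healthy, False if failing, None if no verdict is found."""
    for line in output.splitlines():
        if "overall-health self-assessment" in line or "SMART Health Status" in line:
            # Everything after the first colon is the verdict, e.g. "PASSED".
            _, _, verdict = line.partition(":")
            return verdict.strip().upper() in ("PASSED", "OK")
    return None


def check_drive(device: str) -> Optional[bool]:
    """Run 'smartctl -H' on device and parse its output (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True, check=False
    )
    return parse_smart_health(result.stdout)


if __name__ == "__main__":
    # Print only on problems, so cron sends mail only when something is wrong.
    for dev in sys.argv[1:]:
        status = check_drive(dev)
        if status is not True:
            state = "FAILING" if status is False else "unknown"
            print(f"WARNING: {dev}: SMART health is {state}")
```

Since it stays silent when every drive passes, running it from cron
gives you a mail only when there is an error, which is more or less
what George was asking for.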


