[nycbug-talk] regular hardware troubleshooting/monitoring

Ray nycbug
Thu Jun 23 10:13:41 EDT 2005


On Wed, Jun 22, 2005 at 10:50:45PM -0700, David Rio Deiros wrote:
> I cannot see how to test the memory without rebooting the machine.

I would imagine a kernel module could continuously scan RAM, taking up memory
just like any normal program would, except that it never gets swapped out and
it gets to choose exactly where in the address space it is allocated.
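
A userland approximation of that idea, as a rough sketch only: it uses
mlock(2) to keep the buffer from being swapped out, but unlike a kernel
module it cannot choose which physical pages it gets, and on a small buffer
the reads may well be served from the CPU cache rather than from RAM. The
names scan_once and check_pattern are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Fill the buffer with one pattern, then read it back.
 * Returns how many words did not hold the pattern. */
static size_t check_pattern(volatile uint64_t *buf, size_t words,
                            uint64_t pattern)
{
    size_t bad = 0;
    size_t i;

    for (i = 0; i < words; i++)
        buf[i] = pattern;
    for (i = 0; i < words; i++)
        if (buf[i] != pattern)
            bad++;
    return bad;
}

/* Hypothetical helper: one pass over a freshly allocated buffer.
 * Returns the number of mismatched words, or -1 if no buffer
 * could be allocated. */
long scan_once(size_t bytes)
{
    static const uint64_t patterns[] = {
        0x0000000000000000ULL, 0xFFFFFFFFFFFFFFFFULL,
        0xAAAAAAAAAAAAAAAAULL, 0x5555555555555555ULL
    };
    size_t words = bytes / sizeof(uint64_t);
    size_t p;
    long bad = 0;
    uint64_t *buf;
    int locked;

    if (words == 0)
        return -1;
    buf = malloc(words * sizeof(uint64_t));
    if (buf == NULL)
        return -1;

    /* mlock(2) keeps the pages from being swapped out. It can fail
     * (privileges, RLIMIT_MEMLOCK); the scan still runs in that case. */
    locked = (mlock(buf, words * sizeof(uint64_t)) == 0);

    for (p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++)
        bad += (long)check_pattern(buf, words, patterns[p]);

    if (locked)
        munlock(buf, words * sizeof(uint64_t));
    free(buf);
    return bad;
}
```

Calling scan_once() from a loop with a sleep in between would give the
"continuously scanning" behaviour; a nonzero return is what you would
report.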

> Even if you could do it, you would have to modify the memtest code to send
> you the results (via SMTP, HTTP, etc.), which would be tough considering
> that (I am not sure) memtest is written in assembly.

Even if memtest were written completely in assembly (which I doubt),
you can still call C functions from assembly, just like the kernel does.

Or instead of doing that, you could add a sysctl and change its value whenever
anything is found.

> Regarding the CPU, pretty much the same... Well, you can actually
> run programs like cpuburn, but those are going to put your CPU at 0%
> idle, something you don't want on a production server.

nice(1).
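
If the burn-in program is your own, it can do the same thing to itself with
setpriority(2) before it starts spinning. A minimal sketch, assuming a nice
value of 19 is accepted on the system at hand; lower_own_priority is a
made-up name.

```c
#include <sys/resource.h>

/* Hypothetical helper: push the calling process to a very low
 * scheduling priority so that other work on the box runs first.
 * Returns 0 on success, -1 on failure (errno is set by setpriority). */
int lower_own_priority(void)
{
    /* PRIO_PROCESS with who == 0 means "the calling process".
     * Raising the nice value needs no special privileges. */
    return setpriority(PRIO_PROCESS, 0, 19);
}
```

The CPU will still show 0% idle while the test runs, but the scheduler
should hand cycles to anything else that wants them.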

> Besides, cpuburn has to run for a couple of days at least in order to
> verify that the CPU is going to work fine under extreme load.

Your production servers don't run for over a couple of days?



