[nycbug-talk] regular hardware troubleshooting/monitoring

David Rio Deiros driodeiros
Thu Jun 23 13:23:36 EDT 2005


On Thu, Jun 23, 2005 at 10:13:41AM -0400, Ray wrote:
> On Wed, Jun 22, 2005 at 10:50:45PM -0700, David Rio Deiros wrote:
> > I cannot see how to test the memory without rebooting the machine.
> 
> I would imagine a kernel module could be continuously scanning RAM, taking up RAM
> just like any normal program would, except that it never gets swapped out and
> it gets to choose exactly where in the memory space it will be allocated.

What about the memory part where the kernel is located?

> > Even if you could do it, you will have to modify the memtest code to send
> > you the results via (smtp, http, etc...) something tough considering 
> > (I am not sure) memtest is written in assembly.
> 
> Even if memtest were written completely in assembly (which I'm doubtful of)
> you can still call C functions in assembly.  Just like in the kernel.

Perhaps not all of it is written in assembly, but probably an important
part is, especially the parts that the program uses most often, in order
to improve performance.

> Or instead of doing that, you could add a sysctl and change its value whenever
> anything is found.

Sorry, I don't get that. Sysctl? You don't have the OS running underneath
your program. Umm... I don't see what you mean.

> > Regarding to the CPU, pretty much the same.... Well... you can actually
> > run programs like cpuburn but those are going to put your CPU to 0%
> > idle. Something you don't want in a production server.
> 
> nice(1).

OK, that will work.
But now that you mention it, I have another question: imagine that
you have a program that is very I/O intensive. If you renice
that program, you ensure that it is not going to eat all your CPU.
But what about the I/O requests? Is there any way to renice them?
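
For reference, here is a minimal sketch of what the CPU side of this
looks like from inside a process. It assumes a Unix-like system and
Python's standard os module; the function name lower_cpu_priority is
made up for illustration. It only changes the CPU scheduling priority
and does nothing about I/O requests, which is exactly the gap I am
asking about.

```python
import os


def lower_cpu_priority(increment=10):
    """Add `increment` to this process's nice value and return the new value.

    A higher nice value means a lower CPU scheduling priority. An
    unprivileged process can only raise its nice value, never lower it.
    This affects CPU scheduling only (hypothetical helper, for illustration).
    """
    return os.nice(increment)


if __name__ == "__main__":
    before = os.nice(0)  # an increment of 0 just reports the current value
    after = lower_cpu_priority(10)
    print("nice value went from", before, "to", after)
```

From the shell, the equivalent is starting the program under nice(1) or
adjusting a running one with renice(8).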

> > Besides cpuburn has to run a couple of days at least in order to
> > verify that the CPU is going to work fine over extreme.
> 
> Your production servers don't run for over a couple of days?

I meant that you have to put your CPU under heavy load for a decent
number of hours to verify that it does not have overheating issues.
How long? I don't know. I said two days at least...

David



