[nycbug-talk] regular hardware troubleshooting/monitoring

David Rio Deiros driodeiros
Thu Jun 23 03:33:35 EDT 2005


On Thu, Jun 23, 2005 at 02:45:01AM -0400, Isaac Levy wrote:
> Hi David, All,

Hi!

Some wild ideas:

Let's take this as a premise:

> *but* what if these tests were 
> run on clusters of boxes? 

Ok, every machine in the cluster has a little cron job that runs
a script once a day. That script asks another machine whether it
is its turn to run the system checks. If so, the script changes
the boot options on the local machine so that the next boot loads
a ramdisk as its filesystem. That filesystem holds the tools
needed for the testing, and its init scripts end by launching a
program that runs all the tests and reports the results to
another machine. After that, we rewrite the boot options on that
machine and it boots into the "normal" OS again.
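
To make the "is it my turn?" part concrete, here is a rough sketch of
one rule the coordinating machine could apply: a plain round-robin, one
host per day. Everything in it is an assumption for illustration only:
the host names, the boot entry labels "ramdisk-test" and "normal", and
the function names are all made up, and how the boot options actually
get rewritten depends on the boot loader.

```python
import datetime

# Hypothetical cluster membership, kept on the coordinating machine.
HOSTS = ["node1", "node2", "node3"]


def is_my_turn(hostname, today, hosts=HOSTS):
    """Round-robin: pick one host per day from the day's ordinal number."""
    return hosts[today.toordinal() % len(hosts)] == hostname


def next_boot_target(my_turn):
    """Boot entry to select for the next boot (labels are made up)."""
    return "ramdisk-test" if my_turn else "normal"


if __name__ == "__main__":
    today = datetime.date.today()
    for host in HOSTS:
        print(host, next_boot_target(is_my_turn(host, today)))
```

With a rule like this only one box is out of service for testing on any
given day, and every box gets tested once per cycle.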

There is still one little problem: How do we run memtest?

> Just food for thought on this issue- it seems to me this gets even more 
> realistic as things like BGP/multihoming and CARP based systems make it 
> 'easy' to run tests which affect hardware performance...

This would be a nice way to take advantage of CARP. 

David



