[nycbug-talk] Statistical Monitoring

Tue Nov 4 15:09:17 EST 2008

> I'm running Nagios + pnp4nagios which takes the extra data that the
> nagios service checks picks up and makes RRD/Cacti graphs out of them.
> I did this to reduce the amount of polling which can skew results, and
> soaks up resources for those times when you really need the graphs.
> Also it's all wrapped up in one place to maintain.

Sounds cool, but I run a lot of my checks via check_by_ssh, so when
things get bogged down I tend to get a lot of "plugin timeout" errors.
Technically, I could switch these to SNMP checks and/or passive checks,
which would help a lot. But there are many things I want to graph that I
don't want to alert on -- such as each webserver's NIC input/output,
hard disk I/O, etc. Would I just create these as checks inside
Nagios but never set a critical or warning level for them? Or is it
better to use something different, since there are so many checks that I
don't want to monitor for alerts?
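
To make the first option concrete, a check with no thresholds might look
roughly like the sketch below. It is only an illustration, not something
from this thread: it assumes a Linux host exposing /proc/net/dev, an
interface named eth0 by default, and made-up helper names
(parse_proc_net_dev, format_output). It always exits OK and leaves the
warn/crit fields of the perfdata empty, so Nagios would have nothing to
alert on but pnp4nagios would still get numbers to graph.

```python
#!/usr/bin/env python
"""Graph-only Nagios plugin sketch: reports OK and emits perfdata only."""
import sys


def parse_proc_net_dev(text, iface):
    """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev-style text."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Field 0 is received bytes, field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise KeyError(iface)


def format_output(iface, rx, tx):
    # Perfdata format is label=value[UOM];warn;crit;min;max.
    # warn and crit are omitted on purpose, so there are no thresholds.
    # The "c" unit marks the values as continuous counters.
    return "OK - %s traffic | rx_bytes=%dc tx_bytes=%dc" % (iface, rx, tx)


def main(argv):
    iface = argv[1] if len(argv) > 1 else "eth0"  # assumed default interface
    try:
        with open("/proc/net/dev") as f:
            rx, tx = parse_proc_net_dev(f.read(), iface)
    except (IOError, OSError, KeyError, ValueError, IndexError):
        # Exit code 3 is the plugin convention for UNKNOWN.
        print("UNKNOWN - could not read counters for %s" % iface)
        return 3
    print(format_output(iface, rx, tx))
    return 0  # exit code 0 is OK


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

A plugin like this could still be run via check_by_ssh or submitted as a
passive result; the timeout problem is separate from whether the check
carries thresholds.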