[nycbug-talk] Statistical Monitoring

Tue Nov 4 15:35:27 EST 2008

On Nov 4, 2008, at 3:09 PM, Matt Juszczak wrote:

>> I'm running Nagios + pnp4nagios which takes the extra data that the
>> nagios service checks picks up and makes RRD/Cacti graphs out of them.
>> I did this to reduce the amount of polling which can skew results, and
>> soaks up resources for those times when you really need the graphs.
>> Also it's all wrapped up in one place to maintain.
>
> Sounds cool, but I'm running a lot of my checks via check_by_ssh, so when
> things get bogged down, I tend to get a lot of "plugin timeout".
> Technically, I could switch these to SNMP checks, and/or passive checks,
> which would help a lot, but there are many things I want to graph that I
> don't want to alert on -- such as each webserver's input/output on the
> NIC, I/O on hard disk, etc.  Would I just create these as checks inside
> nagios but just never set a critical or warning level for them?  Or is it
> better to use something different since there are so many checks that I
> don't want to monitor for alerts?

Personally, I think it is very bad form to try to do what you want to do
with nagios. People always try to make nagios into something it isn't, and
the results are usually poorly implemented and difficult to support.

I have seen people try to turn nagios into a replacement for cron, a tool
to isolate system faults, and god knows what else.

Its core strength is checking the state of a host or service and alerting
you if that host or service is not in a "good" state.

If you need graphing, look at a "heavy" application like cacti, or roll
your own with rrdtool and whatever scripting language you prefer.
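
To give a rough idea of the "roll your own" route, here is a minimal Python
sketch that only builds the rrdtool command lines for graphing NIC traffic.
The file name nic.rrd, the 5-minute step, the data source names "in" and
"out", and the helper names are all assumptions for illustration. How you
collect the byte counters (SNMP, netstat, etc.) is left out on purpose.

```python
import shlex

STEP = 300  # assumed polling interval in seconds (5 minutes)


def create_cmd(rrd_path, step=STEP):
    """Build the `rrdtool create` argv for a two-counter NIC database."""
    heartbeat = step * 2  # mark data unknown after two missed samples
    return [
        "rrdtool", "create", rrd_path,
        "--step", str(step),
        f"DS:in:COUNTER:{heartbeat}:0:U",   # bytes received
        f"DS:out:COUNTER:{heartbeat}:0:U",  # bytes sent
        f"RRA:AVERAGE:0.5:1:{86400 // step}",  # one day at full resolution
    ]


def update_cmd(rrd_path, bytes_in, bytes_out):
    """Build the `rrdtool update` argv; N means "timestamp is now"."""
    return ["rrdtool", "update", rrd_path, f"N:{bytes_in}:{bytes_out}"]


if __name__ == "__main__":
    # Print the commands rather than running them, so this works even
    # on a machine where rrdtool is not installed.
    print(shlex.join(create_cmd("nic.rrd")))
    print(shlex.join(update_cmd("nic.rrd", 123456, 654321)))
```

Run the create command once, then call the update command from cron at the
same interval as the step. Graphing is a separate `rrdtool graph` call, and
none of this has to touch your nagios configuration.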

Keep your nagios configuration simple and clean. It is complex enough
that you don't need to add another layer of complexity on top of it.

Steven Kreuzer
http://www.exit2shell.com/~skreuzer