[nycbug-talk] Monitoring > 1000 devices

George Georgalis george
Wed Mar 23 11:54:04 EST 2005


On Wed, Mar 23, 2005 at 11:34:24AM -0500, steverieger wrote:
>Hi all,
>
>Am going to start a nice discussion here about monitoring, and would like
>your opinions.
>
>I have used nagios, zabbix, cricket, mrtg (not a true monitoring package),
>and a few others to keep an eye on all my devices around the world. The
>devices are made up of the following types:
>500 cisco
>    need to monitor about 20 different things on each device
>300 servers
>    need to monitor about 40 different things on each device, including
>apache, mysql, network, uptime, checksum of /usr/local/sbin/sshd, etc.
>100 printers
>    need to monitor about 10 different things, purely via snmp
>10 windows servers
>    need to monitor about 15 things, mostly via snmp, but an agent would be
>ok.
>
>    Nagios, which comes to mind first, is great but a bit of a pain to set
>up for such a large environment. Adding a whole new group of servers or
>devices might take a few days. Zabbix is awesome: it can monitor everything
>either via agent or snmp, and is very extensible. But zabbix has some issues
>on the recovery side when monitoring via snmp. Mrtg does what it is supposed
>to, and I get my sexy graphs, but I get no notification if something is
>amiss.
>
>    So do any of you know of a tool that can run an auto discovery,
>something like netdisco, and also monitor according to the parameters I
>set?
>

not sure about auto discovery... nmap?

http://kernel.org/pub/software/admin/mon/html/
http://kernel.org/pub/software/admin/mon/

you will probably want very specific tests and alerts, with lots of
control: mon. technically mon isn't a monitor, it's a scheduler that
prevents tests from running concurrently, and lets you configure rules
like "run the alert script if test x (which runs every 3 minutes) fails
3 times in a row, send only one alert per 3 hours, and run a status
restored script after service is restored". it comes with cgi reports
and lots of test and alert scripts, or you can make your own. you can
run mrtg on the test script data if you want.
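
to make that rule concrete, here is a rough sketch in python of the
alerting logic described above. this is not mon's code or its config
syntax. the class name ServiceWatch, the method record() and the
"alert"/"upalert" return values are made up for illustration, and the
choice to reset the alert timer on recovery is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceWatch:
    """Hypothetical per-service state; not part of mon itself."""
    alert_after: int = 3            # consecutive failures before alerting
    alert_every: float = 3 * 3600   # minimum seconds between repeat alerts
    failures: int = 0               # current run of consecutive failures
    last_alert: Optional[float] = None
    alerted: bool = False           # True once an alert has gone out

    def record(self, ok: bool, now: float) -> Optional[str]:
        """Feed one test result; return "alert", "upalert", or None."""
        if ok:
            was_down = self.alerted
            # Assumption: recovery clears the failure run and the timer.
            self.failures = 0
            self.alerted = False
            self.last_alert = None
            return "upalert" if was_down else None

        self.failures += 1
        if self.failures < self.alert_after:
            return None  # not enough failures in a row yet
        if (self.last_alert is not None
                and now - self.last_alert < self.alert_every):
            return None  # already alerted within the suppression window
        self.last_alert = now
        self.alerted = True
        return "alert"
```

a scheduler would call record() once per test run (every 180 seconds in
the example rule) and dispatch the alert or status restored script
whenever it returns something other than None.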

// George


-- 
George Georgalis, systems architect, administrator Linux BSD IXOYE
http://galis.org/george/ cell:646-331-2027 mailto:george at galis.org



