[nycbug-talk] Cacti Sucks, So what do I replace it with.

Edward Capriolo edlinuxguru at gmail.com
Thu May 5 17:07:47 EDT 2011

On Thu, May 5, 2011 at 4:20 PM, Jesse Callaway <bonsaime at gmail.com> wrote:
> On Thu, May 5, 2011 at 1:37 PM, Jason Dixon <jason at dixongroup.net> wrote:
>> On Thu, May 05, 2011 at 12:18:56PM -0400, Mark Saad wrote:
>>> Talk
>>>   I have a good question for you. I started to hate Cacti for a few
>>> reasons I don't want to get into.
>>> I know that a few other trending / monitoring projects have reached
>>> critical mass and have a good number of people using them.
>>> What do you recommend I move to? Here are my requirements:
>>> 1. SNMP polling
>>> 2. RRD, SQLite, or Berkeley DB data storage
>>> 3. I don't want it to lower my TCO or bake me a cake.
>>> 4. Flexible trend management. (If I want to trend NFS read operations
>>> for 100 servers in one graph, I should not have to jump through hoops.)
>>> So far, people have pointed me to:
>>> 1. zabbix.com
>>> 2. munin-monitoring.org
>>> 3. ganglia.sourceforge.net
>> I'm a big fan of Graphite (http://graphite.wikidot.com/). There are a
>> lot of agents (Munin, collectd, gmond) that already support it. It will
>> also read in any existing RRD files you have, which is really nice. It's
>> less of a dashboard than Cacti; currently it excels at metrics storage
>> and complex graph creation. But it does server-side rendering and
>> supports all creation options as HTTP parameters, so it's easy to adjust
>> graphs on the fly, embed them in your own HTML dashboards, etc.
>> I gave a recent talk at PICC on using Graphite in conjunction with Nagios
>> and PNP4Nagios to get more ROI on your existing Nagios installation.
>> http://www.slideshare.net/obfuscurity/trending-with-purpose
>> --
>> Jason Dixon
>> DixonGroup Consulting
>> http://www.dixongroup.net/
>> _______________________________________________
>> talk mailing list
>> talk at lists.nycbug.org
>> http://lists.nycbug.org/mailman/listinfo/talk
> Much appreciated, all. Cacti has it all there, it just needs to be
> rewritten from scratch... which isn't going to happen.
> Did not know that Munin, collectd, and gmond have stuff that spews to
> Graphite... nice!
> There is a lot of interest out there in getting a good replacement
> going, and many projects. It's good to see all of these efforts come
> out at once. Some projects are looking at what others are doing, and
> it's making a great feedback cycle... the Bazaar!!!
> -jesse
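Jason's point that Graphite takes every graph option as an HTTP parameter also speaks to Mark's requirement 4. A minimal sketch, assuming a hypothetical Graphite host (graphite.example.com) and a hypothetical metric naming scheme servers.<host>.nfs.read_ops: a "*" wildcard in the target path matches every server, and Graphite's sumSeries() function collapses the matches into a single line. The helper name render_url is also hypothetical.

```python
from urllib.parse import urlencode

def render_url(base, targets, **options):
    """Build a Graphite /render URL; every graph option is a query parameter."""
    params = [("target", t) for t in targets]  # "target" may be repeated
    params += sorted(options.items())          # from, width, height, title, ...
    return f"{base.rstrip('/')}/render?{urlencode(params)}"

# One graph: every server's NFS reads, plus their sum, over the last day.
# "from" is a Python keyword, so it is passed through a dict.
url = render_url(
    "http://graphite.example.com",
    ["servers.*.nfs.read_ops", "sumSeries(servers.*.nfs.read_ops)"],
    **{"from": "-24h", "width": 800, "height": 400, "title": "NFS reads"},
)
print(url)
```

Because the result is just a URL, it can be dropped into an <img src="..."> tag on a hand-built dashboard, and changing the time range or size means editing one query parameter.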

Being a big Cacti/SNMP guy, I have to chime in.

First let me start by saying I do not like push-based systems like
Ganglia. (BTW, I met one of the original Ganglia authors. Really cool guy.)
Why? Counters are supposed to go up. They are designed that way so N
independent systems can sample the same value at different intervals
and each work out its own rate. For example, if I get an alert from my
NMS saying "CPU is high" but have to wait 5 or 10 minutes to see whether
it clears, or actually SSH to the system and run top, my NMS is NOT
useful. In this case Cacti has an awesome "Real time" plugin that lets
me look at something in 5, 10, 20, 30... second intervals. Game changer.
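A minimal sketch of why ever-increasing counters suit independent pollers, assuming SNMP Counter32 semantics (the value wraps to 0 after 2^32 - 1): each poller keeps its own previous sample and divides the difference by its own interval. The function name and the sample values are hypothetical.

```python
COUNTER32_MODULUS = 2**32  # Counter32 wraps to 0 after 2**32 - 1
COUNTER64_MODULUS = 2**64  # Counter64, e.g. ifHCInOctets

def counter_rate(prev_value, prev_time, cur_value, cur_time,
                 modulus=COUNTER32_MODULUS):
    """Average per-second rate between two samples of an increasing counter.

    Assumes at most one wrap between samples; a counter that wrapped twice
    or was reset (e.g. by an agent restart) cannot be detected here.
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken in chronological order")
    # Modular subtraction turns a single wrap into the correct positive delta.
    delta = (cur_value - prev_value) % modulus
    return delta / elapsed

if __name__ == "__main__":
    # Hypothetical octet-counter samples: (value, seconds).
    # A "real time" poller sampling every 5 s ...
    print(counter_rate(1_000, 0.0, 6_000, 5.0))            # 1000.0
    # ... and a poller that saw the counter wrap within a 10 s interval.
    print(counter_rate(4_294_966_296, 0.0, 4_000, 10.0))   # 500.0
```

Neither poller needs to know the other exists: a 5-second real-time view and a 5-minute long-term grapher can read the same counter, and each gets a correct average for its own interval.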

Most users do not learn or understand the features built into SNMP:
1) It is trivial to use extend or exec in the Net-SNMP agent and pass a
request for an OID directly to a script.
2) You can use SNMP agent or AgentX technology to link SNMP directly
to counters/methods in a running process.
3) It is widely understood by a wide variety of tools.
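A minimal sketch of point 1. The Net-SNMP directive that hands requests for a whole OID subtree straight to a script is pass (extend instead publishes a command's output under NET-SNMP-EXTEND-MIB). The OID subtree, script path, and read_stats() data source below are hypothetical.

```python
#!/usr/bin/env python3
"""Minimal sketch of a script for Net-SNMP's "pass" directive.

Assumed snmpd.conf line (OID subtree and script path are hypothetical):

    pass .1.3.6.1.4.1.99999.1 /usr/local/bin/app_stats.py

snmpd runs the script with "-g OID" for a GET and "-n OID" for a GETNEXT
(SETs arrive as "-s OID TYPE VALUE") and reads back three lines: the OID,
its type, and its value. Printing nothing means "no such object".
"""
import sys

BASE = ".1.3.6.1.4.1.99999.1"  # hypothetical private-enterprise subtree

def read_stats():
    """Hypothetical data source: index -> (SNMP type, value)."""
    # A real script would read these from the application itself.
    return {1: ("counter", 123456), 2: ("gauge", 42)}

def oid_key(oid):
    """Numeric tuple so OIDs sort the way SNMP orders them."""
    return tuple(int(part) for part in oid.strip(".").split("."))

def handle(args):
    """Build the reply for one snmpd request; "" means nothing to return."""
    if len(args) < 2 or args[0] not in ("-g", "-n"):
        return ""  # SETs and malformed calls are ignored in this sketch
    mode, requested = args[0], oid_key(args[1])
    rows = sorted((oid_key(f"{BASE}.{index}"), kind, value)
                  for index, (kind, value) in read_stats().items())
    for key, kind, value in rows:
        # GET wants an exact match; GETNEXT wants the first OID after it.
        if (mode == "-g" and key == requested) or (mode == "-n" and key > requested):
            oid = "." + ".".join(str(part) for part in key)
            return f"{oid}\n{kind}\n{value}\n"
    return ""

if __name__ == "__main__":
    sys.stdout.write(handle(sys.argv[1:]))
```

With the assumed pass line in place, a walk of the subtree (for example snmpwalk -v2c -c public localhost .1.3.6.1.4.1.99999.1, using whatever community you actually run) would return both values, and any existing SNMP grapher could poll them without a custom collector.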

This fundamental lack of understanding results in a lot of wheel
reinvention and clunky solutions for passing data around. Take, for
example, how most people collect Apache stats. Typically they write
some wonky script that uses wget to scrape the server-status page.
Each time that page changes or adds something new, the scripts
usually break.
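A minimal sketch of a less brittle approach, assuming mod_status is enabled at the conventional /server-status location (the host name below is hypothetical). Appending ?auto makes Apache return plain "Key: value" lines instead of HTML, and parsing them by key name means a field added in a later release is simply carried along instead of breaking the script. The helper names are hypothetical.

```python
from urllib.request import urlopen

def parse_auto(text):
    """Turn mod_status ?auto output into a dict, keeping unknown keys."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" (e.g. a bare host name)
        key, value = key.strip(), value.strip()
        try:
            stats[key] = float(value)
        except ValueError:
            stats[key] = value  # non-numeric fields such as Scoreboard
    return stats

def fetch_status(url="http://www.example.com/server-status?auto", timeout=5):
    """Fetch and parse the status page (hypothetical host)."""
    with urlopen(url, timeout=timeout) as response:
        return parse_auto(response.read().decode("ascii", "replace"))
```

Total Accesses in that output is a running counter, so a poller can turn two readings into requests per second without Apache doing any averaging for it.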

On the other side of the pond, look at IIS. Windows performance
counters and IIS are implemented !!beautifully!! You open an MMC
console, connect to a remote server, get a list of objects (e.g. IIS),
get a list of counters (e.g. requests/sec), choose an instance like
mywebsite.org, and bam: real-time counters rendered on screen, with
built-in support for saving this information to a file or SQL database.

The open source world has just completely missed the boat in most
cases. Rather than look at the simple, elegant way Windows does this
and leverage SNMP agents and existing SNMP tools, each project takes
a different, wheel-reinventing approach to accomplish the same thing.

I am back to old school: every machine gets four graphs (CPU, disk
activity, network, and memory). Maybe I build a custom graph with
requests/second, if applicable, when I am in the mood, but that is it.
This stuff jumped the shark a long time ago.
