[nycbug-talk] Cacti Sucks, So what do I replace it with.

Jesse Callaway bonsaime at gmail.com
Thu May 5 17:43:55 EDT 2011

On Thu, May 5, 2011 at 5:07 PM, Edward Capriolo <edlinuxguru at gmail.com> wrote:
> On Thu, May 5, 2011 at 4:20 PM, Jesse Callaway <bonsaime at gmail.com> wrote:
>> On Thu, May 5, 2011 at 1:37 PM, Jason Dixon <jason at dixongroup.net> wrote:
>>> On Thu, May 05, 2011 at 12:18:56PM -0400, Mark Saad wrote:
>>>> Talk
>>>>   I have a good question for you. I started to hate cacti for a few
>>>> reasons I don't want to get into.
>>>> I know that a few other trending / monitoring projects have reached
>>>> critical mass and have a good number of people using them.
>>>> What do you recommend I move to?  Here are my requirements.
>>>> 1. SNMP Polling
>>>> 2. RRD, SQLite, or Berkeley DB data storage
>>>> 3. I don't want it to lower my tco or bake me a cake .
>>>> 4. Flexible trend management.  (If I want to trend nfs read operations
>>>> for 100 servers into one graph I should not have to jump through hoops)
>>>> So people have pointed me to
>>>> 1. zabbix.com
>>>> 2. munin-monitoring.org
>>>> 3. ganglia.sourceforge.net
>>> I'm a big fan of Graphite (http://graphite.wikidot.com/). There are a
>>> lot of agents (Munin, collectd, gmond) that already support it. It will
>>> also read in any existing RRD files you have, which is really nice. It's
>>> less of a dashboard than Cacti; currently it excels at metrics storage
>>> and complex graph creation. But it does server-side rendering and
>>> supports all creation options as HTTP parameters, so it's easy to adjust
>>> graphs on the fly, embed them in your own HTML dashboards, etc.
>>> I gave a recent talk at PICC on using Graphite in conjunction with Nagios
>>> and PNP4Nagios to get more ROI on your existing Nagios installation.
>>> http://www.slideshare.net/obfuscurity/trending-with-purpose
>>> --
>>> Jason Dixon
>>> DixonGroup Consulting
>>> http://www.dixongroup.net/
>>> _______________________________________________
>>> talk mailing list
>>> talk at lists.nycbug.org
>>> http://lists.nycbug.org/mailman/listinfo/talk
>> Much appreciated, all. Cacti has it all there, it just needs to be
>> rewritten from scratch... which isn't going to happen.
>> Did not know that munin, collectd, and gmond have stuff that spews to
>> graphite... nice!
>> There is a lot of interest out there in getting a good replacement
>> going, and many projects. It's good to see all of these efforts come
>> out at once. Some projects are looking at what others are doing, and
>> it's making a great feedback cycle... the Bazaar!!!
>> -jesse
> Being a big cacti/snmp guy I have to chime in.
> First let me start by saying I do not like push based systems like
> ganglia.  (BTW I met one of the original ganglia authors. Really cool guy)
> Why? Counters are supposed to go up. The reason it is done like this
> is so N independent systems can sample the value at different
> intervals. For example, if I get an alert from my NMS saying "CPU is
> high" but I have to wait "5" or "10" minutes to see if it clears, or
> actually SSH to the system and run top, my NMS is NOT useful. In this
> case cacti has an awesome "Real time" plugin that allows me to look at
> something in 5,10,20,30...second intervals. Game changer.

Ganglia and Graphite only send the data when it's necessary... You can
stick with a regular interval or you can send when it's appropriate.
This is flexibility. Most stuff I'm trending is not appropriate to
view on a 15 second polling interval. I just don't find the realtime
graphs entirely useful for trending.
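
For reference, the counter argument in the quoted text works because a
rate can be derived from any two samples of an ever-increasing counter,
whatever the interval between them. A minimal sketch in Python, assuming
a 32-bit counter that wraps at most once between samples (the function
name and the sample values are made up for illustration):

```python
def counter_rate(prev_value, prev_time, cur_value, cur_time, bits=32):
    """Per-second rate between two samples of a monotonically
    increasing counter, tolerating a single wrap of a `bits`-wide counter."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Modular subtraction gives the right delta even if the counter
    # wrapped around once between the two samples.
    delta = (cur_value - prev_value) % (2 ** bits)
    return delta / elapsed


# Two pollers sampling the same counter at different intervals
# (hypothetical numbers) arrive at the same rate.
fast_poller = counter_rate(1000, 0, 4000, 10)   # 10 second interval
slow_poller = counter_rate(1000, 0, 7000, 20)   # 20 second interval
```

A pushed gauge value, by contrast, only describes the moment it was
sent, which is the trade-off being argued about here.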

I would like to be able to poll and push, ideally. There are benefits
to both... However the real win with graphite is that you could get
some alert in the middle of the night, and think... geez time to trend
this stat. Write a script to throw the data to the collector and then
go to bed without worrying about polluting OID space with a poorly
structured table. Write the graphs later when you think more clearly.
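
The "script to throw the data to the collector" can be very small. A
rough sketch in Python, assuming carbon's plaintext listener is
reachable over TCP on port 2003 (its usual default) and accepts
newline-terminated "path value timestamp" lines; the host name and the
metric path are hypothetical:

```python
import socket
import time

# Assumptions: carbon's plaintext listener at CARBON_HOST, port 2003.
# Host name and metric names below are placeholders, not real systems.
CARBON_HOST = "graphite.example.com"
CARBON_PORT = 2003


def format_metric(path, value, timestamp=None):
    """Build one plaintext data point: path, value, epoch seconds."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, timestamp=None,
                host=CARBON_HOST, port=CARBON_PORT):
    """Open a TCP connection, send a single data point, and close."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example call, e.g. from cron or a shell loop (not executed here):
#   send_metric("servers.web01.nfs.read_ops", 42)
```

No schema or OID has to be agreed on first; the metric path is just a
dotted name, and the graphs can be built later.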

> Most users do not learn or understand the features built into SNMP
> 1) It is trivial to use extend or exec in snmp and pass a request for
> an OID directly to a script
> 2) You can use SNMP AGENT or AGENTX technology to link SNMP directly
> to counters/method in a running process
> 3) It is widely understood by a wide variety of tools.

SNMP is useful. Good tool. AgentX is not so easy. I don't see what the
data source has to do with this.

> This fundamental lack of understanding results in much wheel
> reinvention and clunky solutions for passing data around. Take for
> example how most people do apache stats. Typically they try to write
> some wonky script that acquires information using wget from the
> server_status page. Each time this page changes or adds something new
> the scripts usually break.
> On the other side of the pond, look at IIS. Windows performance
> counters and IIS are implemented !!beautifully!! You open an MMC
> console, connect to a remote server, get a list of objects (e.g. IIS),
> get a list of counters (e.g. requests/sec), choose an instance like
> mywebsite.org, and bam real time counters, rendered on screen, built
> in support to save this information to a file or SQL database.
> The open source world has just completely missed the boat in most
> cases. Rather than look at the simple elegant way Windows does this
> and leverage SNMP agents and already existing SNMP tools, each project
> takes a different wheel reinventing approach to accomplish the same
> thing.

Is it the responsibility of every application to maintain an SNMP
agent, compatible with whatever flavor (okay... ucd-snmp) of snmpd is
running on whatever OS?

> I am back to old school, every machine gets 4 graphs CPU, Disk
> activity, network, and memory. Maybe I build a custom graph with
> requests/second if applicable when I am in the mood, but that is it.
> This stuff jumped the shark a long time ago.

