[nycbug-talk] Cacti Sucks, So what do I replace it with?

Edward Capriolo edlinuxguru at gmail.com
Thu May 5 18:19:33 EDT 2011


Gmail is probably going to destroy inline replies, but here goes nothing.

On Thu, May 5, 2011 at 5:43 PM, Jesse Callaway <bonsaime at gmail.com> wrote:
> On Thu, May 5, 2011 at 5:07 PM, Edward Capriolo <edlinuxguru at gmail.com> wrote:
>> On Thu, May 5, 2011 at 4:20 PM, Jesse Callaway <bonsaime at gmail.com> wrote:
>>> On Thu, May 5, 2011 at 1:37 PM, Jason Dixon <jason at dixongroup.net> wrote:
>>>> On Thu, May 05, 2011 at 12:18:56PM -0400, Mark Saad wrote:
>>>>> Talk
>>>>>   I have a good question for you. I started to hate Cacti for a few
>>>>> reasons I don't want to get into.
>>>>>
>>>>> I know that a few other trending / monitoring projects have reached
>>>>> critical mass and have a good number of people using them.
>>>>> What do you recommend I move to? Here are my requirements.
>>>>>
>>>>> 1. SNMP Polling
>>>>> 2. RRD, SQLite, or Berkeley DB data storage
>>>>> 3. I don't want it to lower my TCO or bake me a cake.
>>>>> 4. Flexible trend management. (If I want to trend NFS read operations
>>>>> for 100 servers into one graph I should not have to jump through hoops)
>>>>>
>>>>> So people have pointed me to
>>>>>
>>>>> 1. zabbix.com
>>>>> 2. munin-monitoring.org
>>>>> 3. ganglia.sourceforge.net
>>>>
>>>> I'm a big fan of Graphite (http://graphite.wikidot.com/). There are a
>>>> lot of agents (Munin, collectd, gmond) that already support it. It will
>>>> also read in any existing RRD files you have, which is really nice. It's
>>>> less of a dashboard than Cacti; currently it excels at metrics storage
>>>> and complex graph creation. But it does server-side rendering and
>>>> supports all creation options as HTTP parameters, so it's easy to adjust
>>>> graphs on the fly, embed them in your own HTML dashboards, etc.
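
Because every Graphite graph option is a plain HTTP parameter, requirement 4 above (NFS reads for 100 servers in one graph) can be a single wildcard target. A minimal Python sketch that builds a `/render` URL; `target`, `from`, `width`, `height`, and `format` are Graphite render API parameters and `sumSeries()` is a Graphite function, while the host name and the `servers.<host>.nfs.read_ops` metric layout are hypothetical.

```python
from urllib.parse import urlencode

def graphite_render_url(base, targets, frm="-1h", width=800, height=400, fmt="png"):
    """Build a Graphite /render URL; every graph option is a query parameter."""
    params = [("target", t) for t in targets]  # repeat 'target' once per series
    params += [("from", frm), ("width", width), ("height", height), ("format", fmt)]
    return base.rstrip("/") + "/render?" + urlencode(params)

# Hypothetical metric layout: servers.<host>.nfs.read_ops
url = graphite_render_url(
    "http://graphite.example.com",
    # one line per matching server, plus one line for the total
    ["servers.*.nfs.read_ops", "sumSeries(servers.*.nfs.read_ops)"],
    frm="-24h",
)
print(url)
```

The resulting URL can be dropped straight into an `<img>` tag on a hand-built dashboard.
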
>>>>
>>>> I gave a recent talk at PICC on using Graphite in conjunction with Nagios
>>>> and PNP4Nagios to get more ROI on your existing Nagios installation.
>>>>
>>>> http://www.slideshare.net/obfuscurity/trending-with-purpose
>>>>
>>>> --
>>>> Jason Dixon
>>>> DixonGroup Consulting
>>>> http://www.dixongroup.net/
>>>> _______________________________________________
>>>> talk mailing list
>>>> talk at lists.nycbug.org
>>>> http://lists.nycbug.org/mailman/listinfo/talk
>>>>
>>>
>>> Much appreciated, all. Cacti has it all there, it just needs to be
>>> rewritten from scratch... which isn't going to happen.
>>>
>>> Did not know that munin, collectd, and gmond have stuff that spews to
>>> graphite... nice!
>>>
>>> There is a lot of interest out there in getting a good replacement
>>> going, and many projects. It's good to see all of these efforts come
>>> out at once. Some projects are looking at what others are doing, and
>>> it's making a great feedback cycle... the Bazaar!!!
>>>
>>> -jesse
>>>
>>
>> Being a big cacti/snmp guy I have to chime in.
>>
>> First, let me say I do not like push-based systems like
>> Ganglia. (BTW, I met one of the original Ganglia authors. Really cool guy.)
>> Why? Counters are supposed to go up. The reason it is done like this
>> is so N independent systems can sample the value at different
>> intervals. For example, if I get an alert from my NMS saying "CPU is
>> high" but I have to wait 5 or 10 minutes to see if it clears, or
>> actually SSH into the system and run top, my NMS is NOT useful. In this
>> case Cacti has an awesome "Real time" plugin that lets me look at
>> something in 5, 10, 20, 30... second intervals. Game changer.
>
> Ganglia and Graphite only send the data when it's necessary... You can
> stick with a regular interval or you can send when it's appropriate.
> This is flexibility. Most stuff I'm trending is not appropriate to
> view on a 15 second polling interval. I just don't find the realtime
> graphs entirely useful for trending.
>

15 second intervals are not useful for trending. But they are useful
when things go wrong. I do not want to have to leave my NMS when
monitoring my network.
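
A minimal sketch of the "counters go up" argument above: because the agent only exposes a running total, each poller (a 5-minute Cacti job or a 15-second real-time view) derives its own rate from any two reads, without coordinating with anyone. It assumes a 32-bit SNMP counter (for example IF-MIB ifInOctets) that wraps at 2**32; a counter reset after a reboot looks the same as a wrap, so a real poller would sanity-check the result. The sample values are made up.

```python
def counter_rate(prev, cur, modulus=2**32):
    """Per-second rate between two (timestamp, value) samples of a counter.

    A counter that appears to go backwards is assumed to have wrapped once
    at `modulus` (2**32 for an SNMP Counter32).
    """
    (t0, v0), (t1, v1) = prev, cur
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = v1 - v0
    if delta < 0:  # wrapped past modulus (or the device reset; we cannot tell)
        delta += modulus
    return delta / elapsed

# Made-up reads of one interface counter, shared by two independent pollers.
reads = {0: 1_000_000, 15: 1_150_000, 300: 3_400_000}
realtime = counter_rate((0, reads[0]), (15, reads[15]))    # 15-second view
five_min = counter_rate((0, reads[0]), (300, reads[300]))  # 5-minute view
wrapped = counter_rate((0, 2**32 - 100), (10, 400))        # crossed the wrap point
print(realtime, five_min, wrapped)
```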

> I would like to be able to poll and push, ideally. There are benefits
> to both... However the real win with graphite is that you could get
> some alert in the middle of the night, and think... geez, time to trend
> this stat. Write a script to throw the data to the collector and then
> go to bed without worrying about polluting OID space with a poorly
> structured table. Write the graphs later when you think more clearly.
>
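
On the "write a script to throw the data to the collector" idea: Graphite's carbon daemon accepts a plaintext protocol, one `<metric path> <value> <unix timestamp>` line per datapoint, by default on TCP port 2003. A minimal Python sketch, assuming a carbon-cache you can reach; the metric name and host below are hypothetical.

```python
import socket
import time

def carbon_line(path, value, timestamp=None):
    """Format one datapoint in carbon's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_to_carbon(lines, host="localhost", port=2003, timeout=5.0):
    """Open a TCP connection to carbon and write a batch of lines."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)

# Hypothetical stat from a late-night alert: mail queue depth on mail01.
line = carbon_line("servers.mail01.postfix.queue_depth", 42, timestamp=1304633973)
# send_to_carbon([line], host="graphite.example.com")  # needs a reachable carbon-cache
```

Nothing has to be declared up front; carbon creates the metric on the first datapoint, which is what makes the "go to bed, write the graphs later" workflow possible.
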
>>
>> Most users do not learn or understand the features built into SNMP:
>> 1) It is trivial to use extend or exec in snmpd and pass a request for
>> an OID directly to a script.
>> 2) You can use SNMP agent or AgentX technology to link SNMP directly
>> to counters/methods in a running process.
>> 3) It is widely understood by a wide variety of tools.
>
> SNMP is useful. Good tool. AgentX is not so easy. I don't see what the
> data source has to do with this.
>
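
To make point 1 above concrete: besides `extend`/`exec`, Net-SNMP's `pass` directive hands the requested OID to a script (`-g OID` for a GET, `-n OID` for a GETNEXT), and the script answers with three lines: the OID, a type, and a value, or prints nothing if there is no match. A rough Python sketch of such a handler; the enterprise OID `.1.3.6.1.4.1.99999.1` and the `read_values()` data source are hypothetical placeholders, and SET requests are ignored.

```python
import sys

# Hypothetical subtree; a real agent would live under its own registered enterprise OID.
BASE = ".1.3.6.1.4.1.99999.1"

def read_values():
    """Hypothetical data source: row index -> (pass type, value)."""
    return {1: ("counter", 123456), 2: ("gauge", 7), 3: ("string", "ok")}

def oid_key(oid):
    """Turn '.1.3.6...' into a tuple of ints so OIDs compare lexicographically."""
    return tuple(int(p) for p in oid.strip(".").split("."))

def handle(mode, oid, values):
    """Answer one snmpd 'pass' request; return the reply lines, or [] for no match."""
    base, key = oid_key(BASE), oid_key(oid)
    if mode == "-g":  # GET: exact match on BASE.<row>
        if key[:-1] != base or key[-1] not in values:
            return []
        idx = key[-1]
    elif mode == "-n":  # GETNEXT: first row whose OID sorts after the request
        later = [r for r in sorted(values) if base + (r,) > key]
        if not later:
            return []
        idx = later[0]
    else:  # -s (SET) and anything else: not supported in this sketch
        return []
    typ, val = values[idx]
    return [f"{BASE}.{idx}", typ, str(val)]

if __name__ == "__main__" and len(sys.argv) >= 3:
    for reply in handle(sys.argv[1], sys.argv[2], read_values()):
        print(reply)
```

It would be wired up in snmpd.conf with a line such as `pass .1.3.6.1.4.1.99999.1 /usr/bin/python3 /usr/local/bin/myapp_pass.py` (paths hypothetical), after which any SNMP tool can walk the subtree.
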
>>
>> This fundamental lack of understanding results in much wheel
>> reinvention and clunky solutions for passing data around. Take for
>> example how most people do apache stats. Typically they try to write
>> some wonky script that acquires information using wget from the
>> server_status page. Each time this page changes or adds something new
>> the scripts usually break.
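
For comparison, the scraping approach at its least fragile: mod_status has a machine-readable variant (`/server-status?auto`) that prints one `Key: value` pair per line. A parser that keeps unknown keys instead of expecting a fixed layout at least survives new fields being added. A sketch in Python; the sample text is illustrative, not captured from a real server.

```python
SAMPLE = """Total Accesses: 1234
Total kBytes: 567
Uptime: 3600
ReqPerSec: .0823
BusyWorkers: 3
IdleWorkers: 7
Scoreboard: _W_K..
"""

def parse_server_status(text):
    """Parse mod_status '?auto' output into a dict, keeping unknown keys."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not 'Key: value'
        key, value = key.strip(), value.strip()
        try:
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            stats[key] = value  # non-numeric fields (e.g. Scoreboard) stay strings
    return stats

stats = parse_server_status(SAMPLE)
```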
>>
>> On the other side of the pond, look at IIS. Windows performance
>> counters and IIS are implemented !!beautifully!! You open an MMC
>> console, connect to a remote server, get a list of objects (e.g. IIS),
>> get a list of counters (e.g. requests/sec), choose an instance like
>> mywebsite.org, and bam: real-time counters rendered on screen, with
>> built-in support to save this information to a file or SQL database.
>>
>> The open source world has just completely missed the boat in most
>> cases. Rather than look at the simple, elegant way Windows does this
>> and leverage SNMP agents and already existing SNMP tools, each project
>> takes a different wheel-reinventing approach to accomplish the same
>> thing.
>>
>
> Is it the responsibility of every application to maintain an SNMP
> agent, compatible with whatever flavor (okay... ucd-snmp) of snmpd is
> running on whatever OS?
>

Yes. I believe it should be. Programs already ./configure themselves
for different thread libraries, pointer sizes, etc. Configuring your
agent is no different. It is a better alternative than relying on a
cobbled-together collection of shell scripts that hopefully extract the
information you need correctly. Your odds of finding a good template
for the thing you are looking to monitor are low at best.

I also hate seeing all the duplication of effort:

http://codeinthehole.com/archives/8-Monitoring-MySQL-with-Ganglia-and-gmetric.html
http://code.google.com/p/mysql-cacti-templates/
http://www.masterzen.fr/software-contributions/mysql-snmp-monitor-mysql-with-snmp/
http://github.com/kjellm/munin-mysql/tree/master
http://code.google.com/p/appaloosa-zabbix-templates/

Just knock it out once: write an SNMP agent for MySQL, produce a nice,
well-documented MIB file, done deal. To beat a dead horse, look at how
our m$ friends do it:
http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
They have time to make videos and show off while we spend time
debugging and reinventing data collection over and over again.

>> I am back to old school: every machine gets 4 graphs (CPU, disk
>> activity, network, and memory). Maybe I build a custom graph with
>> requests/second if applicable when I am in the mood, but that is it.
>> This stuff jumped the shark a long time ago.
>>
>
>
>
> --
> -jesse
>


