<br><br><div class="gmail_quote">On Tue, Aug 2, 2011 at 12:52 PM, Jesse Callaway <span dir="ltr"><<a href="mailto:bonsaime@gmail.com">bonsaime@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br><br><div class="gmail_quote">On Tue, Aug 2, 2011 at 10:14 AM, Edward Capriolo <span dir="ltr"><<a href="mailto:edlinuxguru@gmail.com" target="_blank">edlinuxguru@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div><br><br><div class="gmail_quote">On Mon, Aug 1, 2011 at 5:16 PM, Jesse Callaway <span dir="ltr"><<a href="mailto:bonsaime@gmail.com" target="_blank">bonsaime@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">
<br><br><div class="gmail_quote">On Mon, Aug 1, 2011 at 5:12 PM, Chris Snyder <span dir="ltr"><<a href="mailto:chsnyder@gmail.com" target="_blank">chsnyder@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left:1px solid rgb(204, 204, 204);margin:0pt 0pt 0pt 0.8ex;padding-left:1ex">
<div>On Mon, Aug 1, 2011 at 5:08 PM, Chris Snyder <<a href="mailto:chsnyder@gmail.com" target="_blank">chsnyder@gmail.com</a>> wrote:<br>
><br>
> As you've discovered, Apache doesn't log the request separate from the<br>
> response, so a log analyzer is no help here.<br>
<br>
</div>But wait -- this isn't strictly true. Apache can be made to log the<br>
time taken to serve the request, in microseconds. It just doesn't do<br>
so in the standard log format.<br>
<br>
<a href="http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#formats" target="_blank">http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#formats</a><br>
<br>
But getting awstats or another log analyzer to pay attention is another story.<br>
<div><div>_______________________________________________<br>
talk mailing list<br>
<a href="mailto:talk@lists.nycbug.org" target="_blank">talk@lists.nycbug.org</a><br>
<a href="http://lists.nycbug.org/mailman/listinfo/talk" target="_blank">http://lists.nycbug.org/mailman/listinfo/talk</a><br>
</div></div></blockquote></div><br><br clear="all">correctamundo... <div><br></div><div>Gotta go with SEC (Simple Event Correlator) or collectd for the easiest way. Otherwise you're writing your own filter program for the Apache logs... which, I mean, it's kinda cool that you can just add a pipe character to the logfile name, like in Perl. But...</div>
<div><br></div><div><br><div><br>-- <br>-jesse<br>
</div></div>
</blockquote></div><br></div></div>To be clear I am using: <br> LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %T %D" with_time<br> CustomLog /opt/awstats-7.0/wwwroot/cgi-bin/gui-access-perf.log with_time<br>
<br>%D is the time in microseconds.<div><br><br><a href="http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#formats" target="_blank">http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#formats</a><br>
<br></div><div>I know I can script something and make my own report, but I really do not want to. I find that when you write these things yourself, you end up taking care of them indefinitely. I was hoping to find some tool that would break down %D by page: hits, average(time_to_serve), max(time_to_serve), 95th percentile(time_to_serve). <br>
</div><div><br></div>
</blockquote></div><br>Could you commit to the apache snmp module? Then you might possibly be able to pawn off maintenance at some point. Er... nah, that wouldn't really work per-page. Just thinking out loud.<br clear="all">
<br>-- <br><font color="#888888">-jesse<br>
</font></blockquote></div><div><br></div><div>I am not trying to show off but I like closing up threads. I buckled and just wrote it myself :(. I used a mix of shell, hadoop, and hive. This is the gist of it:<br></div><div>
<br>sh produce_tts_stats.sh<br></div><div>awk '{print $7 "\t" $NF }' gui-access-perf.log > gui-access-perf_1<br>hadoop dfs -rm /user/hive/warehouse/edward.db/time_to_serve/gui-access-perf_1<br>hadoop dfs -copyFromLocal gui-access-perf_1 /user/hive/warehouse/edward.db/time_to_serve<br>
</div><div><br></div><div>hive -e "use edward; create table time_to_serve (url string, tts bigint) row format delimited fields terminated by '\t'" #<---one time step (column types are a guess)<br></div><div><br></div><div>hive -e "<br>use edward;<br>set mapred.map.tasks=1;<br>set hive.cli.print.header=true;<br>
select url,count(1) as count, max(tts) as tts_max ,min(tts) as tts_min ,avg(tts) as tts_avg from time_to_serve group by url order by tts_avg limit 4000000;<br></div><div>" > outfile</div><div><br></div><div>[edward@etl02 ~]$ head outfile<br>
url count tts_max tts_min tts_avg<br>/ 21429 39520 37 72.10341126510804<br>/robots.txt 1 74 74 74.0<br>/w00tw00t.at.ISC.SANS.DFind:) 1 77 77 77.0<br></div><div><br></div><div>It takes a couple more steps with cron, and there was not really enough data to justify distributed computing. Hive was a nice fit, though, because it handled all the grouping that I did not want to code up by hand.</div>
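<div><br></div><div>For a data set this small, the same per-URL breakdown, plus the 95th percentile asked about earlier in the thread, could probably be done with sort and awk alone. What follows is only a rough sketch, not what was actually run: the function name tts_report is made up for illustration, and it assumes the two-column, tab-separated file (URL, then %D in microseconds) produced by the awk step above. The percentile uses the nearest-rank method, which is one convention among several.</div>

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts).
# Input:  a file with two tab-separated columns: url, time-to-serve (%D).
# Output: one tab-separated line per URL:
#         url, count, max, min, average, 95th percentile (nearest rank).
tts_report() {
    # Sort by URL, then numerically by time, so each URL's values arrive
    # in ascending order and min/max/percentile become index lookups.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$1" |
    awk -F '\t' '
        # Print the stats for the URL collected so far (idx is a local).
        function emit(   idx) {
            if (n == 0) return
            # Nearest rank: ceil(0.95 * n), done in integer arithmetic.
            idx = int((95 * n + 99) / 100)
            printf "%s\t%d\t%d\t%d\t%.2f\t%d\n", url, n, v[n], v[1], sum / n, v[idx]
        }
        # A new URL starts: report the previous one and reset the counters.
        NR == 1 || $1 != url { emit(); url = $1; n = 0; sum = 0 }
        { n++; v[n] = $2; sum += $2 }
        END { emit() }
    '
}
```

<div>Calling tts_report gui-access-perf_1 would then print one line per URL, sorted by URL rather than by average, so an extra sort on the fifth column would be needed to match the ordering of the Hive output.</div>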
<div><br></div><div>Edward</div>