[nycbug-talk] Character Frequency Analyzer

Wed Mar 7 06:35:19 EST 2012

On 3/5/2012 12:47 PM, George Rosamond wrote:
> On 03/05/12 12:39, James wrote:
>> On Mon, Mar 5, 2012 at 9:17 AM, George Rosamond
>> <george at ceetonetechnology.com>  wrote:
>>> A bit OT, but very cool:
>>>
>>> www.characterfrequencyanalyzer.com
>>
>> There is also stan from ports/sysutils on OpenBSD
>>
>> <paste>
>>
>> Stan is a console application that analyzes binary streams and
>> calculates several useful statistical information from the observed
>> data. It features statistical, pattern and bit analysis. Stan has been
>> designed as a "swiss-knife" for first steps in reverse engineering and
>> cryptographic analysis.

I wrote a small script that I guess you could lump into the same category
as these.  The problem is that many log files contain tons of
meaningless, redundant lines and only a few really important error
messages, yet we sometimes attribute a cause to the redundant lines
just because they show up so often.  To help with this, the script
takes each line of the log file, strips out the datestamp, and strips
every word that doesn't exist in the local dict file.  What is left
becomes a hash key, with the original line as the value (continually
overwriting whatever was there before for that key).  The output is,
for each key, the count of lines that reduced to it and the value,
i.e. a sample of what such a line looked like unstripped.  You
effectively get all the error types in the log and their frequencies
in a much more compact view.
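
The approach described above can be sketched in Python roughly as
follows.  This is not the original script, which is not shown here; the
syslog-style timestamp pattern, the dict file location
/usr/share/dict/words, and the names load_words, line_key and summarize
are all assumptions made for illustration.

```python
import re
import sys
from collections import Counter

# Assumed datestamp shape: a classic syslog prefix such as "Mar  7 06:35:19".
# Real logs vary, so adjust this pattern to match yours.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s*")
TOKEN = re.compile(r"[A-Za-z0-9_]+")


def load_words(path="/usr/share/dict/words"):
    """Load a dict file (assumed location) as a set of lowercase words."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def line_key(line, words):
    """Drop the datestamp, then keep only tokens found in the dictionary."""
    body = TIMESTAMP.sub("", line)
    kept = [t.lower() for t in TOKEN.findall(body) if t.lower() in words]
    return " ".join(kept)


def summarize(lines, words):
    """Return (count, sample_line) pairs, most frequent key first."""
    counts = Counter()
    samples = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        key = line_key(line, words)
        counts[key] += 1
        samples[key] = line  # a later line overwrites the earlier sample
    return [(n, samples[k]) for k, n in counts.most_common()]


# Usage (hypothetical): python logsummary.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    words = load_words()
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for count, sample in summarize(fh, words):
            print(f"{count:6d}  {sample}")
```

Hostnames, PIDs and IP addresses are usually not dictionary words, so
lines that differ only in those fields collapse to the same key, and
the sample shown for each key is simply the last line that matched it.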

-Bjorn