[nycbug-talk] Character Frequency Analyzer
Bjorn Nelson
o_sleep at belovedarctos.com
Wed Mar 7 06:35:19 EST 2012
On 3/5/2012 12:47 PM, George Rosamond wrote:
> On 03/05/12 12:39, James wrote:
>> On Mon, Mar 5, 2012 at 9:17 AM, George Rosamond
>> <george at ceetonetechnology.com> wrote:
>>> A bit OT, but very cool:
>>>
>>> www.characterfrequencyanalyzer.com
>>
>> There is also stan from ports/sysutils on OpenBSD
>>
>> <paste>
>>
>> Stan is a console application that analyzes binary streams and
>> calculates several useful statistical information from the observed
>> data. It features statistical, pattern and bit analysis. Stan has been
>> designed as a "swiss-knife" for first steps in reverse engineering and
>> cryptographic analysis.
I wrote a small script that I guess you could lump into the same category
as these. The problem is that many log files contain tons of meaningless,
redundant lines and only a few really important error messages, yet we
sometimes attribute a cause to the redundant lines just because they
show up so often. To help with this, the script takes each line of the
log file, strips out the datestamp and every word that doesn't exist in
the local dict file, and uses what's left as a hash key, with the
original line as the value (continually overwriting whatever was there
before for that key). The output is then the count of lines for each key
plus the value, i.e. a sample of what such a line looks like unstripped.
You effectively get all the error types in the log and their frequencies
in a much more compact view.
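
For anyone who wants to try the idea, here is a rough Python sketch of the
approach described above. It is not the original script. The syslog-style
timestamp pattern, the /usr/share/dict/words path, and the function names
(load_dictionary, line_key, summarize) are all assumptions made for
illustration.

```python
import re
from collections import Counter

# Assumption: syslog-style timestamps such as "Mar  7 06:35:19".
# Other log formats would need a different pattern.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s*")
WORD = re.compile(r"[A-Za-z]+")


def load_dictionary(path="/usr/share/dict/words"):
    """Load a word list into a lowercase set (the path is an assumption)."""
    with open(path) as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def line_key(line, dictionary):
    """Drop the timestamp and keep only the words found in the dictionary."""
    body = TIMESTAMP.sub("", line)
    words = [w.lower() for w in WORD.findall(body)]
    return " ".join(w for w in words if w in dictionary)


def summarize(lines, dictionary):
    """Return (count, sample_line) pairs, most frequent key first."""
    counts = Counter()
    samples = {}
    for line in lines:
        line = line.rstrip("\n")
        key = line_key(line, dictionary)
        counts[key] += 1
        samples[key] = line  # later lines overwrite earlier samples
    return [(n, samples[key]) for key, n in counts.most_common()]
```

Typical use would be something like
summarize(open("/var/log/messages"), load_dictionary()), printing each
count next to its sample line. Hostnames, PIDs and IP addresses mostly
fall out of the key because they are not dictionary words, which is what
lets near-identical lines collapse into one entry.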
-Bjorn