[nycbug-talk] Text parsing question

maddaemon at gmail.com maddaemon at gmail.com
Thu Dec 18 11:19:09 EST 2008

On Tue, Dec 16, 2008 at 4:45 PM, Okan Demirmen <okan at demirmen.com> wrote:
> On Mon 2008.12.15 at 18:49 -0500, maddaemon at gmail.com wrote:
>> List,
>> I'm hoping someone can help me with this...
>> I'm trying to search for a pattern in a text file that contains login
>> info from a syslog and weed out entries that are duplicated with
>> different IP addresses.
>> For example, here are 2 lines:
>> Dec 15 05:15:56 - abc1234 tried logging in from
>> Dec 15 05:15:56 - abc1234 tried logging in from
>> where is the Windows DC, and the other is the IP of the
>> webmail server.
>> I need to remove the line that contains the DC _ONLY_WHEN_ there is a
>> duplicate entry (same timestamp) with another IP.  The text file
>> contains hundreds of other entries, and there are single entries where
>> the DC IP is the only entry.  Using the above examples, I need to
>> remove the first line and only retrieve the second line:
>> Dec 15 05:15:56 - abc1234 tried logging in from
>> Does anyone know how to go about doing this?  I was going to try using
>> sed and compare the lines looking for the same timestamp + username +
>> IP1/IP2, but it gave me a headache when I tried to wrap my head around
>> the logic.
> you need context - see http://www.estpak.ee/~risto/sec/

I've checked out SEC for other things, but I'm actually using
OSSEC-HIDS for the real-time alerting, and it's awesome for that.
This is for the daily report that gets generated every morning on the
previous day's syslog data, containing such things as new user
accounts created, accounts deleted, locked out accounts, and so on.
The problem started when we added 2 servers running a new Windows O/S
that use different Windows EventIDs for a failed login attempt.  Since
adding that part, I'm getting numerous duplicates because logging into
webmail produces 2 entries - the webmail server IP (or another service
such as that) and the DC IP.  I'm only interested in the originating
IP for the report.
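
For the report itself, the rule from my original question (drop the DC
line only when the same timestamp + username also shows up with a
different IP) could probably be done in two passes rather than with sed.
Below is a rough Python sketch, not a tested production script. It
assumes every login line ends in " from <IP>" as in the example above;
DC_IP, drop_dc_duplicates, and the 192.0.2.10 address are made-up
placeholders, since the real addresses are not shown in this thread.

```python
from collections import defaultdict

# Hypothetical placeholder: the real DC address is not given in the thread.
DC_IP = "192.0.2.10"


def drop_dc_duplicates(lines, dc_ip=DC_IP):
    """Drop a DC line only if the same timestamp + username has another IP."""
    ips_by_key = defaultdict(set)
    parsed = []

    # Pass 1: collect the set of IPs seen for each "timestamp - user ..." prefix.
    for line in lines:
        head, sep, ip = line.rstrip("\n").rpartition(" from ")
        key = head if sep else None  # None: line does not match the assumed format
        ip = ip.strip()
        parsed.append((line, key, ip))
        if key is not None:
            ips_by_key[key].add(ip)

    # Pass 2: keep everything except DC lines that have a non-DC twin.
    kept = []
    for line, key, ip in parsed:
        if key is not None and ip == dc_ip and len(ips_by_key[key]) > 1:
            continue
        kept.append(line)
    return kept
```

Lines where the DC is the only IP for a given timestamp and username are
kept, and anything that does not match the assumed format passes through
untouched. The two passes mean the order of the DC and webmail lines in
the file does not matter.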
