[nycbug-talk] Text parsing question
Francisco Reyes
lists at stringsutils.com
Thu Dec 18 10:10:35 EST 2008
maddaemon at gmail.com writes:
> I need to remove the line that contains the DC _ONLY_WHEN_ there is a
> duplicate entry (same timestamp) with another IP. The text file
> contains hundreds of other entries, and there are single entries where
If Python is acceptable...
Test data
Dec 15 05:15:56 - abc1234 tried logging in from 192.168.8.17
Dec 15 05:15:56 - abc1234 tried logging in from 192.168.18.13
Dec 15 06:15:56 - abc1234 tried logging in from 192.168.18.14
Dec 15 06:15:56 - abc1234 tried logging in from 192.168.8.17
Dec 15 07:15:56 - abc1234 tried logging in from 192.168.18.15
Dec 15 08:15:56 - abc1234 tried logging in from 192.168.8.17
Program
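#!/usr/bin/python
# Reads the log on stdin and prints one "timestamp IP" line per timestamp,
# preferring any IP other than 192.168.8.17 when both share a timestamp.
import sys

line=sys.stdin.readline()
while True:
    if not line:
        break
    items = line.split()
    # Fields 0-2 are the timestamp, e.g. "Dec 15 05:15:56"
    CurrentTimeStamp=items[0]+" "+items[1]+" "+items[2]
    TimeStamp=CurrentTimeStamp
    PrintIP=""
    # Consume every consecutive line that shares this timestamp
    while CurrentTimeStamp==TimeStamp:
        IP=items[9]  # field 9 is the IP address
        # Keep the first IP seen, but let another IP replace 192.168.8.17
        if PrintIP=="" or PrintIP=="192.168.8.17":
            PrintIP=IP
        line=sys.stdin.readline()
        if not line:
            break
        items = line.split()
        TimeStamp=items[0]+" "+items[1]+" "+items[2]
    print(CurrentTimeStamp+" "+PrintIP)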
Output
Dec 15 05:15:56 192.168.18.13
Dec 15 06:15:56 192.168.18.14
Dec 15 07:15:56 192.168.18.15
Dec 15 08:15:56 192.168.8.17
Should not be difficult to convert to another language.
In case the email trashes the spacing, the program and test data are at:
http://public.natserv.net/test.py
http://public.natserv.net/test.txt
Hope this is what you were looking for.
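
If you would rather keep the original log lines intact and only drop the duplicate line, the same grouping idea can be sketched with itertools.groupby. This is only a rough sketch under a few assumptions: entries with the same timestamp sit next to each other, every line ends with the IP, and 192.168.8.17 is the address to drop (as in the test data above). The names DC_IP, timestamp, last_field and drop_dc_duplicates are made up for illustration.

```python
import itertools

DC_IP = "192.168.8.17"  # assumed: the address to drop, taken from the test data


def timestamp(line):
    """Return the first three fields joined, e.g. 'Dec 15 05:15:56'."""
    return " ".join(line.split()[:3])


def last_field(line):
    """Return the last whitespace-separated field (the IP), or '' if blank."""
    fields = line.split()
    return fields[-1] if fields else ""


def drop_dc_duplicates(lines, dc_ip=DC_IP):
    """Yield lines, skipping dc_ip lines whose timestamp is shared with another IP.

    Assumes lines with the same timestamp are adjacent in the input.
    """
    for _, group in itertools.groupby(lines, key=timestamp):
        group = list(group)
        # Does any line in this timestamp group come from a different IP?
        has_other_ip = any(last_field(l) != dc_ip for l in group)
        for l in group:
            if has_other_ip and last_field(l) == dc_ip:
                continue  # duplicate entry from dc_ip: drop it
            yield l
```

To use it as a filter, something like sys.stdout.writelines(drop_dc_duplicates(sys.stdin)) should do. A line from 192.168.8.17 that is alone at its timestamp is kept, same as in the output above.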