[nycbug-talk] Text parsing question

Francisco Reyes lists at stringsutils.com
Thu Dec 18 10:10:35 EST 2008


maddaemon at gmail.com writes:

> I need to remove the line that contains the DC _ONLY_WHEN_ there is a
> duplicate entry (same timestamp) with another IP.  The text file
> contains hundreds of other entries, and there are single entries where

If Python is acceptable....

Test data
Dec 15 05:15:56 - abc1234 tried logging in from 192.168.8.17
Dec 15 05:15:56 - abc1234 tried logging in from 192.168.18.13
Dec 15 06:15:56 - abc1234 tried logging in from 192.168.18.14
Dec 15 06:15:56 - abc1234 tried logging in from 192.168.8.17
Dec 15 07:15:56 - abc1234 tried logging in from 192.168.18.15
Dec 15 08:15:56 - abc1234 tried logging in from 192.168.8.17

Program
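#!/usr/bin/python
import sys

DC_IP="192.168.8.17"  # address of the DC in the test data

# Lines that share a timestamp must be adjacent in the input.
line=sys.stdin.readline()
while True:
	if not line:
		break
	items = line.split()

	# The timestamp is the first three fields: month, day, time
	CurrentTimeStamp=items[0]+" "+items[1]+" "+items[2]
	TimeStamp=CurrentTimeStamp
	PrintIP=""
	# Consume every line that carries the current timestamp
	while CurrentTimeStamp==TimeStamp:
		IP=items[9]
		# Keep the first IP seen, but let a later IP replace the DC
		if PrintIP=="" or PrintIP==DC_IP:
			PrintIP=IP
		line=sys.stdin.readline()
		if not line:
			break
		items = line.split()
		TimeStamp=items[0]+" "+items[1]+" "+items[2]

	# Parentheses make this work under both Python 2 and Python 3
	print(CurrentTimeStamp+" "+PrintIP)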

Output
Dec 15 05:15:56 192.168.18.13
Dec 15 06:15:56 192.168.18.14
Dec 15 07:15:56 192.168.18.15
Dec 15 08:15:56 192.168.8.17
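
The program above prints only the timestamp and the surviving IP. If the
complete log lines need to be kept, something along these lines might work.
It is only a sketch: the names drop_dc_duplicates and DC_IP are made up for
illustration, the DC address is assumed to be 192.168.8.17 as in the test
data, lines with the same timestamp are assumed to be adjacent, and the IP
is assumed to be the last field on each line.

```python
from itertools import groupby

DC_IP = "192.168.8.17"  # assumed DC address, taken from the test data


def timestamp(line):
    # First three whitespace-separated fields: month, day, time
    return " ".join(line.split()[:3])


def drop_dc_duplicates(lines, dc_ip=DC_IP):
    """Return the lines, minus DC lines whose timestamp also has another IP."""
    # Strip newlines and skip blank lines
    cleaned = [ln.rstrip("\n") for ln in lines if ln.strip()]
    kept = []
    # groupby only merges neighbours, so same-timestamp lines must be adjacent
    for _, group in groupby(cleaned, key=timestamp):
        group = list(group)
        others = [ln for ln in group if ln.split()[-1] != dc_ip]
        # If a non-DC address shares the timestamp, keep only those lines;
        # otherwise the DC entry stands alone and is kept as-is
        kept.extend(others if others else group)
    return kept


sample = [
    "Dec 15 05:15:56 - abc1234 tried logging in from 192.168.8.17",
    "Dec 15 05:15:56 - abc1234 tried logging in from 192.168.18.13",
    "Dec 15 06:15:56 - abc1234 tried logging in from 192.168.18.14",
    "Dec 15 06:15:56 - abc1234 tried logging in from 192.168.8.17",
    "Dec 15 07:15:56 - abc1234 tried logging in from 192.168.18.15",
    "Dec 15 08:15:56 - abc1234 tried logging in from 192.168.8.17",
]

for out in drop_dc_duplicates(sample):
    print(out)
```

With the test data this prints the .18.13, .18.14 and .18.15 lines in full,
plus the 08:15:56 line for 192.168.8.17, since that one has no duplicate.
To run it on a real file, pass an open file object or sys.stdin instead of
the sample list.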


It should not be difficult to convert to another language.
In case the email trashes the spacing, the files are also here:
http://public.natserv.net/test.py
http://public.natserv.net/test.txt

Hope this is what you were looking for.


