[nycbug-talk] Re: Simple sed question

Marc Spitzer mspitzer
Fri Jan 7 14:43:26 EST 2005


On Fri, 7 Jan 2005 13:16:38 -0500 (EST), Francisco Reyes
<lists at natserv.com> wrote:
> On Fri, 7 Jan 2005, a nice bug wrote:
> 
> > Francisco:
> >> New to sed/awk.. reading a book on them.
> >> Trying to convert:
> >> ### Text Text Text
> >>
> >> to
> >> ###<tab>Text Text Text
> >
> >
> > echo -e "Text Text Text" |sed -e 's/\(Text Text Text\)/\t\1/g'
> >
> > Just a general example for the tab replacement part (it could have
> > many further permutations depending on precisely what you need.)
> 
> 
> Of course it would help if I had explained that "Text Text Text" are 3
> unknown columns and not literals. :-(
> Real data sample
> 506     AllianceBer Intl PremGr B     AIPBX   8.29    -2.13   -1.54
> 507     AllianceBer Intl PremGr C     AIPCX   8.29    -2.24   -1.54
> 508     AllianceBer Intl PremGrAd     AIPYX   8.87    -1.88   -0.67
> 509     AllianceBer Intl Val A        ABIAX   14.91   5.59    9.79
> 510  AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79
> 511 AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79
> 512 AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79
> 
> Basically what I am trying to do is to have only from the description
> onward.
> 506 through 509 have a tab
> 510 has 2 spaces
> 511 and 512 have a single space
> 
> The data is coming from OCR and basically I am cleaning it up in sed so that by
> the time I get it to awk it is in good shape. I figured out all the other
> cleanups; this is the only one I have not figured out. :-(
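If you only want that one cleanup done in sed, here is a sketch. It assumes the record number is always all digits at the start of the line and that whatever run of spaces or tabs follows it should become exactly one tab. The function name tab_after_number is made up for illustration. POSIX does not define \t in a sed replacement, so the sketch builds a real tab with printf instead of relying on a sed extension.

```shell
#!/bin/sh
# tab_after_number is a hypothetical helper name: it reads lines on stdin and
# replaces the spaces/tabs after the leading record number with one tab.
tab_after_number() {
    # POSIX sed does not define \t in the replacement, so build a real tab.
    tab=$(printf '\t')
    # ^\([0-9][0-9]*\)          leading record number, captured as \1
    # [[:space:]][[:space:]]*   one or more spaces or tabs right after it
    # Anchored at ^, so only the separator after the number is touched.
    sed "s/^\([0-9][0-9]*\)[[:space:]][[:space:]]*/\1${tab}/"
}

# Lines 510 (two spaces) and 511 (one space) from the sample data
printf '510  AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79\n' | tab_after_number
printf '511 AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79\n' | tab_after_number
```

Lines 506 through 509, which already have a single tab there, pass through unchanged.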

The real question is how you divide the data into fields: what delimits
fields, and what delimits separate subfields within a field. From looking
at the data above, the first field and the last 4 fields are fixed, and
everything between them is field 2. If that is correct, then it is easy
and not necessarily a regex problem, and you can turn it into a safer
intermediate form (CSV, for example).
 
untested code:
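awk '{
        # fields 2 through NF-4 are the free-text description;
        # start fresh for every line so descriptions do not pile up
        desc = $2
        for (i = 3; i <= NF - 4; i++)
                desc = desc " " $i
        # number, description, ticker, and the three numeric columns
        printf "%s,%s,%s,%s,%s,%s\n", $1, desc, $(NF-3), $(NF-2), $(NF-1), $NF
}' file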
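Once everything is in CSV, later passes can split on commas, so a description with any number of words stays a single field. A small sketch of such a downstream pass follows. It assumes the descriptions never contain commas themselves (true for the sample above, but OCR output may not guarantee it), and the function name ticker_and_desc is made up for illustration.

```shell
#!/bin/sh
# Two rows from the sample, written out in the CSV layout described above:
# number, description, ticker, then the three numeric columns.
csv='506,AllianceBer Intl PremGr B,AIPBX,8.29,-2.13,-1.54
509,AllianceBer Intl Val A,ABIAX,14.91,5.59,9.79'

# ticker_and_desc is a hypothetical later pass: with -F, the multi-word
# description is the single field $2, so columns are positional again.
ticker_and_desc() {
    awk -F, '{ printf "%s\t%s\n", $3, $2 }'
}

printf '%s\n' "$csv" | ticker_and_desc
# AIPBX<TAB>AllianceBer Intl PremGr B
# ABIAX<TAB>AllianceBer Intl Val A
```

Splitting on whitespace instead would make the ticker's field number depend on how many words the description has, which is exactly the problem the CSV step removes.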

marc
  



> _______________________________________________
> % NYC*BUG talk mailing list
> http://lists.nycbug.org/mailman/listinfo/talk
> %Be sure to check out our Jobs and NYCBUG-announce lists
> %We meet the first Wednesday of the month
>



