[nycbug-talk] Re: Simple sed question

Fri Jan 7 15:42:12 EST 2005

>>506     AllianceBer Intl PremGr B     AIPBX   8.29    -2.13   -1.54
>>507     AllianceBer Intl PremGr C     AIPCX   8.29    -2.24   -1.54
>>508     AllianceBer Intl PremGrAd     AIPYX   8.87    -1.88   -0.67
>>509     AllianceBer Intl Val A        ABIAX   14.91   5.59    9.79
>>510  AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79
>>511 AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79
>>512 AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79
>>
>>Basically, what I am trying to do is keep only the part of each line
>>from the description onward.
>>506 through 509 have a tab after the number,
>>510 has 2 spaces,
>>511 and 512 have a single space.
>>
>>The data is coming from OCR, and basically I am cleaning it up in sed so that
>>by the time I get it to awk it is in good shape. I figured out all the other
>>cleanups; this is the only one I have not figured out. :-(
>>    
>>
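For the narrower question of dropping the leading record number regardless of whether a tab, one space, or two spaces follow it, a minimal sed sketch might look like the following. It assumes the number is always a run of digits at the very start of the line; `strip_leading_number` is just an illustrative name, not something from the thread.

```shell
# Strip a leading run of digits plus any spaces or tabs that follow it.
# [[:space:]] is a POSIX character class, so it matches both tab and space.
strip_leading_number() {
    sed 's/^[0-9][0-9]*[[:space:]]*//'
}

# Two of the sample rows: one tab-separated, one with two spaces.
printf '506\tAllianceBer Intl PremGr B\tAIPBX\t8.29\t-2.13\t-1.54\n' | strip_leading_number
printf '510  AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79\n' | strip_leading_number
```

Lines that do not start with a digit pass through unchanged, so this can sit in the same sed pipeline as the other cleanups.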
>
>The real question is how you define the data as fields: what delimits
>fields, and what delimits separate subfields within a field. Looking at the
>data above, the first field and the last 4 fields are fixed, and everything
>between them is field 2. If that is correct, then it is easy and not
>necessarily a regex problem, and you can now turn it into a safer
>intermediate form (CSV, for example).
> 
>untested code:
>awk '  { tmp_2 = $2                        # reset for every record
>         for (i = 3; i <= NF-4; i++)       # rest of field 2 (the description)
>              tmp_2 = tmp_2 " " $i
>         printf "%s,%s,%s,%s,%s,%s\n", $1, tmp_2, $(NF-3), $(NF-2), $(NF-1), $NF
>}'  file
>
>marc
>  
>
>
>  
>
cool, good work Marc... I like the concept...
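
For anyone who wants to try the idea on the sample rows, here is a sketch in runnable form. It assumes every row has the record number first and exactly four single-word fields (ticker plus three numbers) last, and that the description never contains a comma; `to_csv` is an illustrative name, and the `NF >= 6` guard is my addition to skip short or blank lines.

```shell
# Join fields 2..NF-4 into one description, then emit the record as CSV.
# awk's default field splitting treats any run of spaces or tabs as one
# delimiter, so the tab / one-space / two-space variants all parse alike.
to_csv() {
    awk 'NF >= 6 {
        desc = $2                          # start fresh for every record
        for (i = 3; i <= NF - 4; i++)      # remaining description words
            desc = desc " " $i
        printf "%s,%s,%s,%s,%s,%s\n", $1, desc, $(NF-3), $(NF-2), $(NF-1), $NF
    }'
}

printf '506\tAllianceBer Intl PremGr B\tAIPBX\t8.29\t-2.13\t-1.54\n' | to_csv
printf '510  AllianceBer Intl Val A    ABIAX   14.91   5.59    9.79\n' | to_csv
```

Counting from the end of the line is what makes this robust: the description can be any number of words, as long as the four trailing fields stay single tokens.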