Impossible Record Separator 

I am using gawk 3.03.  

At the beginning of a script I have:

RS = "$^"

Can I be certain that for any input (even random input),
that this record separator will NEVER be matched?  My intent is
for the entire file to be in the variable $0.  I then intend
to attack it with several gensub's, match's, and substr's.

All I am really asking for is confirmation that this will work
the way I intend.  If gawk is interpreting "$^" in a manner
different than I am expecting, I need to know.  I am expecting
it to be an impossible match.

I am also wondering how this will impact gawk's efficiency,
especially if the input files are a few megabytes long.

--
Wayne M. VanWeerthuizen
ICQ: 15117288
Homepage: http://www.*-*-*.com/



Sun, 01 Jul 2001 03:00:00 GMT  


Quote:
(Wayne M. VanWeerthuizen) writes:
>I am using gawk 3.03.  

>At the beginning of a script I have:

>RS = "$^"

>Can I be certain that for any input (even random input),
>that this record separator will NEVER be matched?  My intent is
>for the entire file to be in the variable $0.  I then intend
>to attack it with several gensub's, match's, and substr's.

>All I am really asking for is confirmation that this will work
>the way I intend.  If gawk is interpreting "$^" in a manner
>different than I am expecting, I need to know.  I am expecting
>it to be an impossible match.

>I am also wondering how this will impact gawk's efficiency,
>especially if the input files are a few megabytes long.

Multiple input files on the command line will be treated as multiple records
when using an impossible RS.
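
A quick way to see this for yourself is the sketch below. The file names and the choice of RS are my assumptions: a single control byte that does not occur in the data stands in for the "impossible" separator, so that the test does not depend on how a particular awk treats a multi-character RS.

```sh
#!/bin/sh
# Two small sample files (hypothetical names) in a scratch directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\nd\n' > "$dir/two.txt"

# RS is a byte that never appears in the data, so it is never matched.
# Each file still ends its own record, so NR counts files, not one big record.
count=$(awk 'BEGIN { RS = "\001" } END { print NR }' "$dir/one.txt" "$dir/two.txt")
echo "records: $count"

rm -rf "$dir"
```

If the count equals the number of input files, each file arrived as one record.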

You could add the following check to confirm that each file is read as a
single record. (FWIW, RS = "()" seems to be 'impossible'.)

# A second record within the same file means RS matched somewhere.
FNR > 1 { print "RS =", RS, "didn't work!"; die = 1; exit }
. . .
END { if ( die ) exit(die); . . . }

Efficiency: the AWK language definition states that (generic) awk re-splits $0
into fields every time $0 changes. That will happen often if you manipulate
$0 directly. I'd suggest the following approach instead, which builds a single
variable containing the entire input.

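# Effectively concatenates all input files into one variable.
# Appending RS restores the separator only when RS is a single literal
# character such as the default newline; for a regex RS, gawk's RT holds
# the text that actually matched.
{ entire = entire $0 RS }
END { do_what_you_must_to(entire) }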

You can still use split() if you want to parse entire into fields.
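
As a rough illustration of that last point, here is a minimal sketch. The sample input and the names words and n are mine, and it assumes the default newline RS. It accumulates the input and then splits the saved text in END without touching $0.

```sh
#!/bin/sh
# Slurp the input into "entire", then split it into words in END.
result=$(printf 'x y\nz\n' | awk '
    { entire = entire $0 "\n" }        # rebuild the input, newline after each line
    END {
        n = split(entire, words)       # default FS: runs of blanks and newlines
        print n, words[1], words[n]
    }')
echo "$result"
```

Because split() works on a copy, $0 and the regular fields are left alone, which avoids the repeated re-splitting mentioned above.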



Sun, 01 Jul 2001 03:00:00 GMT  
 