Access to Script Name Within Awk Script 

Most of the awk scripts I write are for standalone use outside the
context of a UNIX shell. Often, these scripts are used by persons
whose knowledge of awk is limited, but who may be required to adapt
a single script to multiple projects and different data. As a
consequence, I frequently want to contain some data--usually lists
of paired values--within the script itself, but in such a way that
the user of the script can easily modify those data.

The Bourne shell and its descendants have the "here document"
mechanism. Perl has the built-in file variable DATA and the special
internal value __END__ to allow this same kind of self-containment.
With awk, one can simulate these capabilities by explicitly reading
the script file and extracting data from specially-tagged comment
lines. Here's a simple example:

C:\>cat beast.awk
#% GARFIELD CAT
#% ELSIE    COW
#% LASSIE   DOG
#% PLUTO    DOG

BEGIN {
    while ((getline < "beast.awk") > 0)
        if (/^#%/)
            Beast[$2] = $3

    for (Name in Beast)
        print "Beast[" Name "] is a " Beast[Name]

    exit(0)
}

#% SNOOPY   DOG
#% PENNIE   HEN
#% ARNOLD   PIG
#% PORKIE   PIG

C:\>awk -f beast.awk
Beast[PORKIE] is a PIG
Beast[ELSIE] is a COW
Beast[PLUTO] is a DOG
Beast[PENNIE] is a HEN
Beast[ARNOLD] is a PIG
Beast[LASSIE] is a DOG
Beast[GARFIELD] is a CAT
Beast[SNOOPY] is a DOG

C:\>

In this contrived example, I've purposefully put data at both the top
and the bottom of the script to demonstrate that the placement of
tagged values within the script is insignificant. Also, in a practical
program, there would likely be some input file or files, and the script
wouldn't end abruptly with an exit() in the BEGIN rule.

The principal weakness of this technique is that, unfortunately,
the name of the script must be hard-wired into the script itself.
The script name is not included in the ARGV array, and there is no
built-in variable in "standard" awk (i.e., nawk) that contains the path
and name of the script. If the script is invoked from a directory other
than the one in which it is stored, or if the name of the script file
is changed, it will yield incorrect results.

This leads me to my question: Is there a way to generalize the
reference to the script file name within the awk script itself? I
realize there are several ways to feed the script name into the script
from the command line, and even more options if I invoke the script
from a UNIX shell. But I'm looking for something that is wholly
independent of the method of invocation--something akin to Perl's
built-in system variable $0. My question is, I think, rhetorical, as
I'm fairly certain that no such method exists in awk. I wonder, then,
if others feel that a built-in variable named, say, SCRIPTNAME (cf.
FILENAME) would be a nice feature to add to the language--one that
wouldn't negatively impact the size, speed, simplicity, and elegance
of awk.
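
One of the command-line workarounds alluded to above can be sketched as follows. This is only an illustration, assuming a POSIX shell, the mktemp utility, and an awk that accepts -v assignments; the variable name Script and the temporary file are hypothetical, and the approach still depends on how the script is invoked, which is exactly the limitation described.

```sh
# Write a small tagged script to a temporary file (hypothetical location).
script=$(mktemp)
cat >"$script" <<'EOF'
#% GARFIELD CAT
#% ELSIE    COW
BEGIN {
    # Script is supplied by the caller via -v; it is not a built-in variable.
    while ((getline line < Script) > 0)
        if (line ~ /^#%/) {
            split(line, f, " ")
            Beast[f[2]] = f[3]
        }
    close(Script)
    print "GARFIELD is a " Beast["GARFIELD"]
    print "ELSIE is a " Beast["ELSIE"]
}
EOF

# The same path is given twice: once to -f, once as the value of Script.
out=$(awk -v Script="$script" -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```

Because the path is passed in rather than hard-wired, renaming or moving the file does not break the lookup, but every caller has to remember the extra -v argument.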

In case it matters, I use MKS Awk, part of the MKS Toolkit for DOS,
Version 4.2h.

-----
Jim Monty

Tempe, Arizona USA



Thu, 06 May 1999 03:00:00 GMT  

...

Quote:
>The principal weakness of this technique is that, unfortunately,
>the name of the script must be hard-wired into the script itself.
>The script name is not included in the ARGV array, and there is no
>built-in variable in "standard" awk (i.e., nawk) that contains the path
>and name of the script. If the script is invoked from a directory other
>than the one in which it is stored, or if the name of the script file
>is changed, it will yield incorrect results.

>This leads me to my question: Is there a way to generalize the
>reference to the script file name within the awk script itself? I
>realize there are several ways to feed the script name into the script
>from the command line, and even more options if I invoke the script
>from a UNIX shell. But I'm looking for something that is wholly
>independent of the method of invocation--something akin to Perl's
>built-in system variable $0. My question is, I think, rhetorical, as
>I'm fairly certain that no such method exists in awk. I wonder, then,
>if others feel that a built-in variable named, say, SCRIPTNAME (cf.
>FILENAME) would be a nice feature to add to the language--one that
>wouldn't negatively impact the size, speed, simplicity, and elegance
>of awk.

Well, I think your question is interesting, since I've spent a fair amount
of time over the years thinking about these same kinds of problems
(encapsulating AWK scripts as "executables", particularly on systems such as
DOS that go out of their way to make it difficult to do so).

However, the purist in me wants to point out that this is an OS/environment
specific problem and isn't really an AWK question.  Although you don't say
it until the end, I'll assume your platform is DOS and your tool is MKS AWK.
Now, in fact, Thompson AWK for DOS (and a bunch of other platforms) does
give you a usable ARGV[0] - that does tell you the name of the AWK program
you are running, not, as the Unix AWK's all seem to do, the name of the
interpreter you are using.  So, my quick advice to you is:

        Just get Thompson AWK - you have no choice - you'll be glad you did.

(You may or may not recognize the humor in the above line...  If you do,
you're probably spending too much of your life reading Usenet news.)

Finally, I have to say that even though I admire the creativity of your
approach to hard-coding data into your script, the idea of letting
users edit my scripts, with their god-awful editors and fat thumbs,
leaves me queasy.  Why not put it into a plain data file, and set an
environment variable to point to the file?
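
The data-file suggestion might look something like the sketch below. It assumes a POSIX shell, the mktemp utility, and an awk that provides the ENVIRON array (nawk and later); the variable name BEAST_DATA and the file contents are hypothetical.

```sh
# Plain data file, one "name kind" pair per line (hypothetical contents).
data=$(mktemp)
printf '%s\n' 'GARFIELD CAT' 'ELSIE COW' >"$data"

# BEAST_DATA is a made-up variable name that points the script at the file.
out=$(BEAST_DATA="$data" awk '
BEGIN {
    file = ENVIRON["BEAST_DATA"]
    if (file == "") {
        print "BEAST_DATA is not set"
        exit 1
    }
    while ((getline line < file) > 0) {
        split(line, f, " ")
        Beast[f[1]] = f[2]
    }
    close(file)
    print "ELSIE is a " Beast["ELSIE"]
}')
printf '%s\n' "$out"
rm -f "$data"
```

The script itself never needs editing; users change only the data file, or point the variable at a different one.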




Fri, 07 May 1999 03:00:00 GMT  

Quote:


> ...
> >The principal weakness of this technique is that, unfortunately,
> >the name of the script must be hard-wired into the script itself.
> >The script name is not included in the ARGV array, and there is no
> >built-in variable in "standard" awk (i.e., nawk) that contains the path
> >and name of the script. If the script is invoked from a directory other
> >than the one in which it is stored, or if the name of the script file
> >is changed, it will yield incorrect results.

> >This leads me to my question: Is there a way to generalize the
> >reference to the script file name within the awk script itself? I
> >realize there are several ways to feed the script name into the script
> >from the command line, and even more options if I invoke the script
> >from a UNIX shell. But I'm looking for something that is wholly
> >independent of the method of invocation--something akin to Perl's
> >built-in system variable $0. My question is, I think, rhetorical, as
> >I'm fairly certain that no such method exists in awk. I wonder, then,
> >if others feel that a built-in variable named, say, SCRIPTNAME (cf.
> >FILENAME) would be a nice feature to add to the language--one that
> >wouldn't negatively impact the size, speed, simplicity, and elegance
> >of awk.

> Well, I think your question is interesting, since I've spent a fair amount
> of time over the years thinking about these same kinds of problems
> (encapsulating AWK scripts as "executables", particularly on systems such as
> DOS that go out of their way to make it difficult to do so).

> However, the purist in me wants to point out that this is an OS/environment
> specific problem and isn't really an AWK question.

Huh? My question is: Does the awk programming language provide an internal
mechanism for accessing the name of the script? It's very specifically an
awk question. I know several ways, some of which are operating system
dependent, to either feed the name of the script file to the script, or
grab the name of the script file from the environment within the script.
What I seek is a way to do this entirely within the context of the awk
language itself. In the same way that the language defines the built-in
variable FILENAME, which stores the name of the current input file, it
seems reasonable and desirable that the language should also define a
built-in variable that stores the name of the script file (e.g., a
variable named SCRIPTNAME).

Quote:
> [...]  Although you don't say
> it until the end, I'll assume your platform is DOS and your tool is MKS AWK.
> Now, in fact, Thompson AWK for DOS (and a bunch of other platforms) does
> give you a usable ARGV[0] - that does tell you the name of the AWK program
> you are running, not, as the Unix AWK's all seem to do, the name of the
> interpreter you are using.

I've shied away from TAWK (Thompson AWK), despite its many alluring
extensions, in part because I feared such rogue behavior. That the array
element ARGV[0] contains the name of the script and not the name of the
interpreter is a bastardization of the language, not an extension to it.

From _The AWK Programming Language_, p. 63:

     The command-line arguments are available to the awk program in a
     built-in array called ARGV. The value of the built-in variable
     ARGC is one more than the number of arguments. With the command
     line

          awk -f progfile a v=1 b

     ARGC has the value 4, ARGV[0] contains awk, ARGV[1] contains a,
     ARGV[2] contains v=1, and ARGV[3] contains b. ARGC is one more
     than the number of arguments because awk, the name of the
     command, is counted as argument zero, as it is in C programs.
     If the awk program appears on the command line, however, the
     program is not treated as an argument, nor is -f <filename> or
     any -F option. For example, with the command line

          awk -F'\t' '$3 > 100' countries

     ARGC is 2 and ARGV[1] is countries.
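
The behavior quoted above can be checked with a short sketch like this one, assuming a POSIX shell and any nawk-compatible awk. The program is given inline rather than with -f, so per the passage it is not counted as an argument; the value printed for ARGV[0] depends on the interpreter installed.

```sh
# A BEGIN-only program reads no input, so a, v=1 and b are never opened.
out=$(awk 'BEGIN {
    print "ARGC=" ARGC
    for (i = 0; i < ARGC; i++)
        print "ARGV[" i "]=" ARGV[i]
}' a v=1 b)
printf '%s\n' "$out"
```

On a conforming awk, ARGC is 4 and ARGV[1] through ARGV[3] hold a, v=1, and b; nothing in the output names the program text or a script file.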

Quote:
> [...]  So, my quick advice to you is:

>    Just get Thompson AWK - you have no choice - you'll be glad you did.

> (You may or may not recognize the humor in the above line...  If you do,
> you're probably spending too much of your life reading Usenet news.)

I recognize the line, I think, from earlier posts of yours. But, somehow,
I missed the original joke, so I don't get the humor. Is it something that
can be explained to the uninitiated?

Quote:
> Finally, I have to say that even though I admire the creativity of your
> approach to hard-coding data into your script, the idea of letting
> users edit my scripts, with their god-awful editors and fat thumbs,
> leaves me queasy.

Me too! But there's no practical way for me, in my environment, to prevent
users from hacking the awk scripts I provide them. Knowing this, I prefer
to write scripts that are eminently modifiable. My idea to do something
like this

     #%  1 AL
     #%  2 AK
     #%  3 AZ
     #%  4 AR

     [...]

     #% 49 WI
     #% 50 WY

in lieu of this

     State[1]  = "AL"
     State[2]  = "AK"
     State[3]  = "AZ"
     State[4]  = "AR"

     [...]

     State[49] = "WI"
     State[50] = "WY"

is based in part on the presumption that the former method may be
more forgiving of "god-awful editors and fat thumbs" than the latter.

Quote:
> [...] Why not put it into a plain data file, and set an
> environment variable to point to the file?

Truthfully, I've never used the technique I described in my original
article of hard-coding data on specially tagged comment lines within
the script itself. And the reason I've never done it is simply that
there is no reliable way to know from within the script what the name
of the script file is. For dynamic data, I use external files, and for
comparatively more static data (e.g., a list of the states indexed by
number), I often use explicit assignment of elements of an array
within the BEGIN rule (as above).

-----
Jim Monty

Tempe, Arizona USA



Sun, 09 May 1999 03:00:00 GMT  

(I wrote)

Quote:
>> However, the purist in me wants to point out that this is an OS/environment
>> specific problem and isn't really an AWK question.

>Huh? My question is: Does the awk programming language provide an internal
>mechanism for accessing the name of the script? It's very specifically an
>awk question. I know several ways, some of which are operating system
>dependent, to either feed the name of the script file to the script, or
>grab the name of the script file from the environment within the script.

Well, OK.  Then the short answer is "No".  And it is like everything else
with AWK - anytime you ask for an enhancement, the purists will scream
"bastardization" and "Use Perl!".  For what it is worth, although I disagree
with it, I understand what you are saying below about ARGV[0] being "taken",
but most people are going to see it as nit-picking of the highest order.
I mean, geez, T-AWK gives you exactly what you want, in a reasonable
place (ARGV[0] is a long-standing Unix/DOS/whatever standard), and all
you do is spit on it.

Quote:

>I've shied away from TAWK (Thompson AWK), despite its many alluring
>extensions, in part because I feared such rogue behavior. That the array
>element ARGV[0] contains the name of the script and not the name of the
>interpreter is a bastardization of the language, not an extension to it.

Ah well - are we arguing theory here or are we arguing solving a problem?
It's funny because most of my posts are, in fact, theory, but most get
answered as if they were "do my homework for me".
...

Quote:

>>        Just get Thompson AWK - you have no choice - you'll be glad you did.

>> (You may or may not recognize the humor in the above line...  If you do,
>> you're probably spending too much of your life reading Usenet news.)

>I recognize the line, I think, from earlier posts of yours. But, somehow,
>I missed the original joke, so I don't get the humor. Is it something that
>can be explained to the uninitiated?

There's this big, long, absurd thread in the
DOS/Windows/OS2/Windows95/WindowsNT/Amiga/Mac/Commodore64 newsgroups about
the inevitability of Win95.  Some Win95 idiot started the thread; substitute
"Win95" for "Thompson AWK" in the above and you get the picture.

Quote:
>> Finally, I have to say that even though I admire the creativity of your
>> approach to hard-coding data into your script, the idea of letting
>> users edit my scripts, with their god-awful editors and fat thumbs,
>> leaves me queasy.

>Me too! But there's no practical way for me, in my environment, to prevent
>users from hacking the awk scripts I provide them. Knowing this, I prefer

One of the advantages of T-AWK is that it includes a "compiler" which allows
you to protect your source code.  But, come to think of it, haven't I heard
that later versions of MKS do that as well?

Quote:
>Truthfully, I've never used the technique I described in my original
>article of hard-coding data on specially tagged comment lines within
>the script itself. And the reason I've never done it is simply that
>there is no reliable way to know from within the script what the name
>of the script file is. For dynamic data, I use external files, and for
>comparatively more static data (e.g., a list of the states indexed by
>number), I often use explicit assignment of elements of an array
>within the BEGIN rule (as above).

Oh, so this is just theoretical at this point.  OK.  Well, what I usually do
is something like this:

        split("one two three four",nums)
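
Applied to the state list from earlier in the thread, the split() idiom might be sketched like this, assuming a POSIX shell and any nawk-compatible awk. Only the first four abbreviations are used, and the array name State follows the earlier example.

```sh
# split() fills State[1]..State[n] from one string and returns n.
out=$(awk 'BEGIN {
    n = split("AL AK AZ AR", State, " ")
    for (i = 1; i <= n; i++)
        print "State[" i "] = " State[i]
}')
printf '%s\n' "$out"
```

A user who needs to change the list edits a single quoted string rather than a column of assignment statements.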




Sun, 09 May 1999 03:00:00 GMT  

Quote:


>Truthfully, I've never used the technique I described in my original
>article of hard-coding data on specially tagged comment lines within
>the script itself. And the reason I've never done it is simply that
>there is no reliable way to know from within the script what the name
>of the script file is. For dynamic data, I use external files, and for
>comparatively more static data (e.g., a list of the states indexed by
>number), I often use explicit assignment of elements of an array
>within the BEGIN rule (as above).

I would consider any data that needs modifying dynamic data.  And for
that reason would put it in a separate file, even if it's only one line.
Why encourage users to look at your scripts by having them modify data
inside them?

But I agree that a SCRIPTNAME variable would be nice.  But what would
you set it to in the case of awk '{ blah }'?  I guess the name of the
temp file it stores the program in, if it does so, or the null string.

John.



Sun, 09 May 1999 03:00:00 GMT  
 