Passing and getting vars. from C to awk 

I am trying to find a slick way to pass arguments in and out of an awk
script from C.  Here is my problem.

I am writing a C-Motif interface that allows the user to build a typical
awk script of the form:
BEGIN { }
{

}

END { }

On the C side of things, I have huge arrays (possibly millions of values)
that I want the user to manipulate with their awk script to generate a new
huge array.  I know I could write the large arrays out to files, awk them
with the script, write the new array back to a huge file, and then read it
back in with C.  I do not want to do this, since the ASCII files would be
huge.

Is there any way I can call an awk script numerous times and play smart
tricks with the BEGIN and END statements?  Any help would sure be appreciated.

Chris Grant



Sun, 22 Apr 2001 03:00:00 GMT  



I don't know exactly what you're asking for, but I have an awk
daemon called glossaryd that runs in the background connected to
two fifos (one for reading, one for writing).  On startup, it builds a
large associative array from its glossary files.  It then processes
requests, matching the data passed in against the data stored in the
array, and prints the matching entry to the output fifo.

#!/usr/bin/awk -f
function help(helpmsg) {
fmt = sprintf("Format: %s", ARGV[0])
printf "%s [help] infifo outfifo\n",fmt
if (length(helpmsg)) print helpmsg
printf "%*s help     this message\n",length(fmt)," "
printf "%*s debug    print diagnostic messages\n",length(fmt)," "
printf "%*s infifo   input fifo\n",length(fmt)," "
printf "%*s outfifo  output fifo\n",length(fmt)," "
exit_immediately = 1
exit

}

BEGIN {
if (2 == ARGC && "help" == ARGV[1])
   help("")

if (ARGC < 3)
   help("Fifo names required")

infifo  = ARGV[1]
outfifo = ARGV[2]
ARGC = 1

if (system ("test -p " infifo)) {
   print "No input fifo " infifo
   exit_immediately = 1
   exit
   }

if (system ("test -p " outfifo)) {
   print "No output fifo " outfifo
   exit_immediately = 1
   exit
   }

cmd = "echo ~/glossary/"
cmd | getline gd
close(cmd)

if (system ("test -d " gd)) {
   print "No glossary directory"
   exit_immediately = 1
   exit
   }
ls = "cd " gd ";ls !(RCS)"
# count only lines actually read, so lsct stays 0 when ls prints nothing
while ((ls | getline line) > 0) LS[++lsct] = line
close(ls)
if (!lsct) {
   print "No glossary files"
   exit_immediately = 1
   exit
   }

SUBSEP = "~"
# the name of the glossary file will determine which glossary to use
# am for applix macros, c for c, awk for awk, C++ for C++, sh for ksh
for (i=1;i<=lsct;i++) {
   g = LS[i]
   f = gd g
   while ((getline < f) > 0) {
      if (/^#/) continue
      gsub(/\\$/,"\a")
      if (/^[ ]/) {
         gsub(/^[ ]+/,"")
         G[g,key] = G[g,key] $0
         }
      else {
         key = ("am" == g) ? tolower($1) : $1
         G[g,key] = (NF < 2) ? "" : $0
         }
      }
   close(f)
   }

oldfs = FS
FS="~"
while (1)
   {
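   if ((getline < infifo) <= 0) {
      # nothing read (EOF or error): reopen the fifo and wait again
      close(infifo)
      continue
      }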
   if (1 == NF) {
      if ("quit" == $1) {
         close(infifo)
         close(outfifo)
         exit
         }
      }
   else if (NF >= 2) {
      g = $1
      p = ("am" == g) ? tolower($2) : $2
      oldtext = $2
      gsub(/(^ *)|( *$)/,"",p)
      offset = index(("am" == g) ? tolower(oldtext) : oldtext,p) - 1
      tab = (offset) ? sprintf("%*s",offset," ") : ""
      if (rp = G[g,p]) {
         gsub(/\\\\n/,"\r",rp)
         fct = split(rp,A,/\\n/)
         for (i=1;i<=fct;i++) {
            gsub(/\r/,"\\n",A[i])
            gsub(/\a$/,"\\",A[i])
            print tab A[i] > outfifo
            }
         close(outfifo)
         }
      else {
         if (split(p,P,oldfs) > 1) {
            if (rp = G[g,P[1]]) {
               gsub(/\\\\n/,"\r",rp)
               fct = split(rp,A,/\\n/)
               print oldtext > outfifo
               for (i=2;i<=fct;i++) {
                  gsub(/\r/,"\\n",A[i])
                  gsub(/\a$/,"\\",A[i])
                  print tab A[i] > outfifo
                  }
               close(outfifo)
               }
            else {
               print oldtext > outfifo
               close(outfifo)
               }
            }
         else {
            print oldtext > outfifo
            close(outfifo)
            }
         }
      }
   close(infifo)
   }

}

END  {
if (exit_immediately) exit(exit_immediately)

}
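
For a C program to talk to a daemon like this, the client side might look roughly like the sketch below. It assumes the request format the script parses (glossary name and text separated by "~", one request per line) and relies on the daemon closing the output fifo after each reply. The function name glossary_request and its buffer-based interface are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical client for the fifo daemon above. Sends one request of the
 * form "glossary~text" and collects the reply into buf as a C string.
 * Returns the number of bytes stored, or -1 on error. */
long glossary_request(const char *infifo, const char *outfifo,
                      const char *glossary, const char *text,
                      char *buf, size_t bufsize)
{
    FILE *in, *out;
    size_t used = 0, n;

    assert(infifo != NULL && outfifo != NULL);
    assert(glossary != NULL && text != NULL && buf != NULL);
    if (bufsize == 0)
        return -1;

    /* On a real fifo, fopen blocks until the daemon opens the other end. */
    in = fopen(infifo, "w");
    if (in == NULL)
        return -1;
    fprintf(in, "%s~%s\n", glossary, text);
    if (fclose(in) != 0)
        return -1;

    out = fopen(outfifo, "r");
    if (out == NULL)
        return -1;

    /* The daemon closes the output fifo after each reply, so end-of-file
     * marks the end of the answer. Anything past bufsize-1 is dropped. */
    while (used < bufsize - 1 &&
           (n = fread(buf + used, 1, bufsize - 1 - used, out)) > 0)
        used += n;
    buf[used] = '\0';
    fclose(out);
    return (long)used;
}
```

A call such as glossary_request(infifo, outfifo, "c", "printf", buf, sizeof buf) would send "c~printf" and leave the reply text in buf. Sending the single word "quit" as a request line is what makes the daemon exit, which this sketch does not wrap.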

Opinions expressed herein are my own and may not represent those of my employer.


Sun, 22 Apr 2001 03:00:00 GMT  


% I am trying to find a slick way to pass arguments in and out of an awk
% script from C.  Here is my problem.

Possibly slick way to get them in: set environment variables called
1, 2, 3, etc and use the ENVIRON[] array.

I can't think of a slick way to pass the values back, but
you can avoid using files by opening awk with popen("...", "r") and
having the script print its output, which you then read from the pipe.
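
A minimal sketch of that combination, assuming a POSIX system with awk on the PATH. The helper name run_awk_sum and the one-line awk program are made up for illustration. The variables are named AWK_ARG1 and AWK_ARG2 rather than 1 and 2, on the assumption that some /bin/sh implementations do not pass along environment names that are not valid shell identifiers, and popen() runs its command through the shell.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: hands two values to awk through the environment,
 * runs a one-line awk program via popen(), and parses the single number
 * awk prints. Returns 0 on success, -1 on error. */
int run_awk_sum(const char *a, const char *b, double *result)
{
    char line[128];
    FILE *p;
    int got_line;

    assert(a != NULL && b != NULL && result != NULL);

    if (setenv("AWK_ARG1", a, 1) != 0 || setenv("AWK_ARG2", b, 1) != 0)
        return -1;

    /* awk inherits our environment, so both values are visible through
     * ENVIRON[] in BEGIN without any input file. */
    p = popen("awk 'BEGIN { print ENVIRON[\"AWK_ARG1\"] + "
              "ENVIRON[\"AWK_ARG2\"] }'", "r");
    if (p == NULL)
        return -1;

    /* Read awk's output straight from the pipe; no temporary file. */
    got_line = (fgets(line, sizeof line, p) != NULL);
    if (pclose(p) != 0 || !got_line)
        return -1;

    *result = strtod(line, NULL);
    return 0;
}
```

This only suits a handful of scalar values going in. For arrays with millions of elements the environment is far too small, so the data would still have to travel over a pipe in some form.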

For this kind of thing, you really need a language which has an embedding
API. I like Rexx for this -- it's easy to learn and easy
to embed -- but it's not to everybody's taste. For whatever reason,
lots of people use Tcl, and there's a new
contender from Brazil called Lua. I believe Icon also has an API for
embedding in applications.

It would be an interesting experiment to extend gawk or mawk so that
it could work as an embedded language. You'd have to make the
interpreter loadable as a shared library, and provide a few functions:
 1 Start the interpreter running a given script. Ideally the script could
   be presented as a file, a chunk of memory, or p-code;
 2 Replace stdin and stdout with other i/o mechanisms;
 3 Interpret pipe and system() commands. This is so interaction with
   whatever application you're using could be done in a way that's consistent
   with the shell interaction of the existing language;
 4 Add new built-in functions;
 5 Get and set variable values.

On the chance somebody feels like doing this, here's a proposed API
for an embedded awk. It's geared towards being quick to implement, but
flexible enough to be useful. I appreciate your indulgence if you don't
care about this idea but don't know how to quit out of a message in your
news reader :)

 1: set up the awk variables, optionally call a user-specified function,
    run the script, then call another user-specified function. chunkomemory
    is just an argument which allows state variables or whatever to be
    passed to these user-defined functions. If beginfunc returns non-zero,
    the script doesn't run. The return code from awk_run is either the
    return code from endfunc, an error code if the script had a syntax error
    or whatever, or 0 if there's no endfunc and everything else was cool.

    The scriptout variable is to allow the script to be parsed or simply
    read from a file and then stuck in a chunk of memory allocated by
    the awk interpreter, so it never has to be parsed or read again.

    The scriptcontext argument is supposed to allow for more than one
    script to be run concurrently, supposing the interpreter were thread
    safe.

 typedef int (*afptr)(void * scriptcontext, void * chunkomemory);
 int awk_run(const char * script,       /* ptr to filename, script or p-code */
             int input_options,         /* tells which that was */
             const char ** scriptout,   /* parsed output of script */
             int output_options,        /* whether that should be p-code or text */
             int argc,                  /* ARGC */
             char ** argv,              /* ARGV[] */
             afptr beginfunc,           /* function to run before any BEGIN */
             afptr endfunc,             /* function to run after any END */
             void * chunkomemory        /* argument to beginfunc & endfunc */
             );

 2: hopefully someone will have better ideas on this. My thought was that
    you register a function which would be called when stdin or stdout
    is required (as by getline or print). The register function would have
    to be called by a beginfunc, and the stdio functions would call
    awk_getline or awk_putline to do the actual IO. awk_getline would
    allocate the memory for the output, which would be part of some memory
    pool kept in the script context, so the user doesn't have to free it,
    yet doesn't have to guess how big to make the buffer.

  int awk_register_stdiofunction(void * scriptcontext,
                                 int which,     /* STDIN or STDOUT */
                                 afptr stdiofunc); /* func to register */
  int awk_getline(void * scriptcontext,
                  char **inptr);
  int awk_putline(void * scriptcontext,
                  char *outptr);

 3: This is to allow the application to handle print x | "ls" type
    commands. The command is passed in as cmd, and whichway
    specifies whether we're reading from it, writing to it, running it
    through system(), or closing it.

 typedef int (*afcmdptr)(void * scriptcontext,
              void * chunkomemory,
              int whichway,     /* STDIN or STDOUT, SYSTEM or CLOSE */
              const char * cmd);
 int awk_register_cmd_handler(void * scriptcontext,
                              afcmdptr cmdhandler);

 4: To allow the application to add new built-in functions. I don't
    know if this is such a great idea, but people would want it, because
    people don't like writing command parsers. The name is to allow one
    C function to be invoked under a variety of names. Eg, an editor could
    have functions called text_color and text_colour, both handled by
    the same C function. The application is responsible for allocating and
    freeing the memory used for return codes (rc).

 typedef int (*affcnptr)(void * scriptcontext,
              void * chunkomemory,
              const char * name,/* name the function was invoked under */
              const int argc,   /* number of arguments */
              const char ** argv,/* array of arguments */
              char ** rc);      /* return code */

 int awk_register_fcn_handler(void * scriptcontext,
                              const char * name,
                              affcnptr cmdhandler);
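
To make the text_color/text_colour case concrete, here is a toy sketch in which a small table stands in for the interpreter. The affcnptr typedef and the awk_register_fcn_handler signature come from the proposal above. The struct toy_registry, the dispatcher toy_call, and the handler set_text_colour are invented for illustration, and no real awk is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*affcnptr)(void *scriptcontext, void *chunkomemory,
                        const char *name, const int argc,
                        const char **argv, char **rc);

#define MAX_FCNS 16

/* Toy registry standing in for the interpreter's table of built-ins. */
struct toy_registry {
    const char *names[MAX_FCNS];
    affcnptr handlers[MAX_FCNS];
    int count;
};

int awk_register_fcn_handler(void *scriptcontext, const char *name,
                             affcnptr cmdhandler)
{
    struct toy_registry *reg = scriptcontext;

    assert(reg != NULL && name != NULL && cmdhandler != NULL);
    if (reg->count == MAX_FCNS)
        return -1;
    reg->names[reg->count] = name;      /* caller keeps the string alive */
    reg->handlers[reg->count] = cmdhandler;
    reg->count++;
    return 0;
}

/* Hypothetical dispatcher: roughly what the interpreter would do when a
 * script calls a function it does not define itself. */
int toy_call(struct toy_registry *reg, void *chunkomemory, const char *name,
             int argc, const char **argv, char **rc)
{
    int i;

    for (i = 0; i < reg->count; i++)
        if (strcmp(reg->names[i], name) == 0)
            return reg->handlers[i](reg, chunkomemory, name, argc, argv, rc);
    return -1;                          /* unknown function */
}

/* One handler serving both spellings. Here chunkomemory points at the
 * application's "current colour" string pointer. */
int set_text_colour(void *scriptcontext, void *chunkomemory, const char *name,
                    const int argc, const char **argv, char **rc)
{
    static char ok[] = "1";
    const char **current = chunkomemory;

    (void)scriptcontext;
    (void)name;                         /* same behaviour under either name */
    if (argc != 1)
        return -1;
    *current = argv[0];
    *rc = ok;                           /* application-owned storage */
    return 0;
}
```

Registering set_text_colour once as "text_color" and once as "text_colour" makes both names reach the same code. A handler that needed to behave differently per spelling could inspect its name argument.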

 5: These would be called from any of the other functions to allow
    manipulation of awk variables. This is, of course, the thing
    that Chris Grant would use if someone were to implement it.

    Setting a value to NULL would have the effect of deleting the variable.

   /* get all the indices into an array */
   int awk_get_array_indices(void * scriptcontext,
                             const char * arrayname,
                             char **indexlist, int * indexcount);
   /* get the value of arrayname[index] */
   int awk_get_array_value(void * scriptcontext,
                           const char * arrayname,
                           const char * index,
                           char ** value);
   /* get the value of the variable */
   int awk_get_variable_value(void * scriptcontext,
                              const char * varname,
                              char ** value);
   /* set the value of arrayname[index] */
   int awk_set_array_value(void * scriptcontext,
                           const char * arrayname,
                           const char * index,
                           const char * value);
   /* set the value of the variable */
   int awk_set_variable_value(void * scriptcontext,
                              const char * varname,
                              const char * value);
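
As a rough illustration of the intended semantics for scalar variables, including the rule that setting a value to NULL deletes the variable, here is a toy version backed by a fixed-size table instead of a real interpreter. The two function names and parameter lists follow the proposal. The int return convention (0 on success, -1 on failure), struct toy_context, and the helpers are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_VARS 64

/* Toy stand-in for the interpreter's per-script state. */
struct toy_context {
    char *names[MAX_VARS];
    char *values[MAX_VARS];
    int count;
};

static char *copy_string(const char *s)
{
    char *p = malloc(strlen(s) + 1);

    if (p != NULL)
        strcpy(p, s);
    return p;
}

static int find_var(const struct toy_context *ctx, const char *name)
{
    int i;

    for (i = 0; i < ctx->count; i++)
        if (strcmp(ctx->names[i], name) == 0)
            return i;
    return -1;
}

/* Set a variable; a NULL value deletes it. Returns 0 on success. */
int awk_set_variable_value(void *scriptcontext, const char *varname,
                           const char *value)
{
    struct toy_context *ctx = scriptcontext;
    char *v;
    int i;

    assert(ctx != NULL && varname != NULL);
    i = find_var(ctx, varname);

    if (value == NULL) {                /* delete */
        if (i < 0)
            return 0;
        free(ctx->names[i]);
        free(ctx->values[i]);
        ctx->count--;
        ctx->names[i] = ctx->names[ctx->count];   /* last entry fills gap */
        ctx->values[i] = ctx->values[ctx->count];
        return 0;
    }

    v = copy_string(value);
    if (v == NULL)
        return -1;
    if (i >= 0) {                       /* overwrite existing value */
        free(ctx->values[i]);
        ctx->values[i] = v;
        return 0;
    }
    if (ctx->count == MAX_VARS) {
        free(v);
        return -1;
    }
    ctx->names[ctx->count] = copy_string(varname);
    if (ctx->names[ctx->count] == NULL) {
        free(v);
        return -1;
    }
    ctx->values[ctx->count++] = v;
    return 0;
}

/* Look up a variable. On success *value points at storage owned by the
 * context; if the variable does not exist, *value is NULL and -1 returns. */
int awk_get_variable_value(void *scriptcontext, const char *varname,
                           char **value)
{
    struct toy_context *ctx = scriptcontext;
    int i;

    assert(ctx != NULL && varname != NULL && value != NULL);
    i = find_var(ctx, varname);
    *value = (i >= 0) ? ctx->values[i] : NULL;
    return (i >= 0) ? 0 : -1;
}
```

Keeping the returned string owned by the context mirrors the memory-pool idea from item 2: the caller never frees what it gets back, but must copy it before the next set on the same variable.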

Now that I've thought about it a bit, I'm quite interested in knowing
if anyone has any ideas on the subject, and especially comments on
ways to make a simple but effective API to go with a simple but
effective language.

One possible problem I can see is that this is all using null-terminated
strings, but then, awk is like that.
--

Patrick TJ McPhee
East York  Canada



Tue, 24 Apr 2001 03:00:00 GMT  
 