calling functions from gawk extensions 
 calling functions from gawk extensions

I would like to call gawk functions from a dynamically loaded C
extension for gawk 3.1.x. After reading the gawk source, I think the
r_tree_eval(NODE *tree, int iscond) function in eval.c is the one to
use (it in turn calls func_call). But I am not sure how to construct
the NODE argument. tree->rnode seems to be the name of the function,
which can be made using tmp_string. tree->lnode contains the arguments,
and I am clueless about this one. I don't know bison, so I can't really
figure it out by looking at the parsing code. Any help or suggestions
will be greatly appreciated.

Thanks.



Mon, 14 Feb 2005 22:41:54 GMT  
 calling functions from gawk extensions

Quote:

>I would like to call gawk functions from a dynamically loaded C
>extension for gawk 3.1.x. After reading the gawk source, I think
>r_tree_eval(NODE *tree, int iscond) function in eval.c is the one to
>use (which in turn calls func_call). But I am not sure how to construct
>the NODE argument. tree->rnode seems like the name of the function
>which can be made using tmp_string. tree->lnode contains the arguments
>and I am clueless about this one. Don't know bison, so can't really
>figure it out looking the parsing code. Any help/suggesstion will be
>greatly appreciated.

As they often say in these NGs, please show us what you've got so far, and
we'll critique it.  I.e., it is hard to tell how far along in this process
you already are.  Since this isn't a supported idea/feature, I can't think
of any way to go about it other than by poking through the source code and
doing a lot of trial-and-error.  FWIW, I don't think bison has anything to
do with it; bison is involved with the parsing of the GAWK source code;
presumably, you will be building up arg lists in C code, rather than by
trying to parse text.

Also, are you primarily interested in calling GAWK built-ins (*), or
user-defined functions?  And, ultimately, what problem are you trying to
solve?

(*) If so, I don't see the point, since most of the GAWK built-ins have C
counterparts and it would be easier to just call those counterparts directly.



Tue, 15 Feb 2005 01:24:44 GMT  
 calling functions from gawk extensions

Quote:



> >I would like to call gawk functions from a dynamically loaded C
> >extension for gawk 3.1.x. After reading the gawk source, I think
> >r_tree_eval(NODE *tree, int iscond) function in eval.c is the one to
> >use (which in turn calls func_call). But I am not sure how to construct
> >the NODE argument. tree->rnode seems like the name of the function
> >which can be made using tmp_string. tree->lnode contains the arguments
> >and I am clueless about this one. Don't know bison, so can't really
> >figure it out looking the parsing code. Any help/suggesstion will be
> >greatly appreciated.

> As they often say in these NGs, please show us what you've got so far, and
> we'll critique it.  I.e., it is hard to tell how far along in this process
> you already are.  Since this isn't a supported idea/feature, I can't think
> of any way to go about it other than by poking through the source code and
> doing a lot of trial-and-error.  FWIW, I don't think bison has anything to
> do with it; bison is involved with the parsing of the GAWK source code;
> presumably, you will be building up arg lists in C code, rather than by
> trying to parse text.

> Also, are you primarily interested in calling GAWK built-ins (*), or
> user-defined functions?  And, ultimately, what problem are you trying to
> solve?

> (*) If so, I don't see the point, since most of the GAWK built-ins have C
> counterparts and it would be easier to just call those counterparts directly.

Here is an example. Basically I would like to use callback functions
written in gawk from C extensions.

---------------------------------------------------------------------
Gawk code:

function callme(x)
{
     ......
}

BEGIN {
     extension("./test.so", "dlload")  # load the gawk extension built from test.c (SEE BELOW)
     register_callback("callme", ...)
     myfunc(y, z)  # call the extension function myfunc, which in turn calls callme
}

---------------------------------
test.c:

#include "awk.h"

static NODE *
do_register_callback(NODE *tree)
{
     /* here I keep track of callback function names, number of args, etc. */
}

static NODE *
do_myfunc(NODE *tree)
{
     NODE *arg1, *arg2;
     arg1 = get_argument(tree, 0);
     .......
     /* HERE I would like to call the registered callback function "callme" */

     .....
}

NODE *
dlload(NODE *tree, void *dl)
{
     make_builtin("register_callback", do_register_callback, 2);
     make_builtin("myfunc", do_myfunc, 2);
     return tmp_number((AWKNUM) 0);
}

OK, the question is how do I call the callback function. I did look
at the builtin functions; they take an argument of NODE *tree,
which contains the parameter list. For user-defined functions, I
believe we have to pass the name of the function in addition to the
parameter list. The question is how do I construct this NODE tree?
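
The bookkeeping that do_register_callback is meant to do (remembering callback names and argument counts) can be sketched independently of gawk. This is only a minimal sketch: the table sizes, the struct, and the two helper functions are hypothetical names invented for illustration, and nothing here touches the real awk.h interface.

```c
#include <assert.h>
#include <string.h>

#define MAX_CALLBACKS 16   /* arbitrary limit chosen for this sketch */
#define MAX_NAME_LEN  64   /* arbitrary limit chosen for this sketch */

/* One registered gawk-level callback: its name and argument count. */
struct callback {
    char name[MAX_NAME_LEN];
    int  nargs;
};

static struct callback callbacks[MAX_CALLBACKS];
static int ncallbacks;

/* Return the entry registered under name, or NULL if there is none. */
static struct callback *find_callback(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < ncallbacks; i++)
        if (strcmp(callbacks[i].name, name) == 0)
            return &callbacks[i];
    return NULL;
}

/*
 * Register a callback, or update its argument count if already present.
 * Returns 0 on success, -1 if the table is full or the name is too long.
 */
static int register_callback(const char *name, int nargs)
{
    struct callback *cb = find_callback(name);

    if (cb == NULL) {
        if (ncallbacks == MAX_CALLBACKS || strlen(name) >= MAX_NAME_LEN)
            return -1;
        cb = &callbacks[ncallbacks++];
        strcpy(cb->name, name);   /* length was checked above */
    }
    cb->nargs = nargs;
    return 0;
}
```

In a real extension, do_register_callback would extract the name from its NODE arguments and pass it to something like register_callback; do_myfunc would then look the name up before building the call.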

Thanks.



Tue, 15 Feb 2005 08:56:48 GMT  
 calling functions from gawk extensions
Hello,

Quote:


> > Don't know bison, so can't really figure it out looking the parsing code.
[...]
> FWIW, I don't think bison has anything to
> do with it; bison is involved with the parsing of the GAWK source code;

Being able to read awkgram.y helps a lot.  One can simply read what is
built when a function call is parsed and do the same in C code.

I believe that if john were able to follow the relevant parts of awkgram.y,
he would put that together with what he dug out of eval.c, and there would
be no need to ask us for help at all.

So his comment about bison really hit the spot.

Stepan Kasal



Tue, 15 Feb 2005 16:41:52 GMT  
 calling functions from gawk extensions

Quote:
> I would like to call gawk functions from a dynamically loaded C
> extension for gawk 3.1.x.

Hello john,
        here are several unsorted pieces of information:

Node_func_call
        lnode: arglist
        rnode: func_name

arglist is NULL for no parameters, or it is a node of type
Node_expression_list:
        lnode: value
        rnode: next node of type Node_expression_list, or NULL

value is a node of type Node_val or Node_var_array.

I should say explicitly that the head of the expression list
(i.e. lnode of the func call) contains a pointer to the first
argument, the next node to the second, and so on.

func_name is a Node_val node containing the function name; you can
set it like this:
        hook_call->rnode = make_string(name, strlen(name));

The nodes I called "value" (lnodes of the expression list) can also
be constructed by make_string or make_number.

To actually call the function, please call tree_eval(hook_call);
do not call r_tree_eval() directly.  tree_eval is a macro which
calls r_tree_eval for all non-trivial work; such macro-function
pairs are gawk's way to speed up the code by inlining the most
frequently used part of each function.
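
The tree shape described above can be sketched with a stand-alone model. This is a simplified sketch, not gawk code: the NODE struct below is a cut-down stand-in for the one in awk.h, and mk_node, mk_string, build_call and count_args are hypothetical helpers that only imitate the roles of node() and make_string(). It shows the argument list being built back to front so that the head cell holds the first argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for gawk's node types; NOT the real awk.h. */
typedef enum { Node_val, Node_expression_list, Node_func_call } NODETYPE;

typedef struct node {
    NODETYPE type;
    struct node *lnode;   /* func call: arglist;   list cell: value     */
    struct node *rnode;   /* func call: func_name; list cell: next cell */
    char *str;            /* string payload, used by Node_val only      */
} NODE;

/* Hypothetical helper playing the role of gawk's node(left, op, right). */
static NODE *mk_node(NODE *left, NODETYPE type, NODE *right)
{
    NODE *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->type = type;
    n->lnode = left;
    n->rnode = right;
    return n;
}

/* Hypothetical helper playing the role of make_string(). */
static NODE *mk_string(const char *s)
{
    NODE *n = mk_node(NULL, Node_val, NULL);
    size_t len = strlen(s);
    n->str = malloc(len + 1);
    assert(n->str != NULL);
    memcpy(n->str, s, len + 1);
    return n;
}

/* Build func_name(args[0], ..., args[nargs-1]) as a Node_func_call tree. */
static NODE *build_call(const char *func_name, const char **args, int nargs)
{
    NODE *arglist = NULL;   /* stays NULL when there are no parameters */

    /* Build back to front so the head of the list is the first argument. */
    for (int i = nargs - 1; i >= 0; i--)
        arglist = mk_node(mk_string(args[i]), Node_expression_list, arglist);

    return mk_node(arglist, Node_func_call, mk_string(func_name));
}

/* Count the arguments by walking the rnode chain of the list cells. */
static int count_args(const NODE *call)
{
    int n = 0;
    for (const NODE *p = call->lnode; p != NULL; p = p->rnode)
        n++;
    return n;
}
```

With the real headers, the same shape would presumably be built with node() and make_string() and then handed to tree_eval(); the model only demonstrates the linkage.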

Note 1:  You know better how much you should think about memory leaks.
In case you experience problems, post the code and test data (or
publish it via ftp or http, if they are bigger).

Note 2:  I'd like to change this mechanism for future versions of
gawk.  In case Arnold Robbins accepts my not-yet submitted patch, the
example above would change this way:
        hook_call->vname = name; hook_call->rnode = NULL;
so neither Node_val nor make_string would be necessary, but you would have
to reset rnode to NULL each time you change vname.
(Thus in case you have problems with a gawk version >= 3.1.2, recall
this advice.)

HTH,
        Stepan Kasal



Tue, 15 Feb 2005 17:30:07 GMT  
 calling functions from gawk extensions

Quote:


> > I would like to call gawk functions from a dynamically loaded C
> > extension for gawk 3.1.x.

> Hello john,
>    here are several unsorted pieces of information:

> Node_func_call
>    lnode: arglist
>    rnode: func_name

> arglist is NULL for no parameters, or it is an node of type
> Node_expression_list:
>    lnode: value
>    rnode: next node of type Node_expression_list, or NULL

> value is a node of type Node_val or Node_var_array.

> I should say explicitely that the head of the expression list
> (ie. lnode of the func call) contains pointer to the first
> argument, the next to the second, and so on.

> func_name is a Node_val node containing the function name; you can
> set it like this:
>    hook_call->rnode = make_string(name, strlen(name));

> The nodes I called "value" (lnodes of the expression list) can also
> be constructed by make_string or make_number.

> To actually call the function, please call tree_eval(hook_call),
> do not call r_tree_eval() directly.  tree_eval is a macro which
> calls r_tree_eval for all non-trivial work; such macro-function
> pairs are gawk way to speed up the code by inlining the most
> frequently used part of each function.

> Note 1:  You know better how much you should think about memory leaks.
> In case you experience problems, post the code and test data (or
> publish it via ftp or http, if they are bigger).

> Note 2:  I'd like to change this mechanism for future versions of
> gawk.  In case Arnold Robbins accepts my not-yet submitted patch, the
> example above would change this way:
>    hook_call->vname = name; hook_call->rnode = NULL;
> so no Node_val nor make_string would be necessary but you would have to
> re-set rnode to NULL each time youchange vname.
> (Thus in case you have problems with a gawk version >= 3.1.2, recall
> this advice.)

> HTH,
>    Stepan Kasal

I just figured it out. To call a gawk function "doit" that takes no
argument, I have this in the C code:

     NODE *mytree = node(0, Node_func_call, tmp_string("doit", 4));
     tree_eval(mytree);

Yes, my current problem is gawk's memory management. For example, is the
memory for the node created by tmp_string in the code above freed on
return from the function? Do I need to call free_temp? It seems free_temp
only frees nodes with the TEMP flag set. So if I do a dupnode on it, which
just removes the TEMP flag, how do I then free it? I was trying to follow
the number of references to a particular node, but did not get very far.
I can make the extension functions work one way or another, but I want to
be 100% sure there is no memory leak so I can sleep well at night. It would
be nice to have a word or two on memory management in the gawk
documentation. I will probably manage the node creation myself; it seems
like less work (reinventing the wheel?). Maybe that won't work.

     So far it has been an academic exercise; I just came across the
extension mechanism in the gawk doc a couple of days ago. It seemed quite
powerful: one can easily write a gawk extension for, say, Expat to do XML
parsing. The thing that is in my head is an extension for the UW c-client
library for mailbox access (will send it to arnold for inclusion in the
ftp site if ...).

     Another thing I would like to ask gawk developers is about the
'include file' mechanism. Is there any way to have a builtin include
mechanism in gawk? You really can't use the igawk shell script in, for
example, a CGI environment. I did find a way after modifying igawk (it's
now a gawk script that expands the included files and feeds the whole
thing to a second gawk process. Also you have to take care of the POST
data.)

     Anyway, thank you very much. Oh, on your advice, I can always stick
with gawk-3.1.1 for the rest of my life.

John.



Wed, 16 Feb 2005 00:43:00 GMT  
 calling functions from gawk extensions
Hello,


Quote:
> I just figured it out. To call a gawk function "doit" that takes no
> argument, I have this in the C code

>      NODE *mytree = node(0, Node_func_call, tmp_string("doit", 4));
>      tree_eval(mytree);

I'd use "make_string" in this situation; there is nothing temporary here.
This call allocates two nodes, so if you want to, you can free them
this way:
        freenode(mytree->rnode);
        freenode(mytree);

Quote:
> Yes, my current problem is gawk's memory management. For example, the
> node created by tmp_string in the code above, is the memory freed on
> return from the function? do I need to call free_temp? Seems like
> free_temp only frees node with TEMP flag set. So if I do a dupnode on
> it, it just removes the TEMP flag, then how do I free it?

It took me some time to understand this.  Hope I got it right.
I'll try to explain:

Chapter ZERO
------------
     There is no garbage collector, so you have to free what you've
allocated, explicitly (getnode) or implicitly (node, make_string,...).

Chapter I: Trivial model
------------------------
As a first approximation, let's look at it this way:
- getnode() (and node(), make_string(),...) allocate a new node
- freenode() frees the node
- dupnode() creates a copy of the node.

Quite simple, isn't it?

Chapter II: Temp nodes
----------------------
      Quite often we need a node to store a value.  So we
create a new node, assign it the value, do the work, and then free
the node.
      But just as often the value of the new node is
in fact the value of another node, so we'd like to use that node instead
of allocating a new one.
      Thus we either use an existing node or allocate a new one, remembering
whether the node has been allocated.  Then we can perform all the work,
and when night comes and it's time to clean up the toys before going to
bed, we recall whether the node has been allocated and call freenode()
if necessary.

Chapter III: The TEMP flag
--------------------------
      The pattern outlined above is not very elegant: you have to use
a local variable each time just to store that bit of information.
It would be much better to store the bit with the node in concern.
So if the node is already available, we suppose that its TEMP flag is
not set.  OTOH, when we create a new node, we set its TEMP flag.
And at the end we can simply do:

        if ((node->flags)&TEMP) freenode(node);

which is what free_temp() actually performs.
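
The borrow-or-allocate pattern from Chapters II and III can be sketched with a toy model. This is a minimal sketch under stated assumptions: the NODE struct, the TEMP value, and the helpers new_node, free_node, value_node and free_temp_model are all invented for illustration and are not gawk's definitions. The live_nodes counter exists only so that a leak would be visible.

```c
#include <assert.h>
#include <stdlib.h>

#define TEMP 0x1   /* flag value chosen for this sketch only */

/* Toy node holding just a number and its flags; not gawk's NODE. */
typedef struct node {
    unsigned flags;
    double numbr;
} NODE;

static int live_nodes;   /* number of nodes currently allocated */

static NODE *new_node(double v, unsigned flags)
{
    NODE *n = malloc(sizeof *n);
    assert(n != NULL);
    n->flags = flags;
    n->numbr = v;
    live_nodes++;
    return n;
}

static void free_node(NODE *n)
{
    free(n);
    live_nodes--;
}

/* Mirrors the one-liner above: free only nodes created as temporaries. */
static void free_temp_model(NODE *n)
{
    if (n->flags & TEMP)
        free_node(n);
}

/*
 * Return a node holding `wanted`.  If `existing` already holds that
 * value it is borrowed (TEMP not set); otherwise a fresh node is
 * allocated and marked TEMP so the caller's cleanup will free it.
 */
static NODE *value_node(NODE *existing, double wanted)
{
    if (existing != NULL && existing->numbr == wanted)
        return existing;
    return new_node(wanted, TEMP);
}
```

The caller can call free_temp_model() unconditionally at the end: a borrowed node survives, a temporary is released, and no extra local variable is needed to remember which case applied.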

Chapter IV: Strings
-------------------
      Under the trivial model defined in Chapter I, when we free a node
containing a string, the freenode() function has to free the string first,
before freeing the actual NODE struct.  dupnode(), OTOH, has to
duplicate not only the node, but also the actual text the node points to.
And that may be quite inefficient.
      So an optimization has been implemented: dupnode() doesn't copy the
text, it doesn't even copy the node itself, it just increments the stref
count.  freenode() thus may not free the node (and text) blindly; it has
to examine the stref count:
        if (--node->stref == 0) { free(node->stptr); free(node); }

Chapter V: Other dupnode() tricks
---------------------------------
      Why does dupnode have to copy the node?  The two copies have to
live independent lives; either of them may be freed while the other one
survives.
      So the first simple trick goes like this: if the node has the PERM
flag, we may be sure that the node will never be freed.  Thus we may
ignore all requests for duplication and happily return the pointer to
the node itself.  That pointer can never become invalid.
      The second trick is more complicated:  what if we are requested
to duplicate a TEMP node?  From Chapters II and III we see that
it would be incorrect to copy the node together with the TEMP flag.  We
have to clear it on the copy.  The copy is a normal node, while the
original will soon be deallocated.
      In such a case, we might regret that the node wasn't available
a while ago; then we wouldn't have needed to create the temp node at all.
But... wow!  We can avoid copying and just clear the TEMP flag!  That way
we get to the same situation as if there had been no need to create the
temp node.  (And the free_temp() call at the end will do nothing.)
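
The three dupnode cases from Chapters IV and V (PERM, TEMP, and plain reference counting) can be put together in one toy model. Again this is only a sketch: the struct layout, the flag values, and the functions make_str, dup_model and unref_model are assumptions made for illustration, and the real gawk routines do more than this.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PERM 0x1   /* flag values chosen for this sketch only */
#define TEMP 0x2

/* Toy string node; not gawk's NODE. */
typedef struct node {
    unsigned flags;
    int stref;     /* reference count for the shared text */
    char *text;    /* the string payload */
} NODE;

static NODE *make_str(const char *s, unsigned flags)
{
    NODE *n = malloc(sizeof *n);
    assert(n != NULL);
    n->flags = flags;
    n->stref = 1;
    n->text = malloc(strlen(s) + 1);
    assert(n->text != NULL);
    strcpy(n->text, s);
    return n;
}

/* "Duplicate" a node without copying anything, as the text describes. */
static NODE *dup_model(NODE *n)
{
    if (n->flags & PERM)
        return n;             /* never freed, so sharing is always safe */
    if (n->flags & TEMP) {
        n->flags &= ~TEMP;    /* adopt the temporary as a normal node */
        return n;
    }
    n->stref++;               /* ordinary node: one more owner */
    return n;
}

/* Drop one reference; returns 1 if the storage was actually released. */
static int unref_model(NODE *n)
{
    if (n->flags & PERM)
        return 0;
    if (--n->stref > 0)
        return 0;
    free(n->text);
    free(n);
    return 1;
}
```

Note that adopting a TEMP node leaves stref at 1: the temporary's creator no longer frees it (its free_temp call becomes a no-op), so the adopter is the single owner.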

Conclusion
----------
I'm afraid this is a somewhat confused explanation.
Try flooding me with questions; perhaps you'll get more that way...

Quote:
> but want to be 100% sure there is no mem leak to sleep well at night.

Ask Arnold about his sleep. :-)
There are various methods for detecting memory leaks.  I know none of
them but Arnold seems to have performed such debugging.  Perhaps he
could suggest you something which can be easily used with extensions.

I'd just test the program and look at the size of the process.
If it doesn't increase too much, we can hope our memory leaks won't
kill the system.

Quote:
> It will be nice to have a word or two on memory management
> in the gawk documentation.

I cannot help with this, for several reasons:
1) I'm not very sure about the topic.  I've deduced some facts from
the source and wrote these above, but I may be wrong.  My experience
with gawk is too short.
2) I'm fighting on the other side, looking for ways to clean up things
inside the gawk source.  I suppose that Arnold will reject some of
my patches on the grounds of breaking the extension interface.
So I'll try to break your living space, but don't worry, Arnold will try
to protect you.  :-)
3) Though I've read various portions of gawk source, I've not yet read
the manual for extension writers.
4) I'm not a native speaker so my English has to be brushed up, at least.

Perhaps you could be a very good person for writing the portion of the
manual which will explain the memory management, esp. from extension
writer's point of view.
Feel free to compile any of my posts in.  Making a patch to gawk.texi
would be ideal but you can also write a simple text or html file.

I'm sure that Arnold will read it and correct your mistakes; that takes
much less time than writing the document from scratch.
I'd also be delighted if I could review it.

Then you assign the copyright of the document to the FSF, or publish it on
your web page, and Arnold can include or refer to it.

Quote:
>      Another thing I would like to ask gawk developers is about the
> 'include file' mechanism. Is there anyway to have a builtin include
> mechanism in gawk?
> You really cant use the igawk shell script for example in a cgi
> environment.

Sure.  When I was in a similar situation, I used make to create *.awk
files from *.iawk files.  That way I lost all the advantages of working
with an interpreted language.

But since the main driving script was a Makefile anyway, I didn't mind
this too much.  Within a CGI environment you cannot call make each time
the CGI script is invoked.

It seems interesting, I'll write it to my todo.
(But it doesn't mean anything, sorry, it'll take months till I get to it.)

What do you, Arnold, think about built-in include?

Quote:
>      Anyway, thank you very much. Oh on you advice, I can always stick
> with gawk-3.1.1 for the rest of my life.

:-)  But that way you'll never get built-in include.  ;-)

Quote:
> John.

HTH,
        Stepan


Fri, 18 Feb 2005 17:13:14 GMT  
 calling functions from gawk extensions
.......

1. On the use of tmp_string versus make_string, this is my current
understanding. tmp_string should be used only for returning values
from functions via set_value(..). One should use make_string for any
other purpose and clean up using unref(..), not freenode(..). freenode
does not know anything about the contents of a node; it just returns
the memory for the node (not the memory for the string itself) to the
global pool.

2. The extension that I am trying to write mostly involves writing
gawk wrappers for corresponding c functions and requires translating C
structures to gawk arrays and vice versa. If there is an alternative
to using arrays, please let me know. For a  C structure with an
embedded C structure, I am using a two-dimensional(?) gawk array.
Consider these C structures:

struct envelope {                       struct address {
      struct address *from;                   char *mailbox;
      struct address *to;                     char *host;
        ...                                     ....
}                                       }

I have envelope["from", "mailbox"], envelope["to", "mailbox"] and so
on. Any other ideas? By the way, there is no make_array(..) function
to create an array node in gawk source (I did not find a place where
an array is made from scratch other than possibly in the parsing
code). So this is what I did:

static unsigned long __nv;
static char vname[100];

#define make_array(n) \
do { \
        int len; \
        len = snprintf(vname, sizeof(vname), "e%lu%s", ++__nv, #n); \
        getnode(n); \
        (n)->type = Node_var_array; \
        (n)->flags &= ~SCALAR; \
        (n)->var_array = NULL; \
        (n)->array_size = (n)->table_size = 0; \
        emalloc((n)->vname, char *, len+1, "make_array"); \
        memcpy((n)->vname, vname, len); \
        (n)->vname[len] = '\0'; \
} while(0)

To clean up:   free_array(n) {  assoc_clear(n); freenode(n); }
Please comment on this, especially if there is any possibility of a memory
leak.

3. Random thoughts on array implementation in gawk:

Array nodes could be handled the same way as string nodes. A string node
keeps a pointer to the actual string object and a reference counter. So
an array node could keep a pointer to the actual array
structure (object?) in addition to the counter. Then there should be no
need to distinguish between Node_var_array and Node_array_ref.

        Two array nodes referring to the same array object:

  |Array Node 1|      | actual array object |        |Array Node 2 |
  |            |      |                     |        |             |
  | a_value --------->|  array_size         |<----------a_value    |
  | a_ref=2    |      |  var_array[]        |        |  a_ref=2    |
  |  .....     |      |   .......           |        |   .......   |
  |-------------       ---------------------          --------------

Other probable benefits (if implemented):
     1) One can write in a gawk script:
          a[1] = ..
          a[2] = ..
          b = a   #'b' is the same as 'a', a reference                
       Then how does one get a fresh copy of an array? Either use a
for loop, or add a new operator/function like b = copy(a).
     2) An array as an element of another array. I do not see a reason
why this should not be allowed.
     3) True multidimensional arrays (Am I out of line here? It has been
at least 10 years since I tried to write a C interpreter).

My apology if none of these make any sense.

4. On the extension API: extension writers will need a fairly complete and
stable interface, and documentation that includes a description of the
parse tree (ASCII art would be great). You can't go very far without an
understanding of the parse tree structure. It will take a lot of
effort to hide this from extension writers. The source should be made
unavailable to extension writers :). Here is an example of the kind of
thing that can happen. Since I needed to make a two-dimensional array, I
build the subscript by concatenating the two subscripts separated by the
SUBSEP string. This is OK (I kind of like it, it's a lot cheaper) as long
as the gawk implementation of 2D arrays does not change (I should be using
concat_exp, but then you have to understand the tree structure).

Quote:
> 4) I'm not a native speaker so my English has to be brushed up, at least.
....
> Perhaps you could be a very good person for writing the portion of the
> manual which will explain the memory management, esp. from extension
> writer's point of view.

    I would first like to complete the extension. I will also need
gawk code to parse MIME messages, thinking about translating the
relevant parts from a perl Module (by the way, the source of
inspiration for all of these is the perl module Mail::Cclient). I am
not a gawk(awk) expert so that may take a while. I will be happy to
share my experience with future extension writers. On contributing to
gawk manual, I can only promise to add it to my todo list. Maybe will
find enough motivation and free time to give it a try. I am not a
native english speaker either, so definitely will need help there.

Bye for now. I have spent my vacation digging through gawk source(:.

Thank you very much. Your comments have been very helpful, and I
certainly appreciate your involvement with the development of gawk.

John.



Sat, 19 Feb 2005 08:48:27 GMT  
 calling functions from gawk extensions
Hello,


Quote:
> 1. On the use of tmp_string versus make_string, this is my current
> understanding. tmp_string should be used only for returning values
> from functions via set_value(..). One should use make_string for any
> other purpose

I think there are also situations other than return values where you can
use tmp_string, but I'm not sure.

Quote:
> and clean up using unref(..) not freenode(..). freenode
> does not know anything about the contents of a node, it just returns

Sure.  I apologize for that silly mistake.

Quote:
> to create an array node in gawk source (I did not find a place where
> an array is made from scratch other than possibly in the parsing
> code).

I think you are right, there is no place where a new array is created.
(Except some spots in awkgram.y, but these are a bit buggy at the moment.)

The typical life of an array looks like this:
When a new array is born, it's just an ordinary variable and no one knows
that it's destined to be an array.  It's created like this:

        node(Nnull_string, Node_var, (NODE *) NULL);

See either the function variable() at the bottom of awkgram.y, or the
end of function push_args in eval.c.

Then it becomes an array and its type changes to Node_var_array.
See assoc_lookup in array.c or do_split in field.c.
Similar code should also be in other places but it isn't (again, I have
a patch stashed away which I should send to Arnold).
(The value (alias lnode) might be unref()-ed, but it doesn't matter
in this case since Nnull_string is a permanent node.)

Quote:
> So this is what I did:

> static unsigned long __nv;
> static char vname[100];

> #define make_array(n) \
> do { \
>         int len; \
>         len = snprintf(vname, sizeof(vname), "e%lu%s", ++__nv, #n); \
>         getnode(n); \
>         (n)->type = Node_var_array; \
>         (n)->flags &= ~SCALAR; \
>         (n)->var_array = NULL; \
>         (n)->array_size = (n)->table_size = 0; \
>         emalloc((n)->vname, char *, len+1, "make_array"); \
>         memcpy((n)->vname, vname, len); \
>         (n)->vname[len] = '\0'; \
> } while(0)

> To clean up:   free_array(n) {     assoc_clear(n); freenode(n); }
> Please comment on this, specially if there is any possiblity of memory
> leak.

I see a memory leak with vname.  You should also free(n->vname).

But I wouldn't make such a fuss about vname.  It's used for error
messages only, so there is no need for anything more than a simple

        (n)->vname = "my array";

unless you want more descriptive error messages.
Also note that getnode() initializes flags.

So this should be sufficient:

#define make_array(n) \
do { \
        getnode(n); \
        (n)->type = Node_var_array; \
        (n)->var_array = NULL; \
        (n)->array_size = (n)->table_size = 0; \
        (n)->vname = "an_array"; \
} while(0)

Quote:
> 3. Random thoughts on array implementation in gawk:

> array nodes can be handled the same way as string nodes. A string node
> keeps a pointer to the actual string object and a refrence counter. So
> an array node can keep a pointer to the actual array
> structure(object?) in additon to the counter. Then there should be no
> need to distinguish between Node_var_array and Node_array_ref.

>    Two array nodes referring to the same array object:

>   |Array Node 1|      | actual array object |        |Array Node 2 |
>   |            |      |                     |        |             |
>   | a_value --------->|  array_size         |<----------a_value    |
>   | a_ref=2    |      |  var_array[]        |        |  a_ref=2    |
>   |  .....     |      |   .......           |        |   .......   |
>   |-------------       ---------------------          --------------

I have also thought about this idea.  The obvious problem is that you
have your reference count (a_ref) in each node.  The reference count
must be in the actual array object, since there is no sane way to keep
all copies consistent otherwise.

So what you call "Array Node" is actually Node_array_ref and what you
call "actual array object" is Node_var_array.  The only changes you
propose are that Node_var_array should have a reference count and that
each new array should also have one Node_array_ref.
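
The point about where the reference count lives can be sketched as follows. This is a hypothetical model, not gawk's data structures: array_obj stands for the "actual array object", array_ref for the lightweight handle, and array_new, array_share and array_release are names invented here. Because the count is stored once, in the shared object, every handle sees the same value.

```c
#include <assert.h>
#include <stdlib.h>

/* The shared object: holds the data and the single reference count. */
typedef struct array_obj {
    int refcount;
    int size;        /* stand-in for the real array contents */
} array_obj;

/* A handle: holds only a pointer to the shared object, no count. */
typedef struct array_ref {
    array_obj *obj;
} array_ref;

static int live_objects;   /* makes leaks visible in this sketch */

/* Create a new, empty array object with one handle referring to it. */
static array_ref array_new(void)
{
    array_ref r;
    r.obj = calloc(1, sizeof *r.obj);
    assert(r.obj != NULL);
    r.obj->refcount = 1;
    live_objects++;
    return r;
}

/* Make a second handle to the same object (the "b = a" case). */
static array_ref array_share(array_ref a)
{
    a.obj->refcount++;
    return a;
}

/* Drop one handle; the object is freed when the last handle goes away. */
static void array_release(array_ref *a)
{
    if (--a->obj->refcount == 0) {
        free(a->obj);
        live_objects--;
    }
    a->obj = NULL;
}
```

Had each handle carried its own copy of the count, array_share would have had to find and update every other handle, which is the consistency problem described above.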

You are right that these changes would give us the possibility to
work with arrays as whole objects.  The cost would be fairly small.
OTOH, unless we need these extensions to awk, we can avoid this
(albeit small) cost.

Quote:
>           b = a   #'b' is the same as 'a', a reference                

I'm afraid this would bring much confusion.

Quote:
>        Then how does one get a fresh copy of an array. Either use a
> for loop, or add a new operator/function like b = copy(a).
>      2) an array as an element of another array.I do not see a reason
> why this should not be allowed.
>      3) true multidimensional arrays (Am I out of line here? has been
> at least 10 years since tried to write C interpreter).

You are right, as far as my limited experience allows me to understand.

But that would change the nature of the awk language.  The variables
are not typed, the values are not typed.
(Though the arrays don't fully fit into this.)

I think there is no will to shift awk that much.  POSIX awk has no types,
and GNU awk, though it has various extensions, tries to be an implementation
of the awk language without changing its base assumptions.

You may want to look at TAWK, it has dynamic types and I'd guess it has
at least some of the things you mentioned above.  It's not free software,
though.

Quote:
> My apology if none of these make any sense.

They make sense, of course.  They just aren't conservative enough ;-)

Quote:
> 4. On extension API, extension writers will need a fairly complete and
> stable interface, and documentation that includes a description of the

I understand your point.  I think Arnold does as much as he can with
respect to this.  Don't know whether other people help him.

I'm definitely not sure enough about the data structures at the moment,
so I cannot help with this.  I'm rather in the mood to fiddle/brush/break
the data structures.

Quote:
> can happen. Since i needed to make a two-dimensional array, I have the
> subscript made by concatenating the two subscripts seperated by the
> SUBSEP string. This is ok (i kind of like it, lot cheaper) as long as
> the gawk implementation of 2D array does not change

You may be fairly sure the awk language won't change in this aspect (no
true 2D arrays).
So it seems highly probable that the implementation won't change either.

Quote:
> (should be using
> concat_exp, then you have to understand the tree structure).

It's the same Node_expression_list as for list of parameters to
a function call.

But it probably doesn't pay off to construct the tree in such case.
It's simpler to make the concatenation in C, I guess.
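
Doing the concatenation in C might look like the sketch below. It assumes the default SUBSEP of "\034"; a script can assign a different value, so a real extension would have to read the current SUBSEP from gawk rather than hard-code it. The function name make_subscript and the DEFAULT_SUBSEP macro are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* awk's default SUBSEP; an awk program may assign a different value. */
#define DEFAULT_SUBSEP "\034"

/*
 * Write "first SUBSEP second" into buf, the same key awk uses for
 * array[first, second].  Returns the key length, or -1 if the key
 * does not fit in bufsize bytes (including the terminating NUL).
 */
static int make_subscript(char *buf, size_t bufsize,
                          const char *first, const char *second,
                          const char *subsep)
{
    int n = snprintf(buf, bufsize, "%s%s%s", first, subsep, second);

    if (n < 0 || (size_t) n >= bufsize)
        return -1;
    return n;
}
```

For example, make_subscript(key, sizeof key, "from", "mailbox", DEFAULT_SUBSEP) produces the key that a script would address as envelope["from", "mailbox"].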

Quote:
> not a gawk(awk) expert so that may take a while. I will be happy to
> share my experience with future extension writers. On contributing to
> gawk manual, I can only promise to add it to my todo list. Maybe will

That's enough, of course.  I just wanted to emphasize the openness of
the process.

Quote:
> Bye for now. I have spent my vacation digging through gawk source(:.

Hope you enjoyed it.

Quote:
> Thank you very much. Your comments have been very helpful, and I
> certainly appreciate your involvement with the development of gawk.

I'm very delighted to hear this.  Thank you for these words.

Stepan



Sat, 19 Feb 2005 15:50:54 GMT  
 calling functions from gawk extensions

Quote:


>> > I would like to call gawk functions from a dynamically loaded C
>> > extension for gawk 3.1.x.

The extensions API was done by someone else for gawk 3.0.0. I finally
got it out to the world for 3.1, but I make no claims that it was
well designed.  This thread clearly shows the need for an API
into the gawk internals, so that reading the gawk code isn't necessary.

Designing such is on my maybe-one-day todo list, but it's not high
up there.   Clearly we need a routine for calling an awk function
from C, and also one for dealing with the symbol table in a sane
fashion.

Quote:
>     Another thing I would like to ask gawk developers is about the
>'include file' mechanism. Is there any way to have a builtin include
>mechanism in gawk?

        Use the Source, Luke.
                -- Obi Wan Stallman

This would have to be done in the gawk lexer.  I see no easy,
backward compatible way to do this, and the lexer is already
somewhat crufty.

Quote:
>You really can't use the igawk shell script, for example, in a CGI
>environment.
>I did find a way after modifying igawk (it's now a gawk script that
> expands the included files and feeds the whole thing to a second gawk
>process. Also you have to take care of the POST data.)

If this works for you, and is fast enough, then leave things alone.

Arnold
--

P.O. Box 354            Home Phone: +972  8 979-0381    Fax: +1 928 569 9018
Nof Ayalon              Cell Phone: +972 51  297-545
D.N. Shimshon 99785     ISRAEL



Sun, 20 Feb 2005 19:07:52 GMT  
 calling functions from gawk extensions

Quote:

>2. The extension that I am trying to write mostly involves writing
>gawk wrappers for corresponding c functions and requires translating C
>structures to gawk arrays and vice versa. If there is an alternative
>to using arrays, please let me know. For a  C structure with an
>embedded C structure, I am using a two-dimensional(?) gawk array.
>Consider these C structures:

>struct envelope {                   struct address {
>      struct address *from;               char *mailbox;
>      struct address *to;                 char *host;
>    ...                                     ....
>}                                   }

>I have envelope["from", "mailbox"], envelope["to", "mailbox"] and so
>on. Any other ideas?

This is a perfectly good use of awk arrays.

Quote:
>3. Random thoughts on array implementation in gawk:

>array nodes can be handled the same way as string nodes. A string node
>keeps a pointer to the actual string object and a reference counter. So
>an array node can keep a pointer to the actual array
>structure(object?) in addition to the counter. Then there should be no
>need to distinguish between Node_var_array and Node_array_ref.

>    Two array nodes referring to the same array object:

>  |Array Node 1|      | actual array object |        |Array Node 2 |
>  |            |      |                     |        |             |
>  | a_value --------->|  array_size         |<----------a_value    |
>  | a_ref=2    |      |  var_array[]        |        |  a_ref=2    |
>  |  .....     |      |   .......           |        |   .......   |
>  |-------------       ---------------------          --------------

I tried something like this at one point, and it broke horribly.  The
Node_array_ref thing maintains correct semantics and doesn't leak memory.

Part of the problem is the unions in the NODE structure. Making arrays
work as you describe would have required yet another field (a reference
count in the actual array object, as Stepan pointed out).  It wasn't
worth it, at least not for me.
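
To make the extra field concrete, here is a toy model of two nodes sharing
one array object, with the reference count kept in the shared object rather
than in each node. All type and function names are made up for the sketch;
none of this is gawk's NODE code.

```c
#include <stdlib.h>

/* Toy model only: these types are illustrative and are not gawk's NODE. */
struct array_obj {
    int refcount;   /* the extra field: lives in the shared object */
    size_t size;    /* stand-in for the real array contents */
};

struct array_node {
    struct array_obj *a_value;  /* pointer to the shared array object */
};

/* Create a node that owns a fresh array object (refcount 1). */
int array_node_init(struct array_node *n)
{
    n->a_value = calloc(1, sizeof *n->a_value);
    if (n->a_value == NULL)
        return -1;
    n->a_value->refcount = 1;
    return 0;
}

/* Make dst refer to the same object as src. */
void array_node_share(struct array_node *dst, const struct array_node *src)
{
    dst->a_value = src->a_value;
    dst->a_value->refcount++;
}

/* Drop one reference; free the object when the last one goes away.
 * Returns the number of references that remain. */
int array_node_release(struct array_node *n)
{
    int left = --n->a_value->refcount;

    if (left == 0)
        free(n->a_value);
    n->a_value = NULL;
    return left;
}
```

Every place that copies or discards an array node has to go through the
share/release pair, which is the bookkeeping burden described above.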

Quote:
>Other probable benefits (if implemented):
>     1) One can write in a gawk script:
>          a[1] = ..
>          a[2] = ..
>          b = a   #'b' is the same as 'a', a reference                

Nope, that would not be awk.  Even if possible, I wouldn't break
the semantics that way.

Quote:
>     2) an array as an element of another array. I do not see a reason
>why this should not be allowed.
>     3) true multidimensional arrays (Am I out of line here? It has been
>at least 10 years since I tried to write a C interpreter).

These would both be nice, but the awk language is frozen, and making
these work would break compatibility too much.  (Use tawk if you want
this, or perl, or python, or ....)

Quote:
>My apology if none of these make any sense.

>4. On extension API, extension writers will need a fairly complete and
>stable interface, and documentation that includes a description of the
>parse tree (ASCII art will be great).

Actually, I disagree.  A good API would allow you to create/access/remove
variables and arrays, be called by awk, call awk functions, and perhaps
force a call to awk's "exit" (invoking the END block).  There should
be no need to understand the parse tree.

Consider the challenge as being to design an API that could be put
on top of ANY awk implementation, not just gawk.
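
One possible shape for such an implementation-neutral interface, purely as
a sketch: the extension sees only an opaque handle and a table of function
pointers. Everything here (struct awk_api, the toy_* backend, the fixed
table sizes) is invented for illustration and exists in no awk; only
variable create/access/remove is shown, and arrays, calling awk functions
and forcing "exit" are left out for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface, invented for illustration: no awk ships this. */
struct awk_api {
    void *impl;  /* opaque interpreter state; extensions never look inside */
    int (*set_var)(void *impl, const char *name, const char *value);
    const char *(*get_var)(void *impl, const char *name);
    int (*remove_var)(void *impl, const char *name);
};

/* A trivial in-memory backend standing in for a real interpreter. */
#define TOY_MAX_VARS 8
struct toy_var { char name[32]; char value[64]; int used; };
struct toy_awk { struct toy_var vars[TOY_MAX_VARS]; };

static struct toy_var *toy_find(struct toy_awk *t, const char *name)
{
    for (size_t i = 0; i < TOY_MAX_VARS; i++)
        if (t->vars[i].used && strcmp(t->vars[i].name, name) == 0)
            return &t->vars[i];
    return NULL;
}

static int toy_set(void *impl, const char *name, const char *value)
{
    struct toy_awk *t = impl;
    struct toy_var *v = toy_find(t, name);

    if (strlen(name) >= sizeof v->name || strlen(value) >= sizeof v->value)
        return -1;  /* too long for the fixed-size toy table */
    for (size_t i = 0; v == NULL && i < TOY_MAX_VARS; i++)
        if (!t->vars[i].used)
            v = &t->vars[i];
    if (v == NULL)
        return -1;  /* table full */
    strcpy(v->name, name);
    strcpy(v->value, value);
    v->used = 1;
    return 0;
}

static const char *toy_get(void *impl, const char *name)
{
    struct toy_var *v = toy_find(impl, name);
    return v ? v->value : NULL;
}

static int toy_remove(void *impl, const char *name)
{
    struct toy_var *v = toy_find(impl, name);
    if (v == NULL)
        return -1;
    v->used = 0;
    return 0;
}

/* Bind the interface to a backend; the state must be zero-initialised. */
struct awk_api toy_api(struct toy_awk *state)
{
    struct awk_api api = { state, toy_set, toy_get, toy_remove };
    return api;
}
```

An extension written against struct awk_api would call, say,
api.set_var(api.impl, "FS", ":") and never touch a parse tree; a different
awk would supply its own functions behind the same table.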

Arnold
--

P.O. Box 354            Home Phone: +972  8 979-0381    Fax: +1 928 569 9018
Nof Ayalon              Cell Phone: +972 51  297-545
D.N. Shimshon 99785     ISRAEL



Sun, 20 Feb 2005 19:19:45 GMT  
 calling functions from gawk extensions

Quote:


>.......

>1. On the use of tmp_string versus make_string, this is my current
>understanding. tmp_string should be used only for returning values
>from functions via set_value(..). One should use make_string for any
>other purpose and clean up using unref(..) not freenode(..). freenode
>does not know anything about the contents of a node, it just returns
>the memory for the node( not the memory for the string itself) to the
>global pool.

It sounds to me, based on your posts and Stepan's, that, as there is no
garbage collector, any time you call one of the "allocate memory (create an
object)" functions without assigning the returned value to a variable, you
are leaking memory.  I.e., if you don't store the return value in a
variable, you can't subsequently free it.
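
The node-versus-string distinction from the quoted point 1 can be shown
with a toy model. This is not gawk's make_string, unref or freenode; the
toy_* names are hypothetical and only mirror the behaviour as described in
this thread: one call releases just the node, the other releases the node
and the string it owns.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model only: not gawk's NODE, make_string, unref or freenode. */
struct toy_node {
    char *str;      /* separately allocated string storage */
    size_t len;
};

/* Allocate a node plus a private copy of s (two allocations). */
struct toy_node *toy_make_string(const char *s)
{
    struct toy_node *n = malloc(sizeof *n);

    if (n == NULL)
        return NULL;
    n->len = strlen(s);
    n->str = malloc(n->len + 1);
    if (n->str == NULL) {
        free(n);
        return NULL;
    }
    memcpy(n->str, s, n->len + 1);
    return n;
}

/* Analogue of the freenode behaviour described above: only the node
 * itself is released.  The string is handed back so the caller can see
 * that it is still allocated and must be freed separately. */
char *toy_free_node_only(struct toy_node *n)
{
    char *orphan = n->str;

    free(n);
    return orphan;
}

/* Analogue of the unref behaviour described above: release both. */
void toy_unref(struct toy_node *n)
{
    free(n->str);
    free(n);
}
```

If the pointer returned by toy_free_node_only were dropped, the string
would be unreachable and leaked, which is the same failure as never
storing the result of an allocating call.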

Quote:
>2. The extension that I am trying to write mostly involves writing
>gawk wrappers for corresponding c functions and requires translating C
>structures to gawk arrays and vice versa. If there is an alternative
>to using arrays, please let me know. For a  C structure with an
>embedded C structure, I am using a two-dimensional(?) gawk array.
>Consider these C structures:

It really sounds to me like you should get TAWK.  The depth of what you are
trying to do, as well as your specific comments, make it sound like the
time/effort necessary to get up to speed on TAWK is warranted.  Note that
most of the features that you seem to want from GAWK are already present in
TAWK.

Quote:
>4. On extension API, extension writers will need a fairly complete and
>stable interface, and documentation that includes a description of the
>parse tree (ASCII art will be great).

I think it is pretty clear from the text in EAP3 (and elsewhere) that the
GAWK extension facility is pretty much what it is and what it is going to
be.  That is, it is not a core concept.

Mind you, I kinda like how the GAWK "shared lib interface" works - the way
the internals of the interpreter are exposed (both at the scripting level
and at the source code hacker level).  The TAWK version is much more closed
(although I've tried to get Pat to open it up a little on a few occasions).
The TAWK "shared lib interface" is, however, more straightforward and
easier to use.



Mon, 21 Feb 2005 22:06:30 GMT  
 calling functions from gawk extensions

Quote:



> >3. Random thoughts on array implementation in gawk:

> >array nodes can be handled the same way as string nodes.

> I tried something like this at one point, and it broke horribly.  The
> Node_array_ref thing maintains correct semantics and doesn't leak memory.

> Part of the problem are the unions in the NODE structure. Making arrays
> work as you describe would have required yet another field (a reference
> count in the actual array object, as Stepan pointed out).  It wasn't
> worth it, at least not for me.

There is currently no way to create an explicit reference to an
array (let's call it what it is, i.e. an associative array, aka hash) or,
for that matter, to a string, and there never will be. So I agree, there
is no point in trying to change something that ain't broken. However,
here is my two cents on the design of the NODE structure. Instead of a
one-size-fits-all structure, how about a hierarchical one, with a Node
structure at the top holding the common elements (e.g. 'type') and a
void * pointer referring to a structure specific to an element? These
elements would be Node_num, Node_string, etc. Pointer dereferencing
has to be done based on the element type, but there will be no
additional cost in most cases (one needs only to find if (tree->type
== ...) in the source). I am mentioning this only because, as I gather
from Stepan's posts, he is doing some work on redesigning the Node
structure. It is possible to come up with the optimal structure in the
current framework, but just consider the necessity of adding/removing
some elements from it in the future. The awk language is 'frozen', but
the gawk source is not. I am just trying to understand the rationale
behind the current design choice.
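
A rough sketch of the layout proposed here, to make it concrete. The names
(struct node, num_body, str_body, the NODE_* tags) are invented for the
example and do not correspond to gawk's definitions.

```c
#include <stddef.h>

/* Illustrative only: a layout like the one proposed, not gawk's NODE. */
enum node_type { NODE_NUM, NODE_STRING };

struct node {
    enum node_type type;  /* common element shared by every node */
    void *body;           /* points at a type-specific structure */
};

struct num_body { double value; };
struct str_body { const char *text; size_t len; };

/* Dereference body according to type; returns 0 for non-numeric nodes. */
double node_number(const struct node *n)
{
    if (n->type == NODE_NUM)
        return ((const struct num_body *)n->body)->value;
    return 0.0;
}

/* Same idea for strings; returns 0 for non-string nodes. */
size_t node_length(const struct node *n)
{
    if (n->type == NODE_STRING)
        return ((const struct str_body *)n->body)->len;
    return 0;
}
```

Adding a new kind of node then means adding a tag and a body structure
without touching the others; the price is one extra pointer dereference
per access and two allocations of different sizes per node.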

Quote:

> >Other probable benefits (if implemented):
> >     1) One can write in a gawk script:
> >          a[1] = ..
> >          a[2] = ..
> >          b = a   #'b' is the same as 'a', a reference                

> Nope, that would not be awk.  Even if possible, I wouldn't break
> the semantics that way.

> >     2) an array as an element of another array. I do not see a reason
> >why this should not be allowed.
> >     3) true multidimensional arrays (Am I out of line here? It has been
> >at least 10 years since I tried to write a C interpreter).

> These would both be nice, but the awk language is frozen, and making
> these work would break compatibility too much.  (Use tawk if you want
> this, or perl, or python, or ....)

Thanks for reminding me of the reason why I use awk, OK, at least one of
the reasons. I promise to keep this in mind when I am using awk.

You wish some of these languages could be 'frozen'. Seems like I am
comfortable with only 'frozen' languages. Need to change my attitude (:-.

As I think about it, I kinda like the current API. I have nothing else
to add to the list of things for which extension writers will need to
know about gawk internals. Currently, the only place one needs to know
about the parse tree in some detail is when he/she wants to call a gawk
function (don't know about invoking the END block). We have figured that
out in this thread (thanks to the kind folks participating in this NG).
Adding another 'level of indirection' on top of the current API would
only make things slower. So my vote is to leave the API as is (talk
about fickle minded!).

On having a builtin include mechanism in gawk, I have changed my mind
on this one too. I think my current technique is good enough. By the
way, is the ftp site mentioned in EAP still available for uploading awk
stuff? I will be happy to share my gawk stuff for running CGI scripts.
Anyone interested is welcome to send me an email.



Wed, 23 Feb 2005 07:34:01 GMT  
 calling functions from gawk extensions
Hello.


Quote:
> here is my two cents on the design of the NODE structure. Instead of
> one size fits all structure, how about a hierarchical one with a Node
> structure at the top having the common elements (e.g. 'type') and a
> void * pointer referring to a structure specific to an element. These

One should also keep in mind one big advantage of the current "one node
fits all" design: the getnode/freenode pair can effectively reuse
freed nodes.
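
To illustrate why a single fixed node size makes reuse cheap, here is a
small free-list sketch. It is not gawk's getnode/freenode; the pool_*
names and the union layout are assumptions made up for the example.

```c
#include <stdlib.h>

/* Illustrative free list; pool_getnode/pool_freenode are hypothetical
 * names and this is not gawk's actual getnode/freenode code. */
union pool_node {
    union pool_node *next_free;  /* link used only while on the free list */
    double num;                  /* payload variants share one fixed size */
    char *str;
};

static union pool_node *free_list = NULL;

/* Hand out a node, preferring one that was freed earlier. */
union pool_node *pool_getnode(void)
{
    union pool_node *n = free_list;

    if (n != NULL) {
        free_list = n->next_free;
        return n;
    }
    return malloc(sizeof *n);
}

/* Return a node to the pool instead of calling free(). */
void pool_freenode(union pool_node *n)
{
    n->next_free = free_list;
    free_list = n;
}
```

Because every node has the same size, any freed node can satisfy any later
request; with per-type structures of different sizes, each size would need
its own pool or a general-purpose allocator.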

When I was thinking about the cleanup of the NODE design, I've also
considered these tricks:
1) splitting the biggest nodes into a structure of several nodes (e.g. the
string node has to keep not only the string itself, but also its numeric
value and compiled regexp value, so that they need not be recomputed later);
2) allocating "twins", i.e. two NODE structures, one immediately following
the other.

But generally, I value the advantages outlined above so highly that I wanted
to keep (most of) the data in a tree of NODEs.
And, of course, the union should be better organized and documented.

Quote:
> I am mentioning this only because, as I gather from Stepan's posts,
> he is doing some work on redesigning the Node structure.

Haven't done anything significant to the NODE structure yet, just thinking.

And, as soon as Arnold mentioned that he sees the current design as
the biggest limitation, and mentioned mawk and compiling to bytecode,
I settled on a crazy plan of writing a virtual machine and letting awk
compile to bytecode.
And that could change the situation a lot, so any cleanup of the node
structure should rather be done after this big change.

So it's not probable that I'll get to NODE cleanup anytime soon.

Quote:
> I am just trying to understand the rationale behind the current
> design choice.

As Arnold has just told us, the reasons are historical.

But please note that even the bytecode compiler will need the tree
structure.  It can be used for some checks, analysis of the program
and optimization.

Quote:
> Adding another 'level of indirection'
> on top of current API would only make things slower.

Not necessarily.  More exactly, it'll only slow down the process of
dynamic linking and registering new functions, but it need not slow
down the execution.  (And even if it did, I cannot believe that it
could be the bottleneck of the application.)

Have a nice day,
        Stepan



Thu, 24 Feb 2005 20:51:21 GMT  
 