Vector arguments to external procedures 
Author Message
 Vector arguments to external procedures

A while back I asked about how to pass the address of an element of a
vector to an external procedure. My thanks to Roger, Jon and Aaron for
helpful replies.

Here is a rather long summary, for anyone interested, of how I see the
problem now. I'd again be grateful for any further suggestions relating
to the new set of problems and ideas towards the end.

Getting the address
-------------------

There were 3 suggestions:

1. Roger suggesting using fast_subscrintvec to get the base address, and
offsetting it in ordinary Pop. This is the kind of evil trick I was
looking for originally. (I knew it existed, but couldn't remember it.)
Roger also pointed out the garbage collection implications, which turn
out to be the main issue - see below.

2. Jon suggested using an external procedure to offset the address of
the vector, returning it to Pop. I don't know why I hadn't thought of
this, but I hadn't. (I proposed an external interface procedure between
Pop and each real procedure.)

3. Aaron suggested using sysFIELD to get the address. I translated this
into the following code:

    define lconstant vaddr(type);
        ;;; Supports e.g. intvec_addr(n, vec) -> exptr
        ;;; Address of n'th element of vector v (type not checked).
        pop11_need_nextitem("(") -> ;
        pop11_comp_expr_seq_to(")") -> ;
        sysFIELD(false, conspair(type, 1), false, 8:1000);
    enddefine;

    define syntax intvec_addr = vaddr(% "int" %); enddefine;
    ;;; etc.

I'm inclined to use method 3. It returns an exptr struct containing the
address, which I guess creates a little garbage, but avoids its having
to masquerade as an integer and so get translated between a Pop
(big)integer and a machine integer/address. (The external call interface
accepts an exptr struct and automatically passes its contents to the
external procedure, so it's quite neat: exacc foo(intvec_addr(i, v)).)

Dealing with garbage collections
--------------------------------

Thanks for pointing this issue out Roger.

The problem is that if the address is found in user-level Pop, there
might be a GC before it gets passed to the external procedure. This
applies to all three methods above.

Since external argument handling is done in assembly (aextern.s) there
isn't much chance of dealing with it by modifying the system. So the
question is whether I can protect the code from the garbage collector.
I'm not sure I can, but here are some ideas:

1. Call sysgarbage first to get it out of the way. No good! (a) There
might be another one anyway. (b) Big overhead. (c) Unnecessary GCs
destroy my very effective garbage reduction strategy (see
http://www.*-*-*.com/ ).

2. Use fixed-address vectors. Unsatisfactory. (a) I don't want the
calling program to have to do anything special. (b) Large fixed-address
structures will make for inefficient use of memory.

3. Tinker with the system parameters to try to defer a GC - e.g.
increase popmemlim temporarily. I don't know whether I can make this
totally reliable - does anyone else?

4. Guard the code with sys_lock_heap and sys_unlock_heap. This is no
good as it stands as sys_unlock_heap undoes *all* previous calls to
sys_lock_heap. However, a modification is promising:

    sys_lock_heap();
    exacc ext_foo(intvec_addr(n, v)) =>
    pop_heap_lock_count - 1 -> pop_heap_lock_count;

In practice, this seems to work. One problem is that pop_heap_lock_count
is not documented, though the source and simple tests suggest that
decrementing it undoes the last sys_lock_heap. There's also an overhead
from locking and unlocking the heap - what it does isn't trivial, I
think. Another concern is that one might get an unnecessary
run-out-of-memory mishap while the heap is locked. This *looks*
unlikely, but I'm not sure how to analyse it - does anyone have any
thoughts?

5. Use pop_after_gc to exit to the caller if a GC occurs during the
critical code, then try again if the external procedure had not run
before the GC happened. I think this would be a good strategy (low
overhead unless a GC occurs) except for one thing: I don't know how to
tell in general whether the external procedure had already run when the
GC happened. (If it had already run, it mustn't be run again because it
might have deliberately overwritten its input data.) I feel as if this
ought to be trivial - but in practice I'm stuck.

Overall, I'm beginning to wonder whether to fall back on my original
idea, but will probably explore option 4 above further first. Has anyone
any ideas? Comments extremely welcome!

David



Sun, 16 Oct 2005 20:56:04 GMT  
 Vector arguments to external procedures
Hi David,

Quote:
> Dealing with garbage collections
> --------------------------------

> Thanks for pointing this issue out Roger.

> The problem is that if the address is found in user-level Pop, there
> might be a GC before it gets passed to the external procedure. This
> applies to all three methods above.

One way around this is to use an external procedure to allocate the memory,
instead of using the pop heap. That is, use malloc() to get the memory
for the vector or array.

It depends on what you are putting in these
structures how messy this might be (if it is just an array or numbers, it's
probably not too much hassle -- if you need pointers to pop objects, then
it gets more complicated).

If there are unwelcome interactions with poplog memory management, then
a way to do this is *once* when your program initialises itself, to allocate,
a *big* poplog vector, lock the poplog heap (once only) so it can't move,
then pass out this big vector to the external code. Then allocate space from
within this object.

This could certainly be made to work and would be efficient and safe, but
I can't remember whether it is easy to persuade standard C or C++ functions
to allocate from within a known address range. I assume you would prefer
not to write your own version of malloc() and free() ... ? :-) Although I confess
I probably *would* write my own jlc_malloc() and jlc_free(), since that would
be less effort than finding a better solution :-). But only if calling the system
malloc() interfered with Poplog in some way which mattered for the application.

Jonathan



Sun, 16 Oct 2005 22:42:55 GMT  
 Vector arguments to external procedures
Hi David,

With regard to a user GC relocating a vector and destroying the
validity of the lovingly crafted address, I have a good idea.  [Well,
I think it is a good idea.  I am so tired that my eyes are making the
screen swim alarmingly, so it may be nonsense.]

How about adding a pop_after_gc "hook" that is responsible for
correcting the addresses that get mucked up?  The simplest way to do
this is to modify Aaron's suggestion by "marking" the resulting exptr
struct by adding it to a temporary property along with enough
information to re-do the computation.  Then the pop_after_gc hack
would re-run the computation.

Enough information means both "n" and "vec", I guess.  So there is
quite a lot of overhead, really.  However, you could marginally
improve on that by sharing the information.  After all, one probably
repeatedly generates either identical or sequential &vec[n] values.
In either cases a sharing opportunity arises which can be exploited
by caching "n" and "vec" and checking whether it is the same or one
bigger than a previous value.  But such a time/space tradeoff is
probably not what you want.

--
Steve



Wed, 19 Oct 2005 04:15:24 GMT  
 Vector arguments to external procedures
 > Hi David,
 >
 > With regard to a user GC relocating a vector and destroying the validity
 > of the lovingly crafted address, I have a good idea.  [Well, I think it
 > is a good idea.  I am so tired that my eyes are making the screen swim
 > alarmingly, so it may be nonsense.]
 >
 > How about adding a pop_after_gc "hook" that is responsible for
 > correcting the addresses that get mucked up?  The simplest way to do
 > this is to modify Aaron's suggestion by "marking" the resulting exptr
 > struct by adding it to a temporary property along with enough
 > information to re-do the computation.  Then the pop_after_gc hack would
 > re-run the computation.
 >
 > Enough information means both "n" and "vec", I guess.  So there is quite
 > a lot of overhead, really.  However, you could marginally improve on
 > that by sharing the information.  After all, one probably repeatedly
 > generates either identical or sequential &vec[n] values. In either cases
 > a sharing opportunity arises which can be exploited by caching "n" and
 > "vec" and checking whether it is the same or one bigger than a previous
 > value.  But such a time/space tradeoff is probably not what you want.
 >

Just out of curiosity, how does the GC know when to collect destroy
stuff? Is it when your data stops being referenced by some variable or
something of the like? Because in such case it would seem (and it
probably is) trivially simple to keep stuff in memory for as long as
necessary by just keeping a list/array/vector/<insert favourite data
struct here> of objects you don't want GCed yet.

just a thought

david

--
=================================================
   The future of HTML mail is clearly > /dev/null.
=================================================
   Two of the most famous products of Berkeley are
LSD and Unix. I don t think that is a coincidence
=================================================



Wed, 19 Oct 2005 09:25:53 GMT  
 Vector arguments to external procedures
Hi David,

Quote:
>Just out of curiosity, how does the GC know when to collect destroy stuff?

The basic answer is when the system cannot regenerate a pointer to an
object, the object is garbage.  Garbage collectable systems are those
which obey some rules that allow this property to be inferred from
something reasonably simple.  Typically, this is just "having no
pointers" to the object.

Quote:
>  Is it when your data stops being referenced by some variable or
>something of the like? Because in such case it would seem (and it
>probably is) trivially simple to keep stuff in memory for as long as
>necessary by just keeping a list/array/vector/<insert favourite data
>struct here> of objects you don't want GCed yet.

The problem David Young has is that the Poplog garbage collector is a
_relocating_ collector.  In other words the pointer values are
changed during garbage collection - disrupting his strategy of
passing pointer offsets.  There is no mechanism in Poplog for
protecting an object against relocation and it would be technically
demanding to do such a thing.

The root cause of the difficulty is that pointer offsets are not a
recognized data type in Poplog.  This is a long standing omission of
a "well-known" [*] requirement.  The result is that there is no
representation of external pointer offsets either.

[*] "Well-known" in this context means that you have put up with one
of my extended rants about garbage collection.  Other "well-known"
requirements include cooperation with manual deallocation, the
ability to perform speculative garbage collection, a key-specific
post-GC hook, pointer elision, pointer replacement, pointer reversal,
and proper monitoring tools.

--
Steve



Wed, 19 Oct 2005 11:41:29 GMT  
 Vector arguments to external procedures

Quote:

> Date: Sat, 3 May 2003 03:41:29 +0000 (UTC)

> Hi David,

> >Just out of curiosity, how does the GC know when to collect destroy stuff?
> ...
> ...
> The problem David Young has is that the Poplog garbage collector is a
> _relocating_ collector.  In other words the pointer values are
> changed during garbage collection - disrupting his strategy of
> passing pointer offsets.  There is no mechanism in Poplog for
> protecting an object against relocation and it would be technically
> demanding to do such a thing.

I think this is not correct.

First of all, in REF SYSTEM this information is provided:

sys_lock_heap()                                              [procedure]
        Tells the system to 'lock' the heap at its current endpoint. The
        effect of this is  that all structures in  the heap before  this
        point will automatically be treated as non-garbage (considerably
        reducing the amount of work needed  to be done on them  during a
        garbage collection). Structures created after the  sys_lock_heap
        will be treated normally.

I think that implies that items in the locked portion of the heap will
not be re-located during a garbage collection, as implied in some
of the earlier discussions on this topic.

More importantly, REF EXTERNAL_DATA in section 8 addresses this problem.

It was essential to deal with this for the X interface to work properly
since that allows pop-11 structures, e.g. callback procedures, strings,
or vectors of strings in the case of scrolling text widgets, to be given
to external (C) programs provided by the X libraries.

For people without Poplog, the REF file is accessible here:

    http://www.cs.bham.ac.uk/research/poplog/doc/popref/external_data

Here's the introduction to section 8:

8  Fixed-Address Poplog Structures for External Use
---------------------------------------------------

The sections above deal with the representation of external data  inside
Poplog; this section considers the use of native Poplog data  structures
externally, i.e,  when  passed  to external  functions  etc.  The  major
problem likely to arise in this context concerns garbage collection, for
the following reasons:

    (1) The Poplog garbage collector will discard (and reuse the  memory
        occupied by)  any  structure  which  has  no  references  to  it
        remaining inside Poplog;

    (2) When  an  (ordinary)  structure  is  retained  by  the   garbage
        collector, it may be moved to  a new location (i.e. its  address
        may change);

    (3) The garbage collector has no  knowledge of references to  Poplog
        structures stored by external functions, and so cannot take them
        into account when deciding  to retain structures  in (1), or  to
        correct the addresses they refer to in (2).

However, since garbage collections can only occur while executing Poplog
code, the  above present  no  problem PROVIDING  the external  use  of a
structure always finishes before control is ever returned to Poplog.

    The problem  occurs only  when  an external  agent  in some  way  or
another retains a structure during a return to Poplog execution  (during
which a garbage collection happens), and then later expects to use  that
structure again (at which time the externally-stored reference may  have
become invalid,  either  through  (1)  or (2)).  That  is,  the  control
sequence

?       external code  ??>  Poplog  ??>  external code
?                         (GC caused)

must occur  (either  by returning  to  Poplog and  then  re-calling  the
external function,  or  by  calling-back to  Poplog  from  the  external
function and then returning from the callback, etc).

    While this scenario will  not arise in  many programs (e.g.  because
data is always passed anew to an  external function on each call, so  it
never  relies  upon  stored  references),  in  other  programs  it  will
difficult or impossible to avoid.  Poplog therefore provides a  solution
to (2)  by making  possible the  creation of  fixed-address  structures,
whose memory locations do not change;  this is done with the  procedures
described below.  In  all  cases,  they  call  an  ordinary  constructor
procedure, but in such  a way that the  result is allocated memory  in a
separate part  of the  heap reserved  for fixed-address  objects --  and
where  its  address  is  guaranteed  to  remain  unchanged  by   garbage
collection.

    However, while  this  solves  (2),  the  potential  problem  of  (1)
remains: that  is, external  code  could have  stored references  to  an
(albeit fixed-address)  structure  for  which  there  are  no  remaining
references inside  Poplog  itself (and  so  a garbage  collection  would
destroy the  object). The  procedures below  therefore also  provide  an
option to automatically hold the structures they create on a global list
(from which,  when  no longer  required,  they  can be  freed  with  the
procedure free_fixed_hold). A  given program only  need use this  option
for structure(s) for which it does not anyway retain suitable references
itself (e.g. in variables or other data structures).

=======================================================================

Further details are in the file, describing procedures cons_fixed,
init_fixed, copy_fixed, is_fixed, free_fixed_hold, sys_grbg_fixed
and others.

WARNING: I do not recommend mixing fixed address structures (e.g. X
window utilities, widgets, etc. ) with the use of sys_lock_heap and
sys_unlock_heap. My impression is that once you have used a fixed
address structure after locking the heap, if you then unlock the heap
you cannot re-use the previously locked space before the fixed
structure.

I formed this impression after repeatedly locking and unlocking the heap
using X facilities and found the the heap size kept growing, whereas
if I did not lock and unlock it remained more constant.

But I have not repeated this experiment recently.

Aaron

Quote:

> The root cause of the difficulty is that pointer offsets are not a
> recognized data type in Poplog.  This is a long standing omission of
> a "well-known" [*] requirement.  The result is that there is no
> representation of external pointer offsets either.

> [*] "Well-known" in this context means that you have put up with one
> of my extended rants about garbage collection.  Other "well-known"
> requirements include cooperation with manual deallocation, the
> ability to perform speculative garbage collection, a key-specific
> post-GC hook, pointer elision, pointer replacement, pointer reversal,
> and proper monitoring tools.

> --                    
> Steve                  

====
Aaron Sloman, ( http://www.cs.bham.ac.uk/~axs/ )
School of Computer Science, The University of Birmingham, B15 2TT, UK

PAPERS: http://www.cs.bham.ac.uk/research/cogaff/ (And free book on Philosophy of AI)
FREE TOOLS: http://www.cs.bham.ac.uk/research/poplog/freepoplog.html


Thu, 20 Oct 2005 04:43:30 GMT  
 Vector arguments to external procedures
Hi,

Following on from Aaron's reply to my comment.

Quote:
>There is no mechanism in Poplog for
>  > protecting an object against relocation and it would be technically
>>  demanding to do such a thing.

>I think this is not correct.

I should have been a little more expansive.  What I was trying to say
is that Poplog does not have a POST-CREATION, PER OBJECT mechanism
for protecting against relocation.  In other words, you can't
dynamically decide to "fix" an object.

In particular, -sys_lock_heap- doesn't count in this regard because
it is not a per-object method.  -cons_fixed- is something that has to
be applied at creation time (presumably to simplify the management of
heap fragmentation) and so doesn't count either.

By the way, this shouldn't be read as a serious criticism of Poplog.
I am unaware of any relocating garbage collector that has such a
feature.  Furthermore, it really would be very technically demanding
to implement.  Although one can easily hallucinate a hybrid garbage
collection strategy that makes it possible, one should bear in mind
that this effectively doubles the complexity of a garbage collector.

--
Steve



Fri, 21 Oct 2005 01:39:40 GMT  
 Vector arguments to external procedures
Dear Steve and Aaron,

Thanks for your suggestions.

Unfortunately I think both the things Aaron mentions have drawbacks for
what I'm trying to do. First, sys_lock_heap. It's correct that sys_lock
heap can be used to solve my problem, since the locked part of the heap
is left alone by the GC. My worries are

    (a) locking and unlocking the heap involves non-trivial amounts of
    work, so the overhead might be considerable;

    (b) sys_unlock_heap unlocks the *whole* heap, which is no good as it
    might have been previously locked for a good reason, e.g. after
    compiling a large library. It looks as if decrementing
    pop_heap_lock_count should do the right thing, but I am wary of
    using something that isn't documented - maybe this variable wasn't
    documented because it can cause problems if used in this way.
    (sys_unlock_heap just sets pop_heap_lock_count to 0 by the way.)

Second, fixed address structures. Again, these provide one kind of
solution. Again I have two objections:

    (a) The arrays have to be specially created. I don't want to have to
    modify all my existing programs to make fixed address arrays! The
    idea is that the interface to Lapack (or whatever) should be a handy
    thing to have available, not something that determines how any other
    part of the program has to be written. (Of course I could create
    fixed address arrays locally and copy data into them - but copying
    data is what I'm trying to avoid in the first place.)

    (b) The data I'm handling often comes in large chunks (e.g. it's
    images), with a short lifetime, and of unpredictable size. I suspect
    that making much of it fixed address will fragment the heap
    disastrously.

I think, however, that I might be able to base something on Steve's
recent message:

Quote:
> How about adding a pop_after_gc "hook" that is responsible for
> correcting the addresses that get mucked up?  The simplest way to do
> this is to modify Aaron's suggestion by "marking" the resulting exptr
> struct by adding it to a temporary property along with enough
> information to re-do the computation.  Then the pop_after_gc hack
> would re-run the computation.

One concern is whether I can grab the exptr structs quickly enough. The
code for sysFIELD ($usepop/pop/src/fields.p line 823) looks suspiciously
as if it gets the address before it makes the structure

            PUSH_FIELD_ADDR();
            exptr_result(not(addr_mode))   ;;; makes the struct I think

- so it could trigger a GC and invalidate its own result! Seems unlikely
that the author wouldn't have thought of this - is anyone expert enough
to comment? I guess I can cover this possibility (and that of triggering
a GC when the exptr goes on the stack) by using pop_after_gc to check
whether a GC occurred during address acquisition, and do it again if
necessary.

A similar possibility that I'm also thinking about is always to get the
addresses just before the external call, storing vec and n in a local
structure, and to loop if a GC occurs during this phase. I *think* that
the code sysFIELD plants for the external call itself can't cause a GC
(the argument processing code is in assembler) - if that's correct, this
can be made to work.

Anyway, that was all for info to anyone still interested. I'll have a go
at something and let you know when I've got code I believe to be
correct.

Thanks for help so far.

David



Sat, 22 Oct 2005 21:06:13 GMT  
 
 [ 8 post ] 

 Relevant Pages 

1. Vector arguments to external procedures

2. Vector arguments to external procedures - correction

3. Vector arguments to external procedures

4. procedure as dummy argument (was: EXTERNAL and INTRINSIC statement)

5. Procedure Argument/External

6. Passing module procedures to external procedures -- type issues

7. Defining procedures whose argument list is result of procedure

8. passing functions with variable number of arguments as procedure arguments

9. Procedures as arguments - set dummy arguments

10. no vector = vector*vector in BLAS?

11. vector valued function as argument

12. passing arguments to external os/2 .CMD files

 

 
Powered by phpBB® Forum Software