Proposal: "Virtual References" 
Author Message
 Proposal: "Virtual References"

In an attempt to partially address the recurring discussion concerning
reference counting vs. garbage collection, I would like to propose an
extension to python which should help in the creation of "well structured"
cyclic graphs.  In particular, it should allow, at the least, trees with
parent back-pointers and doubly-linked lists to be created, without
worrying (too much) about reference cycles.

The basic mechanism I'd like to suggest is that of a "virtual reference,"
or a "vref" from here on.  A vref is essentially a handle on an object that
does not increment the object's reference count, which means that holding a
vref on an object will not keep the object from being destroyed.  This
would allow the Python programmer, for example, to create the
aforementioned tree structure, and allow it to be automatically destroyed
when it is no longer in use.  This is accomplished by changing all of the
parent back-references into vrefs, so that they no longer create reference
cycles which keep the tree from being destroyed.

In order to implement this mechanism, the Python core must ensure that no
-real- pointers are ever left referencing objects which no longer exist.
The implementation I would like to propose involves two basic additions to
the current Python core:

1. A new "vref" type, through which the Python programmer creates and
   manipulates virtual references.  Internally, it is basically a C-level
   Python object with a pointer to the Python object it is a reference to.
   Unlike all other Python code, however, it does not change the reference
   count of this object.  In addition, it includes two pointers to
   implement a doubly-linked list, which is used below.

2. The addition of a new field to the basic Python object [PyObject_Head in
   object.h], which is either NULL, or points to the head of a list of all
   vref objects which reference it.  When a vref object attaches itself to
   another object, it adds itself to this linked list.  Then, if an object
   with any vrefs on it is deallocated, it may walk this list and ensure
   that all of the vrefs on it point to some safe value, e.g. Nothing.

This implementation would hopefully have a minimal impact on the current
Python core -- when no vrefs exist, it would only add one pointer to all
objects, and a check for a NULL pointer every time an object is
deallocated.

Back at the Python language level, I have considered two possible semantics
for the vref object --

==> Pointer semantics:

  In this model, a vref behaves essentially like a Python-level pointer;
  the Python program must explicitly dereference the vref to manipulate the
  actual object it references.

  An example vref module using this model could include the function "new";
  When used as 'MyVref = vref.new(MyObject)', it returns a new vref object
  such that that MyVref.object == MyObject.  MyVref.object would then
  change to Nothing if MyObject is ever deallocated.

  For a concrete example, we may introduce some new C-style syntax:

  & -- unary operator, creates a vref on an object, same as vref.new().
  * -- unary operator, dereference a vref, same as VrefObject.object.

  We can then define:

  1.     type(&MyObject) == vref.VrefType
  2.        *(&MyObject) == MyObject
  3. (*(&MyObject)).attr == MyObject.attr
  4.          &&MyObject == Nothing
  5.           *MyObject -> exception

  Rule #4 is subtle, but comes about because we have made a vref to (a vref
  with no real references).  Thus the outer vref is cleared to Nothing when
  the inner one inevitably disappears.

==> Proxy semantics:

  In this model, the Python programmer manipulates vref objects just as if
  she were manipulating the object it is a reference of.  This is
  accomplished by implementing the vref so that all operations on it are
  redirected to its referenced object.  With this model, the dereference
  operator (*) no longer makes sense; instead, we have only the reference
  operator (&), and define:

  1.  type(&MyObject) == type(MyObject)
  2.        &MyObject == MyObject
  3. (&MyObject).attr == MyObject.attr
  4.       &&MyObject == MyObject

  Again, rule #4 is important -- here, the outer vref is in fact a
  reference to the original object, and -not- the inner vref.  This is
  because all operations applied to a vref actually apply to its object, so
  that creating a vref of a vref actually results in creating a vref of the
  latter's object.

The first, pointer semantics, has the advantage that it would be very easy
to implement; the vref type is extremely simple, requiring at minimum a
single attribute, object, and a function to create a reference.

However, I really like the proxy semantics.  Not only does it put less
of a burden on the Python programmer, but it allows you to do nice
things like use a vref anywhere you would use the actual object.
Unfortunately, it would probably an extreme pain, if not practically
impossible, to implement in the current Python implementation.  I do have
some thoughts, though, on how to do this, if it seems interesting; one
possibility is to introduce new type-checking functions which handle the
vref.  This would hopefully older C modules which don't expect vrefs to
simply return a type error, until they can be fixed.

Finally, there are some other additional capabilities that this system
could provide.  One that seems particularily interesting to me involves
allowing the Python programmer to add "destructor" function to a vref --
this Python function would be called immediately prior to the referenced
object being deallocated, allowing a Python program to invisibly attach
itself to another object and watch for it to disappear.  This seems neat,
though I haven't actually come up with any practical uses for it, yet... :)

-------------------------+--------------------------------------------------
__  Dianne Kyra Hackborn | "I say there is more =stupidity= than =hydrogen=,
\/    Oregon State Univ. | and THAT is the basic building block of the

  CS Graduate {*filter*} |                                    -- Frank Zappa
       <URL: http://www.*-*-*.com/ ~hackbod/>



Tue, 24 Mar 1998 03:00:00 GMT  
 Proposal: "Virtual References"

Quote:
>[Elegant and clear description of virtual references omitted.]

Cool.  The only problems I see are with the overhead of one extra
pointer for *every* object (including e.g. ints) and the requirement
to call the vref cleanup routine when the refcount goes to zero.
This would have to be added to the DECREF macro.  Currently this does
essentially the following:

        if (--p->ob_refcnt == 0) p->ob_type->tp_dealloc(p);

It could be rewritten as

        if (--p->ob_refcnt == 0) dealloc(p);

where dealloc() would be a function defined as follows:

void dealloc(object *p) {
        <do the vref cleanup>
        p->ob_type->tp_dealloc(p);

Quote:
}

This would probably create slightly smaller code for the DECREF macro
(since p->ob_type->tp_dealloc is replaced by the external symbol
dealloc) but also somewhat slower since there would be two function
calls instead of one (plus the vref cleanup code, which would be a
single NULL test if vrefs aren't used).

I wonder if it would make sense to have an implementation where only
certain types could be linked to by vrefs.  These types would indicate
their vref support by a flag in their type object, and their dealloc
function would have to call the vref cleanup function.  The chain of
vrefs could even be maintained by a function whose address is in the
type object, thus leaving it up to the object type to decide where to
store the head of the vref chain.

While this seems less elegant, it completely removes the overhead for
data types that don't need vrefs.  Initially, I would probably only
implement vrefs to class instances, since this is enough for most
practical situations.

Regarding your proxy semantics: I'd like to see how you would
implement this without breaking lots of old code.


URL: <http://www.python.org/~guido/>



Tue, 24 Mar 1998 03:00:00 GMT  
 Proposal: "Virtual References"

Quote:
> [Much snippage to save space]

> I like the general idea of the vref object, but I have a question about
> the proposed semantics. I wonder how often within a program you need to
> manipulate a vref AS a vref, rather than manipulating the object you're
> pointing at. After all, every name in python is really just pointing to
> something else. So why can't we just have:

>     a = SomeObject()
>     v = vref.vref(a)  # or whatever
>     v.method()        # calls a.method

> Having to add & and * and all that stuff just doesn't seem to fit with
> the wonderfully clean python syntax. Is there ever a case when you need
> to know a vref is a vref anyway? At first glance (admittedly a rather
> bleary one) I can't see a case where supporting that would be useful. If
> you need to be able to tell, I'd much rather have an isvref() function to
> test the name, and have (as in the above example) typeof(v) = typeof(a).

If type(v) is type(a), the two object types would be entirely
indistinguishable (an object's type defines its behavior).  You could
have a vtype(v) function that would return the type except if v were a
vref then it would return the type of whatever it refers to (sort of
like the distinction between the stat() and lstat() system calls in
Unix).

Another problem would be that, unless the current object
implementation framework is changed, a vref object would have to
declare whether it's a mapping or a sequence by having a pointer to a
structure containing mapping-specific or sequence-specific
implementation functions in its type object.  If it declares to be
both, ambiguities arise (e.g. which function should be called for
v[x]) and I'm not sure that the runtime currently always makes the
correct choice.

--Guido



Sun, 29 Mar 1998 03:00:00 GMT  
 Proposal: "Virtual References"

Quote:
> Another problem would be that, unless the current object
> implementation framework is changed, a vref object would have to
> declare whether it's a mapping or a sequence by having a pointer to a
> structure containing mapping-specific or sequence-specific
> implementation functions in its type object.  If it declares to be
> both, ambiguities arise (e.g. which function should be called for
> v[x]) and I'm not sure that the runtime currently always makes the
> correct choice.

Just dynamically create the vref type object the first time there is
an attempt to make the vref object for a given type.  This way, the
vref type object can look at the type object that its objects are
going to be referencing and fill in its mapping-specific or
sequence-specific functions accordingly.  Of course, this means that
there will be a seperate vref type object for each object type that
has a vref to one of its instances.  But, that might also be useful
since the type of two vref object will only be the same if they
reference the same type of object.

--
Donald Beaudry                                Silicon Graphics
U.I. Technology/DMS                           1 Cabot Road

                  "So much code, so little time..."



Sun, 29 Mar 1998 03:00:00 GMT  
 Proposal: "Virtual References"


Quote:

>In an attempt to partially address the recurring discussion concerning
>reference counting vs. garbage collection, I would like to propose
>[virtual references]....

Techniques like this could do more than patch refcounting.
You could also use "virtual references" to implement transparent
persistent/remote/replicated/shared objects.  So for example
Python could refer to an object Citation which under the surface
encodes to a number of table entries in an Oracle database, and
any reads or writes to Citation automatically translate to
appropriate database accesses.  You can almost do this now, but
not quite.

This, of course, would break lots of existing code (as guido
suggests) if Oracle decides to rollback the accesses, but it
would be worth it.

Quote:
>==> Proxy semantics:

>  In this model, the Python programmer manipulates vref objects just as if
>  she were manipulating the object it is a reference of.  This is
>  accomplished by implementing the vref so that all operations on it are
>  redirected to its referenced object.  With this model, the dereference
>  operator (*) no longer makes sense; instead, we have only the reference
>  operator (&), and define:

>  1.  type(&MyObject) == type(MyObject)
>  2.        &MyObject == MyObject
>  3. (&MyObject).attr == MyObject.attr
>  4.       &&MyObject == MyObject


allow a vref to act like the object itself, and become an instance
of <stale vref> if the referred object becomes invalid (by
deallocation, deletion from database, server not responding).
This would break code, but document it and let 'em figger it out.

To allow programmers to make new vref behaviours the vref method
should allow some sort of parameterization by the Python programmer
that defines how to access the referenced object.  Off the top of
my head I don't know how that should work, so I'll let Diane
solve it.               -a.
===
Looks like OJ's gonna take another STAB at marriage.



Sun, 29 Mar 1998 03:00:00 GMT  
 Proposal: "Virtual References"

Quote:

>Having to add & and * and all that stuff just doesn't seem to fit with
>the wonderfully clean python syntax. Is there ever a case when you need
>to know a vref is a vref anyway?

Yes.  But the question is: how?  I'd vote for raising a vref.error on
any operation.  Old code - that should not catch any unknown exceptions
anyway, unless it is part of a de{*filter*} - would be killed the right way
if a server is not responding, or any other behaviour that is not
explicable without understanding of a vref.

Quote:
> At first glance (admittedly a rather
>bleary one) I can't see a case where supporting that would be useful. If
>you need to be able to tell, I'd much rather have an isvref() function to
>test the name,

Yes! Yes!

Quote:
>and have (as in the above example) typeof(v) = typeof(a).

First, it's 'type', not 'typeof'. Second, and more important:

NO! I'd rather have a vrefInstance.__vrefdata__.type()
I bet Dianne will come up with some ClassType (but I wouldn't mind
something completely different ;-)

Quote:
>-Chris





Sun, 29 Mar 1998 03:00:00 GMT  
 Proposal: "Virtual References"

| > I like the general idea of the vref object, but I have a question about
| > the proposed semantics. I wonder how often within a program you need to
| > manipulate a vref AS a vref, rather than manipulating the object you're
| > pointing at. After all, every name in python is really just pointing to
| > something else. So why can't we just have:
| >
| >     a = SomeObject()
| >     v = vref.vref(a)  # or whatever
| >     v.method()        # calls a.method
| >
| > Having to add & and * and all that stuff just doesn't seem to fit with
| > the wonderfully clean python syntax. Is there ever a case when you need
| > to know a vref is a vref anyway? At first glance (admittedly a rather
| > bleary one) I can't see a case where supporting that would be useful. If
| > you need to be able to tell, I'd much rather have an isvref() function to
| > test the name, and have (as in the above example) typeof(v) = typeof(a).

Hm, I thought that was how I'd intended the "proxy" semantics to work; that
one only introduced the '&' operator, which is really just a C-alike for
'vref.vref()'.  Is there some detail to this that I am missing?

Of course, the problem with all this is that adding it to the current
Python implementation would be...  interesting. ;)

| If type(v) is type(a), the two object types would be entirely
| indistinguishable (an object's type defines its behavior).  You could
| have a vtype(v) function that would return the type except if v were a
| vref then it would return the type of whatever it refers to (sort of
| like the distinction between the stat() and lstat() system calls in
| Unix).

Do you mean type(vrefObject) would return a 'VRefType', and vtype() return
the object to which it is a reference?  How about having type(vrefObject)
return the type of the object to which it is a reference, and add something
like is_vref() to check if it is actually a vref?

| Another problem would be that, unless the current object
| implementation framework is changed, a vref object would have to
| declare whether it's a mapping or a sequence by having a pointer to a
| structure containing mapping-specific or sequence-specific
| implementation functions in its type object.  If it declares to be
| both, ambiguities arise (e.g. which function should be called for
| v[x]) and I'm not sure that the runtime currently always makes the
| correct choice.

How do class instances get around this?  They seem to have pointers to both
types of methods...  But just to be twisted, what if the vref object
actually had two types, the only different being which set of methods they
have, and dynamically changes itself to match the object it is
referencing...? ;)

I've been putting some thought into how to get a proxy-style vref working.
Some ideas I'm thinking of are

- Add a new #define Py<TypeName>_GetObject(op), which will either return an
  object of that type, or NULL if 'op' is not that type and not a vref to
  that type.  E.g.,

  #define Py<TypeName>_GetObject(op) \
    ( ((op)->ob_type == &Py<TypeName>_Type) ? (op) \
      : ( (op->ob_type == &PyVRef_Type \
           && ((PyVRefObject*)op)->vr_owner->ob_type == &Py<TypeName>_Type) \
         ? (op) : NULL ) )

  Or something like that. :}  By leaving the old type-checking method in,
  this would allow code which isn't prepared to deal with vrefs to simply
  fail with a type error.

- Add a new flag to the type object, indicating whether a type is able
  to deal with vrefs.  This would be used by the vref object, when
  redirecting operations to its owning object, to determine whether or not
  it should do so.  [Or it might make more sense to just disallow vrefs to
  objects which don't have this flag set, in the first place.]  I'm
  assuming this could be a problem; otherwise it  might work to just always
  add a reference to its owner object and hand the operation off to it.

Btw, I've done a initial performance test for this, by adding to Python
1.2:

        * An ob_vrefs field to PyObject_HEAD
        * New code in _Py_NewReference() to initialize ob_vrefs to 0
        * New code in _Py_Dealloc() to call a cleanup function if
          ob_vrefs is non-0.
        * A bare-bones implementation of a pointer-style "vref" type.
          [Which is not yet accessible to the Python program, which means
          that no vrefs can actually be created, so the code is never
          called. ;)]

Some statistics for this --

Standard Python 1.2:  466944 Oct 12 11:46 python.basic*
 VRef'ed Python 1.2:  487424 Oct 12 11:31 python.vref*

[primes.py, as in Python distribution.]

Quote:
> bench.py -n 5 -s "primes.py 2 100000" ./python.basic ./python.vref

Testing for 5 iterations with Python script "primes.py 2 100000".
1 : ./python.basic time = 31.2
1 : ./python.vref time = 32.4
2 : ./python.basic time = 31.6
2 : ./python.vref time = 32.4
3 : ./python.basic time = 31.2
3 : ./python.vref time = 32.4
4 : ./python.basic time = 31.4
4 : ./python.vref time = 33.4
5 : ./python.basic time = 30.9
5 : ./python.vref time = 32.8
./python.basic average: 31.26 (100.0% of baseline)
./python.vref average: 32.68 (104.5% of baseline)

[primes.py, with printing removed.]

Quote:
> bench.py -n 5 -s "primes.py 2 100000" ./python.basic ./python.vref

Testing for 5 iterations with Python script "primes.py 2 100000".
1 : ./python.basic time = 29.8
1 : ./python.vref time = 31.4
2 : ./python.basic time = 30.4
2 : ./python.vref time = 31.2
3 : ./python.basic time = 30.9
3 : ./python.vref time = 31.4
4 : ./python.basic time = 29.7
4 : ./python.vref time = 30.8
5 : ./python.basic time = 29.5
5 : ./python.vref time = 30.9
./python.basic average: 30.06 (100.0% of baseline)
./python.vref average: 31.14 (103.6% of baseline)

-------------------------+-----------------------------------------------
__  Dianne Kyra Hackborn | "One man's `prayer' is another man's `unknown
\/    Oregon State Univ. | quantity.'"                                  

Life life, I need a life |
       <URL:http://www.cs.orst.edu/~hackbod/>



Tue, 31 Mar 1998 03:00:00 GMT  
 Proposal: "Virtual References"

Quote:

>Btw, speaking of GC, has anyone thought of doing a reference counting /
>garbage collection hybrid?  E.g., do most of the work with reference
>counting: you KNOW an object is gone when it has zero references, so you
>can normally depend on that to clean things up.  Then you can have a much
>lower priority GC system running behind it, to eventually catch any
>circular links which are created.

Actually, I almost posted about just this topic earlier this week, in
response to the most recent occurance of the compiler thread, with
it's reference to free space stuff.  (Thats a fairly convoluted
sentence, isn't it...)  In any case... I think something like this
would be a substantial benefit, even if it was only a debugging tool
(rather like Purify) which let you find out where you were leaking so
you could fix it.

I've had a couple of applications recently that have been prototyped
in Python but then converted to C/C++ because they needed well defined
allocation behavior, (basically Unix daemons) and I didn't have a warm
fuzzy feeling that the Python version could be depended on not to
leak.  At least with a C version we could run it with Purify and find
any dumb allocation problems.

As for the speed up arguments for a compiler... MIPS are cheap and
getting cheaper.  While I think performance is important, reliability
an robustness are much bigger concerns.

Tom



Tue, 31 Mar 1998 03:00:00 GMT  
 
 [ 8 post ] 

 Relevant Pages 

1. string.join(["Tk 4.2p2", "Python 1.4", "Win32", "free"], "for")

2. Language change proposal: make assertions "reusable"

3. NANS Proposal "pronounciation"

4. Proposal: "in" pseudo-operator

5. wish: "struct" module extension proposal

6. PROPOSAL: [].__doc__, "".__doc__, ...

7. Proposal: remove the "global" statement

8. Proposal: "while variable from condition"

9. Proposal: "while variable from condition" (fwd)

10. Proposal: "while variable from condition"

11. Proposal for module-nesting, as "packages"

12. proposal: addition to "tag names" command

 

 
Powered by phpBB® Forum Software