UserList.__getslice__(): copy.copy(self.data) vs. self.__class__(self.data). 

In the Python Reference Manual, Section 3.3.5, "Additional methods for
emulation of sequence types", we find the following entry:

  ...

  __getslice__ (self, i, j)

  Called to implement evaluation of self[i:j]. The returned
  object should be of the same type as self. Note that missing
  i or j in the slice expression are replaced by zero or
  sys.maxint, respectively, and no further transformations on
  the indices is performed. The interpretation of negative
  indices and indices larger than the length of the sequence
  is up to the method.

  ...

The current implementation of UserList.__getslice__() looks like this:

    def __getslice__(self, i, j):
        i = max(i, 0); j = max(j, 0)
        userlist = self.__class__()
        userlist.data[:] = self.data[i:j]
        return userlist

Though this follows the guidelines outlined in the reference manual, it
has an interesting side effect: it instantiates a new object of the same
class but it loses the current values of all attributes.  

Is this desirable behavior?  Personally, I don't believe that it is.
My thinking is that the current internal state of the class should pass
to the newly instantiated object.  I think a better implementation would
be:  

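    def __getslice__(self, i, j):
        i = max(i, 0); j = max(j, 0)
        userlist = copy.copy(self)
        # Rebind data instead of slice-assigning: copy.copy() is shallow, so
        # userlist.data may still be the same list object as self.data.
        userlist.data = self.data[i:j]
        return userlist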
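
As a rough sketch of the difference between the two approaches, the following compares building a fresh instance with copying the existing one. MiniUserList is a hypothetical stand-in, not the real UserList, and the ordinary methods slice_fresh and slice_copy take the place of __getslice__ so that the example also runs on Python versions that no longer call that hook. It also shows one thing to watch for with copy.copy(): the copy is shallow, so until data is rebound the copy and the original share one list.

```python
import copy


class MiniUserList:
    """Hypothetical stand-in for UserList; not the standard library class."""

    def __init__(self, data=None):
        self.data = list(data) if data is not None else []

    def slice_fresh(self, i, j):
        # Fresh instance of the same class: only the sliced data carries over.
        new = self.__class__()
        new.data[:] = self.data[i:j]
        return new

    def slice_copy(self, i, j):
        # Shallow copy keeps every instance attribute, then gets its own list.
        new = copy.copy(self)
        new.data = self.data[i:j]
        return new


order = MiniUserList(["pen", "ink", "pad", "cap", "nib"])
order.order_id = 1234  # attribute added after construction

fresh = order.slice_fresh(1, 4)
copied = order.slice_copy(1, 4)

fresh_has_id = hasattr(fresh, "order_id")          # attribute was lost
copied_id = copied.order_id                        # attribute carried over
shares_list = copy.copy(order).data is order.data  # shallow copy aliases data
```

Whether carrying the attributes over is the right default is exactly the question being asked here; the sketch only shows what each choice does.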

Also, should the i and j arguments be "adjusted" before being used to
access the list in self.data?  Again, in the Python Reference Manual,
section 5.3.3 "Slicings," we find:

  The lower and upper bound expressions, if present, must evaluate
  to plain integers; defaults are zero and the sequence's length,
  respectively. If either bound is negative, the sequence's length
  is added to it.

So, the runtime normalizes negative numbers so that small-enough (large
enough??<g>) negative numbers start counting from the end of the
sequence; e.g., aList[-1] returns the last element of aList.  As
currently written, UserList converts to zero any negative numbers that
would otherwise raise an IndexError.  The resulting behavior is that a
slice is returned rather than an IndexError being raised, thus:

  >>> ul=UserList.UserList([0,1,2,3,4])
  >>> ul[-1] # last element
  4
  >>> ul[-3:-1] # 3rd- and 2nd-to-the-last elements
  [2, 3]
  >>> ul[-10] # calls UserList.__getitem__(self, i)
  Traceback (innermost last):
    File "<interactive input>", line 1, in ?
    File "UserList.py", line 29, in __getitem__
      def __delitem__(self, i): del self.data[i]
  IndexError: list index out of range
  >>> ul[-10:-1] # should raise IndexError
  [0, 1, 2, 3]
  >>>

Again, at least to me, this behavior seems to me to be inconsistent with
a "real" list object.

I think this is a better implementation:

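    def __getslice__(self, i, j):
        userlist = copy.copy(self)
        # Rebind data instead of slice-assigning: copy.copy() is shallow, so
        # userlist.data may still be the same list object as self.data.
        userlist.data = self.data[i:j]
        return userlist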

I'd be more than happy to implement these changes (there are a couple of
places where self.__class__() is called and where method arguments are
normalized to zero) and submit the context diffs -- unless I'm
overwhelmed with arguments to the contrary.  Since UserList.py is part of
the standard distribution, and could break existing code if changes are
made, I wanted to bounce this off of the community before proceeding.

I'm using UserList in a current project and I've already implemented
these changes into a NewUserList class.  Submitting context diffs would
be a piece of cake.

Any feedback?  Should I proceed?

--
-=< tom >=-

Thomas D. Funk                           |        "Software is the lever
Software Engineering Consultant          | Archimedes was searching for"
Advanced Systems Design, Tallahassee FL. |



Sat, 31 Aug 2002 03:00:00 GMT  

Quote:
> Though this follows the guidelines outlined in the reference manual,
> it has an interesting side effect: it instantiates a new object of the
> same class but it loses the current values of all attributes.

> Is this desirable behavior?  Personally, I don't believe that it is.
> My thinking is that the current internal state of the class should
> pass to the newly instantiated object.

Sometimes, that's what you want, and, in my experience, sometimes not.
UserList is intended as a base class, so you can override any methods
that aren't doing what you want.  For instance, you could do something
like this:

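import copy
from UserList import UserList

class NewUserList(UserList):

    def __getslice__(self, i, j):
        i = max(i, 0); j = max(j, 0)
        userlist = copy.copy(self)
        # copy.copy() is shallow, so rebind data rather than slice-assigning
        # into a list that may still be shared with self.
        userlist.data = self.data[i:j]
        return userlist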

... and use NewUserList in your project instead.

Hope this helps.
Alex.



Sat, 31 Aug 2002 03:00:00 GMT  
Alex:

Thanks so much for responding.

In an article posted 14 Mar 2000 17:37:00 -0500,

Quote:
> Sometimes, that's what you want, and, in my experience, sometimes not.

It seems to me that *not* carrying forward the currently active
attributes is counter-intuitive (of course, that's probably just me....).  
*I* can't think of a case where I'd want to lose the attributes I'd built
up over the course of the current program execution when I slice off a
sub-set of the list of things I'm tracking.  Can you give me an example
of where losing the additional attributes after taking a subset of an
existing collection is desirable?  

Consider an inventory system and a class that implements an order, with
the list holding each of the order items.  Each of the items "belongs" to
the order and shares the order's attributes: order number, ship date,
customer id, shipping address, etc.  Assume further that the user changes
her order and she now wants only the middle three of the five items
ordered:

  neworder = order[1:4]

The neworder instance shouldn't have to go back to the order instance to
collect the order-specific attributes (order id, customer id, etc.)

Or perhaps the last two items in the order are back ordered:

  backordered = order[len(order)-2:len(order)]
  backordered.status = Order.BACKORDERED
  backordered.shipdate = None
  backordered.bodate = DateTime.today()

In this instance, losing all the original order information would be a
huge pain, and would be counter-intuitive -- I'd expect the order ID,
customer ID, shipping address, etc, to end up in the new back ordered
instance.

With the other approach, all of the order attributes are lost and would
have to be copied from the original order.  Ouch.  What if I forget one?  
Enter one sneaky little bug, hard to catch.
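
For readers on a current Python, where __getslice__ is gone and slices arrive in __getitem__ as slice objects, the order scenario might be sketched as below. The Order class, its attribute names, and the status strings are all made up for illustration; only collections.UserList and copy.copy are real library names.

```python
import copy
from collections import UserList


class Order(UserList):
    """Hypothetical order whose slices keep the order-level attributes."""

    def __init__(self, items=(), order_id=None, customer_id=None):
        super().__init__(items)
        self.order_id = order_id
        self.customer_id = customer_id
        self.status = "OPEN"

    def __getitem__(self, index):
        if isinstance(index, slice):
            new = copy.copy(self)        # carries order_id, customer_id, status
            new.data = self.data[index]  # the slice gets its own list
            return new
        return self.data[index]


order = Order(["A", "B", "C", "D", "E"], order_id=1001, customer_id=42)

neworder = order[1:4]       # the middle three items
backordered = order[-2:]    # the last two items
backordered.status = "BACKORDERED"
```

Changing backordered.status leaves order.status alone, because the attribute is rebound on the copy rather than mutated in place. Attributes that hold mutable objects would still be shared between the two instances, since the copy is shallow.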

It just seems to me that if one goes to the trouble to create a class
(FancyList) that has attributes beyond the data attribute, then each of
the items in the list is a "child" of FancyList and shares the
attributes of FancyList.  It's a one-to-many relationship.  It seems to
me that slicing off a subset of the list would only rarely disconnect the
items in the subset from the superset. It also seems to me that such
behavior would be the exception rather than the rule.

Quote:
> UserList is intended as a base class, so you can override any methods
> that aren't doing what you want.  

True, but to my thinking the current implementation is neither intuitive
nor does it mimic a "real" list object.  Shouldn't the counter-intuitive
and the non-list-like implementation be the overridden implementation
rather than the "normal" one?  We learn how lists work when we learn
Python, but UserList doesn't act like a "real" list, and we have to be
aware of those differences when we use it - or suffer for it.  

I'm not terribly experienced with Python and I stumbled over this quirk
myself.  I expected UserList to act like a true list.  When I decided to
inherit from it I gave the code a cursory glance and, seeing no
indication that UserList's behavior departs from that of a native list, I
started coding.  The test cases I wrote bumped up against the deviations
-- I was treating the derived class like a real list, and it failed to
act like a native list object under certain conditions. This led to my
research and this series of messages and what-not.

My assertion is that UserList should mimic a true list object as
precisely as possible and any deviation from "acting like a native list"
should be left as an exercise for the implementor.  

My biggest worry over UserList is that it doesn't properly handle out-of-
bounds negative numbers.  Instead of raising an IndexError for what is
likely a programming error, it quietly returns an unexpected result.  Not
good.  Silently returning incorrect results goes against everything I
know about "defensive" programming.  Steve McConnell would have a cow.

The purpose of my article was to elicit feedback on whether UserList.py
should be changed to mimic a real list.  If I understand you correctly,
you believe that UserList, seemingly flawed as it is, should be left as
it is?  

I've made my case as strongly as I can.  Now I'll just sit back and tally
the "votes." <wink>

--
-=< tom >=-

Thomas D. Funk                           |        "Software is the lever
Software Engineering Consultant          | Archimedes was searching for"
Advanced Systems Design, Tallahassee FL. |



Sat, 31 Aug 2002 03:00:00 GMT  
Hi, Bernhard:

In an article posted 15 Mar 2000 00:25:29 +0100,

Quote:
> >  Note that missing
> >  i or j in the slice expression are replaced by zero or
> >  sys.maxint, respectively,
...
> It seems negative indices are automatically subtracted from the length.

Actually, this behavior matches the description in the Language Reference
Manual.

Quote:
> UserList has only one instance variable, data, and that is handled
> correctly. If you need to copy more variables in a derived class you
> should probably override this method, IMO.

OK.

Quote:
> Note that all slicings succeed for normal lists. If both indices of the
> slice are too large or too small (i.e. too negative ;-) ) the result is
> an empty list.

Uh.... I stand before you hat in hand begging forgiveness.... I just went
back and tried this in interactive mode:

  >>> l=[0,1,2,3,4,5,6]
  >>> l[-10]
  Traceback (innermost last):
    File "<interactive input>", line 1, in ?
  IndexError: list index out of range
  >>> l[-100:-5]
  [0, 1]
  >>> l[-100:50]
  [0, 1, 2, 3, 4, 5, 6]
  >>> l[-100:-50]
  []
  >>> l[-10:-100]
  []
  >>> # damn!

And, of course, you are correct.  The error I saw came from an index
operation, not a slice (i.e., the runtime called the native equivalent of
__getitem__(self, i)).  I don't know what I was doing before (other than
embarrassing myself :-/)

Quote:
> A normal list doesn't raise an IndexError here. It returns [0, 1, 2, 3]
> Even ul[-10:-100] succeeds for a normal list, returning [] .

Yes it does... see above.

OK, you and Alex win: UserList stays as it is, and I sit back down in my
chair and shut-the-heck-up <g>.

I was thinking that we'd be left with why test_userlist.py was
failing.... but NOT!  Even that seems to have remedied itself:

  D:\Python\Lib\test>python test_userlist.py

  D:\Python\Lib\test>

So, I'm truly sorry to have wasted everyone's time.  I was *way*
off-base with my testing/assessment/findings.

Thank you Bernhard and Alex for setting me straight.

<sheepish g>

--
-=< tom >=-
Thomas D. Funk                           |        "Software is the lever
Software Engineering Consultant          | Archimedes was searching for"
Advanced Systems Design, Tallahassee FL. |



Sat, 31 Aug 2002 03:00:00 GMT  

Quote:

> In the Python Reference Manual, Section 3.3.5, "Additional methods for
> emulation of sequence types", we find the following entry:

>   ...

>   __getslice__ (self, i, j)

>   Called to implement evaluation of self[i:j]. The returned
>   object should be of the same type as self. Note that missing
>   i or j in the slice expression are replaced by zero or
>   sys.maxint, respectively, and no further transformations on
>   the indices is performed. The interpretation of negative
>   indices and indices larger than the length of the sequence
>   is up to the method.

That appears to be incorrect in Python 1.5.2. Using a slightly modified
UserList.py that prints the i and j args of the __getslice__ method, I
get:

Python 1.5.2 (#1, Nov 13 1999, 12:17:58)  [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam

>>> from UserList2 import UserList
>>> l = [0,1,2,3,4,5]
>>> u = UserList(l)
>>> u[-2:-1]
__getslice__ 4 5
[4]

It seems negative indices are automatically subtracted from the length.
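
A similar experiment can be run on current Python versions, with the caveat that the mechanics changed: __getslice__ was removed in Python 3, and __getitem__ now receives a slice object holding the bounds exactly as written. The Probe class below is a hypothetical helper written for this illustration; slice.indices() is the built-in way to apply the "add the length to negative bounds" rule by hand.

```python
class Probe:
    """Hypothetical helper that records what __getitem__ receives."""

    def __init__(self, data):
        self.data = list(data)
        self.seen = None

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        self.seen = index  # remember the raw argument
        return self.data[index]


u = Probe([0, 1, 2, 3, 4, 5])
result = u[-2:-1]

raw = u.seen                    # slice(-2, -1, None): bounds are not adjusted
adjusted = raw.indices(len(u))  # (4, 5, 1): length added to each negative bound
print(raw, adjusted, result)
```

So the adjusted values 4 and 5 match what the 1.5.2 session printed, but on a modern interpreter the method has to ask for them itself.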

Quote:
> The current implementation of UserList.__getslice__(),  looks like this:

>     def __getslice__(self, i, j):
>         i = max(i, 0); j = max(j, 0)
>         userlist = self.__class__()
>         userlist.data[:] = self.data[i:j]
>         return userlist

> Though this follows the guidelines outlined in the reference manual, it
> has an interesting side effect: it instantiates a new object of the same
> class but it loses the current values of all attributes.  

> Is this desirable behavior?  Personally, I don't believe that it is.

UserList has only one instance variable, data, and that is handled
correctly. If you need to copy more variables in a derived class you
should probably override this method, IMO.

[...]

Quote:
> Also, should the i and j arguments be "adjusted" before being used to
> access the list in self.data?  Again, in the Python Reference Manual,
> section 5.3.3 "Slicings," we find:

>   The lower and upper bound expressions, if present, must evaluate
>   to plain integers; defaults are zero and the sequence's length,
>   respectively. If either bound is negative, the sequence's length
>   is added to it.

It's not really relevant for this post, but note that this only applies
to simple slicings, i.e. those of the form lower:upper. Extended
slicings, i.e. those with two colons, are handled differently.

Quote:
> So, the runtime normalizes negative numbers so that small-enough (large
> enough??<g>) negative numbers start counting from the end of the
> sequence; e.g., aList[-1] returns the last element of aList.  As
> currently written, UserList converts to zero any negative numbers that
> would otherwise raise an IndexError.

Negative i and j are mapped to 0 because the interpreter has already
added the sequence's length to them once. If they're still negative when
__getslice__ is called and were passed on unchanged, the length of
self.data would be added to them a second time, and that would likely
produce incorrect results. The correct result in that case is an empty
list.
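
To make the double adjustment concrete, here is a small sketch. The add_length_once function is a hypothetical model of the interpreter behavior observed above (the length is added to each negative bound before __getslice__ sees it); it is not a real Python API.

```python
def add_length_once(i, j, length):
    """Model of the pre-adjustment described above (an assumption, not a real API)."""
    if i < 0:
        i += length
    if j < 0:
        j += length
    return i, j


data = [0, 1, 2, 3, 4]

# The caller writes ul[-7:-6]; both bounds are still negative after one adjustment.
i, j = add_length_once(-7, -6, len(data))  # (-2, -1)

unclamped = data[i:j]                # list slicing adjusts again: [3]
clamped = data[max(i, 0):max(j, 0)]  # data[0:0], i.e. []
expected = data[-7:-6]               # what a plain list gives: []
```

Only the clamped version agrees with the plain list, which is the job of the max(i, 0) and max(j, 0) line in UserList.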

Note that all slicings succeed for normal lists. If both indices of the
slice are too large or too small (i.e. too negative ;-) ) the result is
an empty list.

Quote:
> The resulting behavior is that a
> slice is returned rather than an IndexError being raised, thus:

>   >>> ul=UserList.UserList([0,1,2,3,4])
>   >>> ul[-1] # last element
>   4
>   >>> ul[-3:-1] # 3rd- and 2nd-to-the-last elements
>   [2, 3]
>   >>> ul[-10] # calls UserList.__getitem__(self, i)
>   Traceback (innermost last):
>     File "<interactive input>", line 1, in ?
>     File "UserList.py", line 29, in __getitem__
>       def __delitem__(self, i): del self.data[i]
>   IndexError: list index out of range
>   >>> ul[-10:-1] # should raise IndexError
>   [0, 1, 2, 3]

A normal list doesn't raise an IndexError here. It returns [0, 1, 2, 3].
Even ul[-10:-100] succeeds for a normal list, returning [].
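
This is easy to confirm with a plain list. The snippet below is only a quick check, not part of UserList: indexing past either end raises IndexError, while slicing clips the bounds to the list and always returns a list.

```python
ul = [0, 1, 2, 3, 4]

try:
    ul[-10]               # indexing out of range
    index_failed = False
except IndexError:
    index_failed = True

clipped = ul[-10:-1]      # lower bound clipped to 0: [0, 1, 2, 3]
empty = ul[-10:-100]      # both bounds clip to 0: []
beyond = ul[10:100]       # both bounds clip to len(ul): []
```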

--
Bernhard Herzog   | Sketch, a drawing program for Unix



Sun, 01 Sep 2002 03:00:00 GMT  

Quote:

> Hi, Bernhard:

> In an article posted 15 Mar 2000 00:25:29 +0100,

> > >  Note that missing
> > >  i or j in the slice expression are replaced by zero or
> > >  sys.maxint, respectively,
> ...

You snipped the important part:

  ... and no further transformations on
  the indices is performed. The interpretation of negative
  indices and indices larger than the length of the sequence
  is up to the method.

Quote:
> > It seems negative indices are automatically subtracted from the length.

> Actually, this behavior matches the description in the Language Reference
> Manual.

I was referring to the last sentence which claims that negative indices
have to be interpreted by the method, which is not true for simple
slices. The other part of the reference manual that you quoted describes
this correctly.

Quote:
> OK, you and Alex win: UserList stays as it is, and I sit back down in my
> chair and shut-the-heck-up <g>.

> I was thinking that we'd be left with why test_userlist.py was
> failing.... but NOT!  Even that seems to have remedied itself:

>   D:\Python\Lib\test>python test_userlist.py

>   D:\Python\Lib\test>

> So, I'm truly sorry to have wasted everyone's time.  I was *way*
> off-base with my testing/assessment/findings.

Oh, this is probably Guido's time machine again. Perhaps there was a bug
in Python 1.5.2 and Guido used his time machine to fix it, but for some
strange reason the ripples in the space-time continuum this always
causes have only just caught up with you. <wink>

--
Bernhard Herzog   | Sketch, a drawing program for Unix



Sun, 01 Sep 2002 03:00:00 GMT  

Quote:
> I was thinking that we'd be left with why test_userlist.py was
> failing.... but NOT!  Even that seems to have remedied itself:

>   D:\Python\Lib\test>python test_userlist.py

>   D:\Python\Lib\test>

That sort of thing happens to me all the time.  Sneaky machines are
always conspiring to make me look silly. :)

Alex.



Sun, 01 Sep 2002 03:00:00 GMT  
 