PEP 234: Iterators 
Author Message
 PEP 234: Iterators

Here's a revised version of the Iterator PEP (started by Ping).  I
plan to make this a standard language feature in python 2.2.

Everything described here is already available from the current CVS
tree for Python 2.2, but feedback on the PEP can still cause the
implementation to change.

If you want your feedback to reach the relevant parties, please send

list; I cannot find the time to read the comp.lang.python newsgroup.

--Guido van Rossum (home page: http://www.*-*-*.com/ ~guido/)

PEP: 234
Title: Iterators
Version: $Revision: 1.9 $

Status: Draft
Type: Standards Track
Python-Version: 2.1
Created: 30-Jan-2001
Post-History:

Abstract

    This document proposes an iteration interface that objects can
    provide to control the behaviour of 'for' loops.  Looping is
    customized by providing a method that produces an iterator object.
    The iterator provides a 'get next value' operation that produces
    the nxet item in the sequence each time it is called, raising an
    exception when no more items are available.

    In addition, specific iterators over the keys of a dictionary and
    over the lines of a file are proposed, and a proposal is made to
    allow spelling dict.kas_key(key) as "key in dict".

    Note: this is an almost complete rewrite of this PEP by the second
    author, describing the actual implementation checked into the
    trunk of the Python 2.2 CVS tree.  It is still open for
    discussion.  Some of the more esoteric proposals in the original
    version of this PEP have been withdrawn for now; these may be the
    subject of a separate PEP in the future.

C API Specification

    A new exception is defined, StopIteration, which can be used to
    signal the end of an iteration.

    A new slot named tp_iter for requesting an iterator is added to
    the type object structure.  This should be a function of one
    PyObject * argument returning a PyObject *, or NULL.  To use this
    slot, a new C API function PyObject_GetIter() is added, with the
    same signature as the tp_iter slot function.

    Another new slot, named tp_iternext, is added to the type
    structure, for obtaining the next value in the iteration.  To use
    this slot, a new C API function PyIter_Next() is added.  The
    signature for both the slot and the API function is as follows:
    the argument is a PyObject * and so is the return value.  When the
    return value is non-NULL, it is the next value in the iteration.
    When it is NULL, there are three possibilities:

    - No exception is set; this implies the end of the iteration.

    - The StopIteration exception (or a derived exception class) is
      set; this implies the end of the iteration.

    - Some other exception is set; this means that an error occurred
      that should be propagated normally.

    In addition to the tp_iternext slot, every iterator object must
    also implement a next() method, callable without arguments.  This
    should have the same semantics as the tp_iternext slot function,
    except that the only way to signal the end of the iteration is to
    raise StopIteration.  The iterator object should not care whether
    its tp_iternext slot function is called or its next() method, and
    the caller may mix calls arbitrarily.  (The next() method is for
    the benefit of Python code using iterators directly; the
    tp_iternext slot is added to make 'for' loops more efficient.)

    To ensure binary backwards compatibility, a new flag
    Py_TPFLAGS_HAVE_ITER is added to the set of flags in the tp_flags
    field, and to the default flags macro.  This flag must be tested
    before accessing the tp_iter or tp_iternext slots.  The macro
    PyIter_Check() tests whether an object has the appropriate flag
    set and has a non-NULL tp_iternext slot.  There is no such macro
    for the tp_iter slot (since the only place where this slot is
    referenced should be PyObject_GetIter()).

    (Note: the tp_iter slot can be present on any object; the
    tp_iternext slot should only be present on objects that act as
    iterators.)

    For backwards compatibility, the PyObject_GetIter() function
    implements fallback semantics when its argument is a sequence that
    does not implement a tp_iter function: a lightweight sequence
    iterator object is constructed in that case which iterates over
    the items of the sequence in the natural order.

    The Python bytecode generated for 'for' loops is changed to use
    new opcodes, GET_ITER and FOR_ITER, that use the iterator protocol
    rather than the sequence protocol to get the next value for the
    loop variable.  This makes it possible to use a 'for' loop to loop
    over non-sequence objects that support the tp_iter slot.  Other
    places where the interpreter loops over the values of a sequence
    should also be changed to use iterators.

    Iterators ought to implement the tp_iter slot as returning a
    reference to themselves; this is needed to make it possible to
    use an iterator (as opposed to a sequence) in a for loop.

Python API Specification

    The StopIteration exception is made visiable as one of the
    standard exceptions.  It is derived from Exception.

    A new built-in function is defined, iter(), which can be called in
    two ways:

    - iter(obj) calls PyObject_GetIter(obj).

    - iter(callable, sentinel) returns a special kind of iterator that
      calls the callable to produce a new value, and compares the
      return value to the sentinel value.  If the return value equals
      the sentinel, this signals the end of the iteration and
      StopIteration is raised rather than returning normal; if the
      return value does not equal the sentinel, it is returned as the
      next value from the iterator.  If the callable raises an
      exception, this is propagated normally; in particular, the
      function is allowed to raise StopError as an alternative way to
      end the iteration.  (This functionality is available from the C
      API as PyCallIter_New(callable, sentinel).)

    Iterator objects returned by either form of iter() have a next()
    method.  This method either returns the next value in the
    iteration, or raises StopError (or a derived exception class) to
    signal the end of the iteration.  Any other exception should be
    considered to signify an error and should be propagated normally,
    not taken to mean the end of the iteration.

    Classes can define how they are iterated over by defining an
    __iter__() method; this should take no additional arguments and
    return a valid iterator object.  A class is a valid iterator
    object when it defines a next() method that behaves as described
    above.  A class that wants to be an iterator also ought to
    implement __iter__() returning itself.

Dictionary Iterators

    The following two proposals are somewhat controversial.  They are
    also independent from the main iterator implementation.  However,
    they are both very useful.

    - Dictionaries implement a sq_contains slot that implements the
      same test as the has_key() method.  This means that we can write

          if k in dict: ...

      which is equivalent to

          if dict.has_key(k): ...

    - Dictionaries implement a tp_iter slot that returns an efficient
      iterator that iterates over the keys of the dictionary.  During
      such an iteration, the dictionary should not be modified, except
      that setting the value for an existing key is allowed (deletions
      or additions are not, nor is the update() method).  This means
      that we can write

          for k in dict: ...

      which is equivalent to, but much faster than

          for k in dict.keys(): ...

      as long as the restriction on modifications to the dictionary
      (either by the loop or by another thread) are not violated.

    If this proposal is accepted, it makes sense to recommend that
    other mappings, if they support iterators at all, should also
    iterate over the keys.  However, this should not be taken as an
    absolute rule; specific applications may have different
    requirements.

File Iterators

    The following proposal is not controversial, but should be
    considered a separate step after introducing the iterator
    framework described above.  It is useful because it provides us
    with a good answer to the complaint that the common idiom to
    iterate over the lines of a file is ugly and slow.

    - Files implement a tp_iter slot that is equivalent to
      iter(f.readline, "").  This means that we can write

          for line in file:
              ...

      as a shorthand for

          for line in iter(file.readline, ""):
              ...

      which is equivalent to, but faster than

          while 1:
              line = file.readline()
              if not line:
                  break
              ...

    This also shows that some iterators are destructive: they consume
    all the values and a second iterator cannot easily be created that
    iterates independently over the same values.  You could open the
    file for a second time, or seek() to the beginning, but these
    solutions don't work for all file types, e.g. they don't work when
    the open file object really represents a pipe or a stream socket.

Rationale

    If all the parts of the proposal are included, this addresses many
    concerns in a consistent and flexible fashion.  Among its chief
    virtues are the following three -- no, four -- no, five -- points:

    1. It provides an extensible iterator interface.

    1. It allows performance enhancements to list iteration.

    3. It allows big performance enhancements to dictionary iteration.

    4. It allows one to provide an interface for just iteration
       without pretending to
...

read more »



Sat, 18 Oct 2003 11:26:03 GMT  
 PEP 234: Iterators

Quote:
>           for line in file:
>               ...

Cool.

Quote:
> Among its chief virtues are the following three -- no, four -- no,
> five -- points:

I'd add:

    6. It makes code iterating over non-sequences more concise and
       readable.

Quote:
>     - The name iter() is an abbreviation.  Alternatives proposed
>       include iterate(), harp(), traverse(), narrate().

IMHO, a verb conveys the wrong meaning because the function returns
an iterator object -- it doesn't iterate over anything itself.

The clearest name for this function would be `iterator' <surprise>,
but `iter' would work just as well as `len', `str', `repr', and all the
other abbreviated builtins do.

`harp' and the like are just silly -- whoever wants to use an
obfuscated programming language probably knows where to get it.

Quote:
>     - Using the same name for two different operations (getting an
>       iterator from an object and making an iterator for a function
>       with an sentinel value) is somewhat ugly.  I haven't seen a
>       better name for the second operation though.

I assume `ugly' refers to the implementation.

As both operations return an iterator object it seems natural to use
the same name for them. Having to remember two different names for
iterator-returning operations looks ugly to me.

Quote:
>     - There is still discussion about whether

>           for x in dict: ...

>       should assign x the successive keys, values, or items of the
>       dictionary.  The symmetry between "if x in y" and "for x in y"
>       suggests that it should iterate over keys.  This symmetry has been
>       observed by many independently and has even been used to "explain"
>       one using the other.  This is because for sequences, "if x in y"
>       iterates over y comparing the iterated values to x.  If we adopt
>       both of the above proposals, this will also hold for
>       dictionaries.

If `if x in dict' is implemented (which I hope) making `for x in dict'
iterate over something other than the keys would be very confusing.

Quote:
> We could also add methods to dictionaries that return different kinds
> of iterators

Please do so.

    for k, v in dict.itemiter():
        ...

is both more readable and more efficient than

    for k in dict:
        v = dict[k]
        ...

The item-iterator can also be passed around whereas the other form
cannot.

--

Glasauergasse 32                                       Tel: +43 1 876 62 36
A-1130 Vienna, Austria                                 Fax: +43 1 877 66 92



Sat, 18 Oct 2003 15:18:54 GMT  
 PEP 234: Iterators

Quote:
>Dictionary Iterators

>     This means that we can write

>          for k in dict: ...

>      which is equivalent to, but much faster than

faster at runtime, and less typing too ;-)

Quote:

>          for k in dict.keys(): ...

>      as long as the restriction on modifications to the dictionary
>      (either by the loop or by another thread) are not violated.

What happens if you do violate this? I assume that damage is limited
to seing elements twice, or skipping some elements and not seeing them
at all.

Toby{*filter*}enson



Sat, 18 Oct 2003 18:35:15 GMT  
 PEP 234: Iterators

Quote:

> What happens if you do violate this? I assume that damage is limited
> to seing elements twice, or skipping some elements and not seeing them
> at all.

You (probably) get an exception.  If you're really careful, you could
probably miss elements, and if you're really really careful you could
probably get it to hit elements twice (esp. if you do something dense
like classes whose __hash__ always returns 1, for example).

It's probably worrying that I can cook up things like:

->> d = {1:1, 2:2, 3:3, 4:4}
/>> for i in d:
|..     print d[i]
|..     if i == 4:
|..         d[9] = 5
\__
4
3
2
1
5
->> d = {1:1, 2:2, 3:3, 4:4}
/>> for i in d:
|..     print d[i]
|..     if i == 4:
|..         d[12] = 12
\__
4
3
2
1

Often, though, you'll get:

->> d = {1:1, 2:2}
/>> for i in d:
|..     print d[i]
|..     if i == 2:
|..         d[12] = 12
\__
2
Traceback (most recent call last):
  File "<input>", line 1, in ?
RuntimeError: dictionary changed size during iteration

basically-the-message-is-"don't-do-that"-ly y'rs
M.

--
  If you give someone fortran, he has Fortran.
  If you give someone Lisp, he has any language he pleases.
    -- Guy L. Steele Jr, quoted by David Rush in comp.lang.scheme.scsh



Sat, 18 Oct 2003 23:35:17 GMT  
 PEP 234: Iterators

Quote:

> > What happens if you do violate this? I assume that damage is limited
> > to seing elements twice, or skipping some elements and not seeing them
> > at all.

> You (probably) get an exception.

It would be nice if this could be made reliable.

--

Shareware computer games           -           http://rainerdeyke.com
"In ihren Reihen zu stehen heisst unter Feinden zu kaempfen" - Abigor



Sat, 18 Oct 2003 23:46:11 GMT  
 PEP 234: Iterators

Quote:




> > > What happens if you do violate this? I assume that damage is limited
> > > to seing elements twice, or skipping some elements and not seeing them
> > > at all.

> > You (probably) get an exception.

> It would be nice if this could be made reliable.

Tricky, that.  It could be made somewhat more reliable, but I think
making it totally certain would have prohibitive drawbacks,
particularly as stuff like:

for i in d:
    d[i] += 1

*is* supported.

Cheers,
M.

--
  You sound surprised.  We're talking about a government department
  here - they have procedures, not intelligence.
                                            -- Ben Hutchings, cam.misc



Sun, 19 Oct 2003 01:30:56 GMT  
 PEP 234: Iterators

[ getting an exception when you modify a dict during iteration ]

Quote:
> It would be nice if this could be made reliable.

> Tricky, that.  It could be made somewhat more reliable, but I think
> making it totally certain would have prohibitive drawbacks,
> particularly as stuff like:

> for i in d:
>     d[i] += 1

> *is* supported.

I think you get a RuntimeError if the dictionary gets resized: the dict
iterator object can check cheaply for that event. Perhaps if the dict
object grew a magic counter, which would get incremented when a new
key is added or a key is deleted, and the dict iterator would check for
*that* value, this would be more reliable. I can't judge whether this
approach would be too expensive or not.

Just



Sun, 19 Oct 2003 02:27:26 GMT  
 PEP 234: Iterators

Quote:

> [ getting an exception when you modify a dict during iteration ]
> > It would be nice if this could be made reliable.


> > Tricky, that.  It could be made somewhat more reliable, but I think
> > making it totally certain would have prohibitive drawbacks,
> > particularly as stuff like:

> > for i in d:
> >     d[i] += 1

> > *is* supported.

> I think you get a RuntimeError if the dictionary gets resized: the dict
> iterator object can check cheaply for that event.

Yes.

Quote:
> Perhaps if the dict object grew a magic counter, which would get
> incremented when a new key is added or a key is deleted, and the
> dict iterator would check for *that* value, this would be more
> reliable.

It's not very hard, as it turns out:

Index: dictobject.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Objects/dictobject.c,v
retrieving revision 2.81
diff -c -1 -r2.81 dictobject.c
*** dictobject.c        2001/05/01 12:10:21     2.81
--- dictobject.c        2001/05/01 22:12:43
***************
*** 103,104 ****
--- 103,105 ----
        int ma_poly;  /* appopriate entry from polys vector */
+       int ma_altercount; /* magic to prevent dict mutation on iteration */
        dictentry *ma_table;
***************
*** 144,145 ****
--- 145,147 ----
        mp->ma_used = 0;
+       mp->ma_altercount = 0;
        mp->ma_lookup = lookdict_string;
***************
*** 367,370 ****
        else {
!               if (ep->me_key == NULL)
                        mp->ma_fill++;
                else
--- 369,374 ----
        else {
!               if (ep->me_key == NULL) {
                        mp->ma_fill++;
+                       mp->ma_altercount++;
+               }
                else
***************
*** 547,548 ****
--- 551,553 ----
                goto empty;
+       mp->ma_altercount++;
        ep = (mp->ma_lookup)(mp, key, hash);
***************
*** 1481,1483 ****
        dictobject *di_dict;
!       int di_size;
        int di_pos;
--- 1486,1488 ----
        dictobject *di_dict;
!       int di_altercount;
        int di_pos;
***************
*** 1495,1497 ****
        di->di_dict = dict;
!       di->di_size = dict->ma_size;
        di->di_pos = 0;
--- 1500,1502 ----
        di->di_dict = dict;
!       di->di_altercount = dict->ma_altercount;
        di->di_pos = 0;
***************
*** 1513,1517 ****

!       if (di->di_size != di->di_dict->ma_size) {
                PyErr_SetString(PyExc_RuntimeError,
!                               "dictionary changed size during iteration");
                return NULL;
--- 1518,1522 ----

!       if (di->di_altercount != di->di_dict->ma_altercount) {
                PyErr_SetString(PyExc_RuntimeError,
!                               "dictionary altered during iteration");
                return NULL;

Quote:
> I can't judge whether this approach would be too expensive or not.

Me neither.  It's at least possible.  I could run a pybench run, but
right now I'm going to bed.

see-you-tomorrow-ly y'rs,
M.

--
  "The future" has arrived but they forgot to update the docs.
                                        -- R. David Murray, 9 May 2000



Sun, 19 Oct 2003 06:15:00 GMT  
 PEP 234: Iterators
I really like the idea of returning different kinds of iterators for
dictionaries as described:

Quote:
>           for key, value in dict.iteritems(): ...

>           for value in dict.itervalues(): ...

>           for key in dict.iterkeys(): ...

But in case nobody has suggested it yet, perhaps the default case:

for x in dict

should iterate over items, and that:

key,value in dict

would then maintain the desirable symmetry.

This would open up some further interesting possibilities such as specifying
either the key or the value
as an object implementing the __eq__ operation in various ways (regular
expressions, etc...).
... but I'm sure you've all already thought of that: :-)

-- Peter Caven

---------------------



Quote:
> Here's a revised version of the Iterator PEP (started by Ping).  I
> plan to make this a standard language feature in Python 2.2.

--snipped--

Quote:
>     - There is still discussion about whether

>           for x in dict: ...

>       should assign x the successive keys, values, or items of the
>       dictionary.  The symmetry between "if x in y" and "for x in y"
>       suggests that it should iterate over keys.  This symmetry has been
>       observed by many independently and has even been used to "explain"
>       one using the other.  This is because for sequences, "if x in y"
>       iterates over y comparing the iterated values to x.  If we adopt
>       both of the above proposals, this will also hold for
>       dictionaries.

>       The argument against making "for x in dict" iterate over the keys
>       comes mostly from a practicality point of view: scans of the
>       standard library show that there are about as many uses of "for x
>       in dict.items()" as there are of "for x in dict.keys()", with the
>       items() version having a small majority.  Presumably many of the
>       loops using keys() use the corresponding value anyway, by writing
>       dict[x], so (the argument goes) by making both the key and value
>       available, we could support the largest number of cases.  While
>       this is true, I (Guido) find the correspondence between "for x in
>       dict" and "if x in dict" too compelling to break, and there's not
>       much overhead in having to write dict[x] to explicitly get the
>       value.  We could also add methods to dictionaries that return
>       different kinds of iterators, e.g.

>           for key, value in dict.iteritems(): ...

>           for value in dict.itervalues(): ...

>           for key in dict.iterkeys(): ...

--snipped--


Sun, 19 Oct 2003 14:39:06 GMT  
 PEP 234: Iterators

Quote:

> I really like the idea of returning different kinds of iterators for
> dictionaries as described:

> >           for key, value in dict.iteritems(): ...

> >           for value in dict.itervalues(): ...

> >           for key in dict.iterkeys(): ...

Me too.

Quote:
> But in case nobody has suggested it yet, perhaps the default case:

> for x in dict

> should iterate over items, and that:

> key,value in dict

> would then maintain the desirable symmetry.

I *think* the problem with this is that we want to write:

    if key in dict:
        ...

You're almost never going to want to write "if item in dict", and
having different meanings for "x in dict" in different contexts would
be insane.

On reflection, I think I think (<wink>) of dicts more as containing
keys, and associating data with each key.  I don't think of them as
containing pairs.

Quote:
> This would open up some further interesting possibilities such as specifying
> either the key or the value
> as an object implementing the __eq__ operation in various ways (regular
> expressions, etc...).
> ... but I'm sure you've all already thought of that: :-)

Eh?  

Cheers,
M.

--
  Clue: You've got the appropriate amount of hostility for the
  Monastery, however you are metaphorically getting out of the
  safari jeep and kicking the lions.                         -- coonec
               -- http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html



Sun, 19 Oct 2003 19:01:38 GMT  
 PEP 234: Iterators

Quote:

>> I really like the idea of returning different kinds of iterators for
>> dictionaries as described:

>> >           for key, value in dict.iteritems(): ...
>> >           for value in dict.itervalues(): ...
>> >           for key in dict.iterkeys(): ...

>Me too.

What about xitems, xvalues, xkeys (as in xrange?)

Sincerely yours, Roman Suzi
--

_/ Wednesday, May 02, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Cats have nine lives - but sleep through eight of them." _/



Sun, 19 Oct 2003 19:44:55 GMT  
 PEP 234: Iterators

Quote:



> >> I really like the idea of returning different kinds of iterators for
> >> dictionaries as described:

> >> >           for key, value in dict.iteritems(): ...
> >> >           for value in dict.itervalues(): ...
> >> >           for key in dict.iterkeys(): ...

> >Me too.

> What about xitems, xvalues, xkeys (as in xrange?)

Please read the PEP: iterators largely make these redundant. Eg. dict.iteritems()
doesn't return a list of items (we already have dict.items() for that) but an
_iterator_, allowing you to iterate over the items _without_ creating a list
containing them all...

Just



Sun, 19 Oct 2003 20:19:56 GMT  
 PEP 234: Iterators

Quote:

> > What about xitems, xvalues, xkeys (as in xrange?)

> Please read the PEP: iterators largely make these
> redundant. Eg. dict.iteritems() doesn't return a list of items (we
> already have dict.items() for that) but an _iterator_, allowing you
> to iterate over the items _without_ creating a list containing them
> all...

Much as xrange presently doesn't return a list of items, but an object
that returns the next item with each successive call to __getitem__.
I think his point was that having methods on the mapping objects named
xitems, xvalues, and xkeys instead of iteritems, itervalues, and
iterkeys would be inline with names of the present xrange and
xreadlines functions.

--

"I wouldn't be surprised if the architecture of Intel's microprocessors
were eventually linked to the eventual fall of mankind." Steve Gibson



Sun, 19 Oct 2003 21:09:25 GMT  
 PEP 234: Iterators

Quote:


> > > What about xitems, xvalues, xkeys (as in xrange?)

> > Please read the PEP: iterators largely make these
> > redundant. Eg. dict.iteritems() doesn't return a list of items (we
> > already have dict.items() for that) but an _iterator_, allowing you
> > to iterate over the items _without_ creating a list containing them
> > all...

> Much as xrange presently doesn't return a list of items, but an object
> that returns the next item with each successive call to __getitem__.
> I think his point was that having methods on the mapping objects named
> xitems, xvalues, and xkeys instead of iteritems, itervalues, and
> iterkeys would be inline with names of the present xrange and
> xreadlines functions.

But .iteritems(), etc. are much better names.  So there.

Cheers,
M.

--
  If trees could scream, would we be so cavalier about cutting them
  down? We might, if they screamed all the time, for no good reason.
                                                        -- Jack Handey



Sun, 19 Oct 2003 21:25:50 GMT  
 PEP 234: Iterators

Quote:




>> >> I really like the idea of returning different kinds of iterators for
>> >> dictionaries as described:

>> >> >           for key, value in dict.iteritems(): ...
>> >> >           for value in dict.itervalues(): ...
>> >> >           for key in dict.iterkeys(): ...

>> >Me too.

>> What about xitems, xvalues, xkeys (as in xrange?)

>Please read the PEP: iterators largely make these redundant. Eg. dict.iteritems()
>doesn't return a list of items (we already have dict.items() for that) but an
>_iterator_, allowing you to iterate over the items _without_ creating a list
>containing them all...

I mean, will not be xrange renamed to iterrange - it must be also
an iterator, or will it stay as is, with special xrange object
instead of iterator?

I mean, if explicit iterators are introduced to Python, why not to
change xrange into iterator for uniformity, in the same PEP?

Quote:
>Just

Sincerely yours, Roman Suzi
--

_/ Wednesday, May 02, 2001 _/ Powered by Linux RedHat 6.2 _/
_/ "Cats have nine lives - but sleep through eight of them." _/


Sun, 19 Oct 2003 21:05:22 GMT  
 
 [ 25 post ]  Go to page: [1] [2]

 Relevant Pages 

1. PEP 234: Iterators

2. PEP 234: Iterators

3. PEP 234: Iterators (fwd)

4. Discussion about PEP 234: iterators

5. PEP 234: Iterators (fwd)

6. PEP 234 little bug?

7. Jun for Java 234

8. API routine CryptEncrypt() returns error 234

9. PEP 276 Simple Iterator for ints (fwd)

10. PEP 276 Simple Iterator for ints (fwd)

11. PEP 276 Simple Iterator for ints

12. PEP 276 Simple Iterator for ints

 

 
Powered by phpBB® Forum Software