Discussion about PEP 234: iterators 
Author Message
 Discussion about PEP 234: iterators

This is in reaction to the new iterator proposal
< http://www.*-*-*.com/ ;:

My own reaction is "great!" one misgiving: it does not address a very
common need: to iterate over a dictionary in key-sorted order. Is there
some variant syntax that could handle this case?

(Better yet, and going out on thin ice here, could this be made the
default? I realize that's unlikely and may anger some people to even
suggest it, but I would like to point out that some languages actually
store dictionaries with their keys ordered, so it's at least technically
feasible).

If this issue cannot be addressed, could we at least get "sort" to
return the sorted collection, so one could do:

for key in dictionary.keys().sort():

-- Russell



Wed, 06 Aug 2003 01:41:05 GMT  
 Discussion about PEP 234: iterators


Quote:
> This is in reaction to the new iterator proposal
> <http://python.sourceforge.net/peps/pep-0234.html>:

> My own reaction is "great!" one misgiving: it does not address a very
> common need: to iterate over a dictionary in key-sorted order. Is there
> some variant syntax that could handle this case?

> (Better yet, and going out on thin ice here, could this be made the
> default? I realize that's unlikely and may anger some people to even
> suggest it, but I would like to point out that some languages actually
> store dictionaries with their keys ordered, so it's at least technically
> feasible).

One can choose to store mappings using sorted order on keys (that,
for example, is how C++'s standard std::map is specified to work),
but performance for all other tasks can suffer quite substantially
when compared to hash tables.  Python uses dictionaries so widely
that _any_ hit on dictionary performance would not be acceptable.

I would suggest leaving dictionaries alone and supplying another
instance of mapping, based on red-black trees or some variant
thereon, if the sorted-keys requirement is indeed so widespread.

Quote:
> If this issue cannot be addressed, could we at least get "sort" to
> return the sorted collection, so one could do:

> for key in dictionary.keys().sort():

Why not

def sort(adisposablelist):
    adisposablelist.sort()
    return adisposablelist

and then

for key in sort(dictionary.keys()):

Is the syntax-sugar difference so big, after all?

Alex



Wed, 06 Aug 2003 03:39:56 GMT  
 Discussion about PEP 234: iterators
I think dictionary iteration should match iterations on other objects/types
as closely as possible unless performance is impacted.

for elem in list                # does not automatically sort
for key in dict                # so this should not automatically sort

for key:value in dict        # proposed
for index:item in list        # so this should work

if elem in list                # this works
if key in dict                # so this should work

map( fun, list )            # this works
map( func, dict )        # so this should work with functions that take a key as
a parameter

filter( boolfun, list )        # gives a possibly smaller list
filter( boolfun, dict )        # so this should return a possibly smaller
dictionary

Thanks,

Raymond



Wed, 06 Aug 2003 06:09:47 GMT  
 Discussion about PEP 234: iterators


Quote:
> I think dictionary iteration should match iterations on other
objects/types
> as closely as possible unless performance is impacted.

Ok then, let me put on my pedantic hat.
.
.
Oof, that's a tight fit.

Quote:
> for elem in list                # does not automatically sort
> for key in dict                # so this should not automatically sort

> for key:value in dict        # proposed
> for index:item in list        # so this should work

Alright, looking at what you have so far, you've established a one to one
correspondence between dictionary keys and sequence indices as well as
between dictionary values and sequence values. That seems to make sense, but
this next bit is where we run into trouble -- consistency runs head on into
practicality and peoples expectations.

Quote:
> if elem in list                # this works
> if key in dict                # so this should work

If one is going to go by the correspondence that you've set up above, then
really the dictionary equivalent should be:

if value in dict:

Unfortunately, people tend to think of a dictionary as containing keys more
than they think of it as containing definitions (values). So the form you
propose is better from that point of view. It's also probably more useful in
a practical sense as well. It is kind of distressing that it's not
consistent with the correspondence that's set up between keys<->indices and
values<->values though.

Quote:
> map( fun, list )            # this works
> map( func, dict )        # so this should work with functions that take a
key as
> a parameter

This is not obvious to me. I would think that this would need a function
that takes values as parameters and would produce:

{key1 : func(val1), key2 : func(val2) ....}

I can't even figure out how this would work if func operated on keys. That
seems to be a bad sign for the obviousness of this construct.....

Quote:
> filter( boolfun, list )        # gives a possibly smaller list
> filter( boolfun, dict )        # so this should return a possibly smaller
> dictionary

But how would it work? If were to guess at the meaning of something like
this I'd get:

dict = {}
for key, value in dict.keys():
   if boolfun(value):
      dict[key] = value

But being as we disagreed above, I'm not sure that we agree here.

I too would like to see everything as consistent as possible (but no
consistenter[sic]), but I'm not sure that agreement can be reached on that.

-tim



Wed, 06 Aug 2003 06:39:16 GMT  
 Discussion about PEP 234: iterators

Quote:



> > I think dictionary iteration should match iterations on other
> objects/types
> > as closely as possible unless performance is impacted.
> ...

> > if elem in list                # this works
> > if key in dict                # so this should work

> If one is going to go by the correspondence that you've set up above, then
> really the dictionary equivalent should be:

> if value in dict:

> Unfortunately, people tend to think of a dictionary as containing keys more
> than they think of it as containing definitions (values). So the form you
> propose is better from that point of view. It's also probably more useful in
> a practical sense as well. It is kind of distressing that it's not
> consistent with the correspondence that's set up between keys<->indices and
> values<->values though.

Actually, given my Smalltalk background, I tend to see Dictionaries as
containing Associations (an Assocaiation is a key value pair)  Thus looping over
a Dictionary returns an association on each iteration.

To carry this to Python, I would like to see looping over Dictionaries return a
Tuple of the key,value pair.  Allowing either:

    for assoc in dictionary:
        print "key %s = value %s" % (assoc[0], assoc[1])

or

    for (key, value) in dictionary:
        print "key %s = value %s" % (key,value)

Take care,
Jay O'Connor



Wed, 06 Aug 2003 06:56:38 GMT  
 Discussion about PEP 234: iterators
[Russell E. Owen]

Quote:
> This is in reaction to the new iterator proposal
> <http://python.sourceforge.net/peps/pep-0234.html>:

> My own reaction is "great!" one misgiving: it does not address a very
> common need: to iterate over a dictionary in key-sorted order. Is there
> some variant syntax that could handle this case?

There's always *some* variant syntax that could be forced to work.  For
example,

   for key lambda dict:

<wink>.

Quote:
> (Better yet, and going out on thin ice here, could this be made the
> default?

No.  dicts can grow very large, and forcing people to pay for a sort when
they don't want it would raise justified howls of protest.  If you want it
sorted, other replies have shown you easy ways to get it sorted.

Quote:
> I realize that's unlikely and may anger some people to even suggest it,

Not at all.  I only get pissed off when I have to *reply* <wink>.

Quote:
> but I would like to point out that some languages actually  store
> dictionaries with their keys ordered, so it's at least technically
> feasible).

In fact, the language Guido worked on before python (ABC) implemented dicts
as balanced binary trees.  So they were always ordered.  He really disliked
it, because the truly common dict operations (insert and lookup) were much
slower than they are using Python's hash tables.  Note too that Python
(unlike ABC) doesn't define a total ordering over all values of all types,
so it's not currently possible "even in theory" to store all dicts in sorted
order ("sorted order" isn't currently well-defined in all cases).

you're-one-two-line-function-away-from-bliss-ly y'rs  - tim



Wed, 06 Aug 2003 13:19:49 GMT  
 Discussion about PEP 234: iterators

Quote:

> [snip discussion of consistency of "if thing in dict:"]

> Unfortunately, people tend to think of a dictionary as containing keys
> more than they think of it as containing definitions (values). So the form
> you propose is better from that point of view. It's also probably more
> useful in a practical sense as well. It is kind of distressing that it's
> not consistent with the correspondence that's set up between
> keys<->indices and values<->values though.

Hmm... I tend to think of a dictionary as containing values more than keys.
 The keys are just there so you can get to the values, right?  Others tend
to think dicts contain (key, value) pairs.  Everybody has a different
opinion on this one.  Rather than supplying an arbitrary default, which
would lead to confusion, how about we let the programmer specify things
explicitly.

How about we make it:
if key in dict.keys():
and
if value in dict.values():
and maybe even
if (key, value) in dict.items():

That way it's perfectly clear what the programmer intended.  I think
there's a really good chance of Guido accepting this proposal, and I've
already written a fully backwards-compatible patch.  Anybody up for a trip
back in time?

-n8

--
_.~'^`~._.~'^`~._.~'^`~._.~'^`~._.~'^`~._
             Nathaniel Gray
   California Institute of Technology
     Computation and Neural Systems
     n8gray <at> caltech <dot> edu
_.~'^`~._.~'^`~._.~'^`~._.~'^`~._.~'^`~._



Thu, 07 Aug 2003 15:10:47 GMT  
 Discussion about PEP 234: iterators

Quote:

> I think dictionary iteration should match iterations on other
> objects/types as closely as possible unless performance is impacted.

> for elem in list                # does not automatically sort
> for key in dict                # so this should not automatically sort

What about
for value in dict          # Should this sort?

Quote:
> for key:value in dict        # proposed
> for index:item in list        # so this should work

> if elem in list                # this works
> if key in dict                # so this should work

Don't forget
if value in dict

Quote:
> map( fun, list )            # this works
> map( func, dict )        # so this should work with functions that take a
> key as a parameter

And those that take a value as a parameter too, right?

Don't worry, I'm sure RMS can throw something together with soundex that'll
be able to tell when you meant keys and when you meant values.

I know I'm being a stickler, but the default of "if thing in dict" is not
at all clear to me, and it appears I'm not the only one.  Anecdotal
evidence from Thomas Wouters (I believe) suggests that a significant number
of programmers expect "thing" to be a value, not a key.  Others expect a
(key, value) pair.  It just seems there's no clear default.  Oh, and some
guy named Guido used to agree with me on this one too...

Unlike many of the other folks in opposition to "if key in dict", I'm not
totally turned off by the "for key:value in dict" syntax and all of its
variants.  The syntax is easy to learn and it's explicit, if shorthand, for
the programmer's intent.  Why not use the same syntax for the "if key:value
in dict" version?  That way nobody'll be happy!  ;^)

(except for me, but I won't complain)

-n8

--
_.~'^`~._.~'^`~._.~'^`~._.~'^`~._.~'^`~._
             Nathaniel Gray
   California Institute of Technology
     Computation and Neural Systems
     n8gray <at> caltech <dot> edu
_.~'^`~._.~'^`~._.~'^`~._.~'^`~._.~'^`~._



Thu, 07 Aug 2003 16:29:38 GMT  
 Discussion about PEP 234: iterators
Hello all,

I'd like to thank you for your thoughts and comments about
PEP 234 and invite you to join a discussion group on the
topic of iteration in Python.


as a place where we can focus a conversation on this issue.
If you're interested in talking about it, please join!

    http://groups.yahoo.com/group/python-iter

Thanks!

-- Ping



Thu, 07 Aug 2003 18:48:58 GMT  
 Discussion about PEP 234: iterators


Quote:

>Hmm... I tend to think of a dictionary as containing values more than keys.
> The keys are just there so you can get to the values, right?  Others tend
>to think dicts contain (key, value) pairs.  Everybody has a different
>opinion on this one.  Rather than supplying an arbitrary default, which
>would lead to confusion, how about we let the programmer specify things
>explicitly.

>How about we make it:
>if key in dict.keys():
>and
>if value in dict.values():
>and maybe even
>if (key, value) in dict.items():

The problem with this is absolutely horrendous speed/memory performance
with any large dictionary.  Your code generates a list AND THEN DOES A
LINEAR SEARCH.  No, no, no, no, no, no, no.  We've already got

if dict.has_key(key):

and

if key in dict:

is a thin layer of sugar-coating designed to create greater usage
equivalence between lists and dicts.  I'm not particularly fond of it,
but it does have the advantage of not being forced to remember whether
has_key() has an underscore in it.
--

Androgynous poly {*filter*} vanilla {*filter*} het    <*>     http://www.*-*-*.com/
Hugs and backrubs -- I break Rule 6

Why doesn't "Just Say NO" include caffeine, nicotine, {*filter*}, Prozac,
and Ritalin?  --Aahz



Fri, 08 Aug 2003 00:00:00 GMT  
 Discussion about PEP 234: iterators

Quote:


> >How about we make it:
> >if key in dict.keys():
> >if value in dict.values():
> >if (key, value) in dict.items():

> The problem with this is absolutely horrendous speed/memory performance
> with any large dictionary.  Your code generates a list AND THEN DOES A
> LINEAR SEARCH.  No, no, no, no, no, no, no.  We've already got

Instead of inventing new syntax for inefficent idioms, we
could just make the idioms more efficent, by having values(),
keys(), and items() return proxy objects with __contains__()
and sequential __getitem__() functions.

Alas, this isn't backwards compatible with old code that
expects a real list object.  Possible solutions, in descending
order of sanity:

- make new functions called xkeys(), xvalues(), etc. to
  return the proxies.  Pretty ugly, but consistent with
  xrange() and xreadlines().

- just live with it.

- make 'dict.keys' return the proxy object, which is
  callable.  Make proxy.__call__() return an actual
  list from the proxy so dict.keys() works as before.
  So cute I'm ashamed I even thought of it.


  currently specializing in poorly thought out solutions.



Fri, 08 Aug 2003 11:32:41 GMT  
 Discussion about PEP 234: iterators

Quote:

>- make 'dict.keys' return the proxy object, which is
>  callable.  Make proxy.__call__() return an actual
>  list from the proxy so dict.keys() works as before.
>  So cute I'm ashamed I even thought of it.

What a bright idea!  This is the best solution I've seen so far.  I'm
surprised no one has commented on this yet.  So let me spell it out to
generate more interest.

With this proposal, all of these work as expected

for k in dict.keys:
for k, v in dict.items:
for v in dict.values:

without creating extra list.  And all of these work as expected

if k in dict.keys:
if k, v in dict.items:
if v in dict.values:

Only the last need linear search, which is natural.

There will be no confusion over what 'in' means for dict.  There is no
clutter of code.  And "explicit is better than implicit".

Furthermore, all the old codes work as well, like

a = dict.keys()
v = dict.values()[3]
x, y = dict.items()[2]

Whatever the actual implementation would be, this seems to offer the
cleanest syntax space for iterators and has_item proxies and so on whenever
they become available.  

In the actual implementation, all iterators need to provide three magical
methods: __getitem__, __contains__ and __call__ with the usual sematic
restrictions:

__contains__ return true iff __getitem__ will get it at least once.
__call__ will return a list of objects returned by __getitem__.
__getitem__ raise IndexError at the end.

Huaiyu



Sat, 09 Aug 2003 11:23:22 GMT  
 
 [ 18 post ]  Go to page: [1] [2]

 Relevant Pages 

1. PEP 234: Iterators

2. PEP 234: Iterators

3. PEP 234: Iterators

4. PEP 234: Iterators (fwd)

5. PEP 234: Iterators (fwd)

6. PEP 234 little bug?

7. Meta: PEP discussion (was Re: PEP 255: Simple Generators)

8. Jun for Java 234

9. API routine CryptEncrypt() returns error 234

10. PEP 276 Simple Iterator for ints (fwd)

11. PEP 276 Simple Iterator for ints (fwd)

12. PEP 276 Simple Iterator for ints

 

 
Powered by phpBB® Forum Software