NEWBIE QUESTIION: Comparing Lists in Python 
Author Message
 NEWBIE QUESTIION: Comparing Lists in Python

Hi everyone,

I haven't been using python for very long, so I hope you will forgive me if
I am asking a stupid question, or doing this in a stupid way.

I am trying to compare two lists in Python, which have been passed into it.
I want to return the difference between the two lists as again two lists

eg:

list 1 = item1, item5, item8, item10
list2 = item 3, item4, item5

I want to return

returnlist1 = item1, item8, item10
returnlist2 = item3, item4

I haven't been working in python very long, but from what I have read, the
best way is to convert these to a dictionary, and compare there.  This is
what I have done so far.

list1 = list1.split(',')
list2 = list2.split(',')

returnlist1 = []
returnlist2 = []

d = {}
for item in list1:
  if item not in list2:
   d[item] = item
if d:
 returnlist1 = d.values()

d2 = {}
for item in list2:
 if item not in list1:
    return item
    d2[item] = item
if d2:
 returnlist2 = d2.values()

return [returnlist1,returnlist2]

I found if I didn't include the split, but just passed in a list like
"list1=[item1, item3, item] it would work only on 1 character at a time and
not recognise it was a split.

Please let me know if I am doing this in a very stupid way.  The problem is
that it isn't always recognising the lists, or comparing correctly.  It
finds returnlist1 OK, but not returnlist2.

I'm using python scripts in Zope if that may also be causing a problem.

Thanks very much

Di

--
Dianne van Dulken
(put out the cats to talk to me)

See the cutest puppy in the known universe (as voted by Rick and i)
http://www.*-*-*.com/



Mon, 18 Oct 2004 09:52:01 GMT  
 NEWBIE QUESTIION: Comparing Lists in Python
The most general way would be to just copy the list instead of using a dict
and remove the elements:

Quote:
>>> diff = list1[:]
>>> for item in diff:

...  if item in list2:
...   diff.remove(item)
...
Quote:
>>> diff

[1, 2, 3]

List comprehensions would be my favorite (although many will disagree):

Quote:
>>> list1 = [1,2,3,4,5]
>>> list2 = [4,5,6,7,8]
>>> returnList1 = [val for val in list1 if val not in list2]
>>> returnList2 = [val for val in list2 if val not in list1]
>>> returnList1
[1, 2, 3]
>>> returnList2

[6, 7, 8]

Filter will also work:

Quote:
>>> filter(lambda a,list2=list2:not a in list2, list1)

[1, 2, 3]

--
-



Quote:
> Hi everyone,

> I haven't been using python for very long, so I hope you will forgive me
if
> I am asking a stupid question, or doing this in a stupid way.

> I am trying to compare two lists in Python, which have been passed into
it.
> I want to return the difference between the two lists as again two lists

> eg:

> list 1 = item1, item5, item8, item10
> list2 = item 3, item4, item5

> I want to return

> returnlist1 = item1, item8, item10
> returnlist2 = item3, item4

> I haven't been working in python very long, but from what I have read, the
> best way is to convert these to a dictionary, and compare there.  This is
> what I have done so far.

> list1 = list1.split(',')
> list2 = list2.split(',')

> returnlist1 = []
> returnlist2 = []

> d = {}
> for item in list1:
>   if item not in list2:
>    d[item] = item
> if d:
>  returnlist1 = d.values()

> d2 = {}
> for item in list2:
>  if item not in list1:
>     return item
>     d2[item] = item
> if d2:
>  returnlist2 = d2.values()

> return [returnlist1,returnlist2]

> I found if I didn't include the split, but just passed in a list like
> "list1=[item1, item3, item] it would work only on 1 character at a time
and
> not recognise it was a split.

> Please let me know if I am doing this in a very stupid way.  The problem
is
> that it isn't always recognising the lists, or comparing correctly.  It
> finds returnlist1 OK, but not returnlist2.

> I'm using python scripts in Zope if that may also be causing a problem.

> Thanks very much

> Di

> --
> Dianne van Dulken
> (put out the cats to talk to me)

> See the cutest puppy in the known universe (as voted by Rick and i)
> http://www.dogmac.com/bartholomew



Mon, 18 Oct 2004 10:32:27 GMT  
 NEWBIE QUESTIION: Comparing Lists in Python
[Dianne van Dulken]

Quote:
> I want to return the difference between the two lists as again two lists

> eg:

> list 1 = item1, item5, item8, item10
> list2 = item 3, item4, item5

> I want to return

> returnlist1 = item1, item8, item10
> returnlist2 = item3, item4

You can do this easily with list comprehension.  Here's a demo...

#!/usr/bin/env python

def not_in_list(l1, l2):
  """
  Return a list of elements from l1 that are not in l2.
  """
  return [x for x in l1 if x not in l2]

l1 = [1,5,8,10]
l2 = [3,4,5]

print not_in_list(l1, l2)
print not_in_list(l2, l1)

// mark



Mon, 18 Oct 2004 10:23:23 GMT  
 NEWBIE QUESTIION: Comparing Lists in Python

Quote:

> The most general way would be to just copy the list instead of using a
> dict and remove the elements:

>>>> diff = list1[:]
>>>> for item in diff:
> ...  if item in list2:
> ...   diff.remove(item)
> ...
>>>> diff
> [1, 2, 3]

If the length of both lists are proportional to some number N, this
algorithm will take a time of O(N*N*N) -- *cubic*.  One factor of N
comes from the for statement, one from operator 'in' when applied to
a list on the RHS, and one from the remove operation.

Quote:
> List comprehensions would be my favorite (although many will disagree):

>>>> list1 = [1,2,3,4,5]
>>>> list2 = [4,5,6,7,8]
>>>> returnList1 = [val for val in list1 if val not in list2]
>>>> returnList2 = [val for val in list2 if val not in list1]

This is O(N*N), since you don't have the N factor from the remove.

The way to get O(N) behavior is to construct auxiliary dictionaries:

dict1 = dict(zip(list1,list1))
returnList2 = [x for x in list2 if x not in dict1]

"x not in C" takes O(N) time if container C is a sequence with N
elements, but (roughly) O(1) time if C is a dictionary.  Building a
dictionary of N items is roughly O(N).

Premature optimization is the root of all evil (Knuth) -- but
that basically means MICRO-optimization, trying to shave 20% or
80% off some algorithm for some fixed big-O behavior.  Getting
the right big-O behavior is generally not a wasted effort, unless
you can guarantee that all containers around will be quite small.

Python generally makes getting sane big-O behavior quite easy,
thanks to the superb performance of dictionaries above all (and,
in other contexts, the great performance of list's sort method,
and a few other typical hotspots).  You do have to know the big
tricks and pitfalls, such as "x in sequence", somelist.remove,
building strings out of pieces with + or += -- these are the
big three typical timesinks of hastily written Python code.

Alex



Mon, 18 Oct 2004 17:28:58 GMT  
 NEWBIE QUESTIION: Comparing Lists in Python
[Alex Martelli]

Quote:
> The way to get O(N) behavior is to construct auxiliary dictionaries:

> dict1 = dict(zip(list1,list1))
> returnList2 = [x for x in list2 if x not in dict1]

> "x not in C" takes O(N) time if container C is a sequence with N
> elements, but (roughly) O(1) time if C is a dictionary.  Building a
> dictionary of N items is roughly O(N).

Alex, thanks for the zip tip.  I don't usually reach for that particular
hammer--this usage broadens my horizons.

I whipped together the program below to time the difference between a raw
compare and a compare with an auxiliary dictionary (in the program I
contrast these two approaches as raw and zip).  It seems that with a small
list size (10-100), the performance boost of the zip is negligible?

Here are some results:

Usage: time_not_in.py count share.

count is the size of each list.
share is the number of elements they share.

$ time_not_in.py 100 10
raw: 0.004 seconds.
zip: 0.001 seconds.

$ time_not_in.py 1000 100
raw: 0.394 seconds.
zip: 0.010 seconds.

$ time_not_in.py 10000 1000
raw: 44.949 seconds.
zip: 0.176 seconds.

Cheers,

// mark

#! /usr/bin/env python

import time

def not_in_list_zip(l1, l2):
  """
  Return a list of elements from l1 that are not in l2.
  """
  d = dict(zip(l2, l2))
  return [x for x in l1 if x not in d]

def not_in_list_raw(l1, l2):
  """
  Return a list of elements from l1 that are not in l2.
  """
  return [x for x in l1 if x not in l2]

def time_raw(l1, l2):
    f = not_in_list_raw
    return time_it(f, l1, l2)

def time_zip(l1, l2):
    f = not_in_list_zip
    return time_it(f, l1, l2)

def time_it(f, l1, l2):
    start = time.time()
    d1 = f(l1, l2)
    d2 = f(l2, l1)
    end = time.time()
    return d1, d2, end - start

def show_time(s, t):
    print "%s: %1.3f seconds." % (s, t)

def main():
    import os, sys
    prog = os.path.basename(sys.argv[0])
    usage = "Usage: %s count share." % prog
    default_count = 10
    default_share = 2
    try:
        count = int(sys.argv[1])
    except IndexError:
        count = default_count
    try:
        share = int(sys.argv[2])
    except IndexError:
        share = default_share
    l1 = range(0, count)
    l2 = range(count - share, 2 * count - share)
    raw_d1, raw_d2, raw_t = time_raw(l1, l2)
    zip_d1, zip_d2, zip_t = time_zip(l1, l2)
    show_time("raw", raw_t)
    show_time("zip", zip_t)

if __name__ == "__main__":
    main()



Mon, 18 Oct 2004 21:45:38 GMT  
 NEWBIE QUESTIION: Comparing Lists in Python

Quote:

> [Alex Martelli]
>> The way to get O(N) behavior is to construct auxiliary dictionaries:

>> dict1 = dict(zip(list1,list1))
>> returnList2 = [x for x in list2 if x not in dict1]

>> "x not in C" takes O(N) time if container C is a sequence with N
>> elements, but (roughly) O(1) time if C is a dictionary.  Building a
>> dictionary of N items is roughly O(N).

> Alex, thanks for the zip tip.  I don't usually reach for that particular
> hammer--this usage broadens my horizons.

You're welcome!

Quote:
> I whipped together the program below to time the difference between a raw
> compare and a compare with an auxiliary dictionary (in the program I
> contrast these two approaches as raw and zip).  It seems that with a small
> list size (10-100), the performance boost of the zip is negligible?

*yes*!  The performance boost is actually with the *dict* -- the zip is
just part of the O(N) price you _pay_ to get the big-O-boost.  It being
a big-O-boost, it starts showing up for "large enough" lists -- and only
measurement can practically determine what's "large enough" in one
particular case.

Quote:
> $ time_not_in.py 100 10
> raw: 0.004 seconds.
> zip: 0.001 seconds.

looks like 300% better in this case -- not too bad after all, although
describing it as "negligible" may generally be correct.  Still, if I
could speed up every program by a "negligible" four times I might not
mind too much:-).

Quote:
> $ time_not_in.py 1000 100
> raw: 0.394 seconds.
> zip: 0.010 seconds.

Here, the speedup of about 40 times (4000% or so) I would not describe
as "negligible" any more.  Matter of taste, I guess.

Quote:
> $ time_not_in.py 10000 1000
> raw: 44.949 seconds.
> zip: 0.176 seconds.

I'm glad we seem to agree that a speedup by a few hundred times
(tens of thousands percent) is NOT negligible:-).

Alex



Mon, 18 Oct 2004 23:26:23 GMT  
 NEWBIE QUESTIION: Comparing Lists in Python
In determining to compare two lists (for list_minus, list_intersect, or whatever), the question arise whether to look up the shorter list in a dictionary constructed from the long list or to look up the longer list in a dictionary constructed from the shorter list.

As the list gets longer (hundreds of thousands or millions), the latter starts becoming much faster, so a general utility should construct the dictionary from the shorter list and look up the longer list in it. I assume that the time to construct a dictionary is non-linear.



Tue, 19 Oct 2004 04:02:35 GMT  
 
 [ 8 post ] 

 Relevant Pages 

1. newbie: comparing mutually recursive lists

2. Questiion on list

3. Newbie: int list list

4. Newbie Question, list = list + x VS append (x)

5. newbie: forth compared to C

6. Newbie (Sorry, how does Fortran 90 compare to C speed wise)

7. newbie: comparing

8. Newbie: problem comparing values

9. C4: Questiion AboutTemplates

10. Questiion on GPIB Triggering

11. How to compare infinite lists?

12. programming questiion

 

 
Powered by phpBB® Forum Software