NEWBIE QUESTIION: Comparing Lists in Python
Author |
Message |
Dianne van Dulke #1 / 8
|
 NEWBIE QUESTIION: Comparing Lists in Python
Hi everyone, I haven't been using python for very long, so I hope you will forgive me if I am asking a stupid question, or doing this in a stupid way. I am trying to compare two lists in Python, which have been passed into it. I want to return the difference between the two lists as again two lists eg: list 1 = item1, item5, item8, item10 list2 = item 3, item4, item5 I want to return returnlist1 = item1, item8, item10 returnlist2 = item3, item4 I haven't been working in python very long, but from what I have read, the best way is to convert these to a dictionary, and compare there. This is what I have done so far. list1 = list1.split(',') list2 = list2.split(',') returnlist1 = [] returnlist2 = [] d = {} for item in list1: if item not in list2: d[item] = item if d: returnlist1 = d.values() d2 = {} for item in list2: if item not in list1: return item d2[item] = item if d2: returnlist2 = d2.values() return [returnlist1,returnlist2] I found if I didn't include the split, but just passed in a list like "list1=[item1, item3, item] it would work only on 1 character at a time and not recognise it was a split. Please let me know if I am doing this in a very stupid way. The problem is that it isn't always recognising the lists, or comparing correctly. It finds returnlist1 OK, but not returnlist2. I'm using python scripts in Zope if that may also be causing a problem. Thanks very much Di -- Dianne van Dulken (put out the cats to talk to me) See the cutest puppy in the known universe (as voted by Rick and i) http://www.*-*-*.com/
|
Mon, 18 Oct 2004 09:52:01 GMT |
|
 |
logisti #2 / 8
|
 NEWBIE QUESTIION: Comparing Lists in Python
The most general way would be to just copy the list instead of using a dict and remove the elements: Quote: >>> diff = list1[:] >>> for item in diff:
... if item in list2: ... diff.remove(item) ... Quote: >>> diff
[1, 2, 3] List comprehensions would be my favorite (although many will disagree): Quote: >>> list1 = [1,2,3,4,5] >>> list2 = [4,5,6,7,8] >>> returnList1 = [val for val in list1 if val not in list2] >>> returnList2 = [val for val in list2 if val not in list1] >>> returnList1 [1, 2, 3] >>> returnList2
[6, 7, 8] Filter will also work: Quote: >>> filter(lambda a,list2=list2:not a in list2, list1)
[1, 2, 3] -- -
Quote: > Hi everyone, > I haven't been using python for very long, so I hope you will forgive me if > I am asking a stupid question, or doing this in a stupid way. > I am trying to compare two lists in Python, which have been passed into it. > I want to return the difference between the two lists as again two lists > eg: > list 1 = item1, item5, item8, item10 > list2 = item 3, item4, item5 > I want to return > returnlist1 = item1, item8, item10 > returnlist2 = item3, item4 > I haven't been working in python very long, but from what I have read, the > best way is to convert these to a dictionary, and compare there. This is > what I have done so far. > list1 = list1.split(',') > list2 = list2.split(',') > returnlist1 = [] > returnlist2 = [] > d = {} > for item in list1: > if item not in list2: > d[item] = item > if d: > returnlist1 = d.values() > d2 = {} > for item in list2: > if item not in list1: > return item > d2[item] = item > if d2: > returnlist2 = d2.values() > return [returnlist1,returnlist2] > I found if I didn't include the split, but just passed in a list like > "list1=[item1, item3, item] it would work only on 1 character at a time and > not recognise it was a split. > Please let me know if I am doing this in a very stupid way. The problem is > that it isn't always recognising the lists, or comparing correctly. It > finds returnlist1 OK, but not returnlist2. > I'm using python scripts in Zope if that may also be causing a problem. > Thanks very much > Di > -- > Dianne van Dulken > (put out the cats to talk to me) > See the cutest puppy in the known universe (as voted by Rick and i) > http://www.dogmac.com/bartholomew
|
Mon, 18 Oct 2004 10:32:27 GMT |
|
 |
Mark McEaher #3 / 8
|
 NEWBIE QUESTIION: Comparing Lists in Python
[Dianne van Dulken] Quote: > I want to return the difference between the two lists as again two lists > eg: > list 1 = item1, item5, item8, item10 > list2 = item 3, item4, item5 > I want to return > returnlist1 = item1, item8, item10 > returnlist2 = item3, item4
You can do this easily with list comprehension. Here's a demo... #!/usr/bin/env python def not_in_list(l1, l2): """ Return a list of elements from l1 that are not in l2. """ return [x for x in l1 if x not in l2] l1 = [1,5,8,10] l2 = [3,4,5] print not_in_list(l1, l2) print not_in_list(l2, l1) // mark
|
Mon, 18 Oct 2004 10:23:23 GMT |
|
 |
Alex Martell #4 / 8
|
 NEWBIE QUESTIION: Comparing Lists in Python
Quote:
> The most general way would be to just copy the list instead of using a > dict and remove the elements: >>>> diff = list1[:] >>>> for item in diff: > ... if item in list2: > ... diff.remove(item) > ... >>>> diff > [1, 2, 3]
If the length of both lists are proportional to some number N, this algorithm will take a time of O(N*N*N) -- *cubic*. One factor of N comes from the for statement, one from operator 'in' when applied to a list on the RHS, and one from the remove operation. Quote: > List comprehensions would be my favorite (although many will disagree): >>>> list1 = [1,2,3,4,5] >>>> list2 = [4,5,6,7,8] >>>> returnList1 = [val for val in list1 if val not in list2] >>>> returnList2 = [val for val in list2 if val not in list1]
This is O(N*N), since you don't have the N factor from the remove. The way to get O(N) behavior is to construct auxiliary dictionaries: dict1 = dict(zip(list1,list1)) returnList2 = [x for x in list2 if x not in dict1] "x not in C" takes O(N) time if container C is a sequence with N elements, but (roughly) O(1) time if C is a dictionary. Building a dictionary of N items is roughly O(N). Premature optimization is the root of all evil (Knuth) -- but that basically means MICRO-optimization, trying to shave 20% or 80% off some algorithm for some fixed big-O behavior. Getting the right big-O behavior is generally not a wasted effort, unless you can guarantee that all containers around will be quite small. Python generally makes getting sane big-O behavior quite easy, thanks to the superb performance of dictionaries above all (and, in other contexts, the great performance of list's sort method, and a few other typical hotspots). You do have to know the big tricks and pitfalls, such as "x in sequence", somelist.remove, building strings out of pieces with + or += -- these are the big three typical timesinks of hastily written Python code. Alex
|
Mon, 18 Oct 2004 17:28:58 GMT |
|
 |
Mark McEaher #5 / 8
|
 NEWBIE QUESTIION: Comparing Lists in Python
[Alex Martelli] Quote: > The way to get O(N) behavior is to construct auxiliary dictionaries: > dict1 = dict(zip(list1,list1)) > returnList2 = [x for x in list2 if x not in dict1] > "x not in C" takes O(N) time if container C is a sequence with N > elements, but (roughly) O(1) time if C is a dictionary. Building a > dictionary of N items is roughly O(N).
Alex, thanks for the zip tip. I don't usually reach for that particular hammer--this usage broadens my horizons. I whipped together the program below to time the difference between a raw compare and a compare with an auxiliary dictionary (in the program I contrast these two approaches as raw and zip). It seems that with a small list size (10-100), the performance boost of the zip is negligible? Here are some results: Usage: time_not_in.py count share. count is the size of each list. share is the number of elements they share. $ time_not_in.py 100 10 raw: 0.004 seconds. zip: 0.001 seconds. $ time_not_in.py 1000 100 raw: 0.394 seconds. zip: 0.010 seconds. $ time_not_in.py 10000 1000 raw: 44.949 seconds. zip: 0.176 seconds. Cheers, // mark #! /usr/bin/env python import time def not_in_list_zip(l1, l2): """ Return a list of elements from l1 that are not in l2. """ d = dict(zip(l2, l2)) return [x for x in l1 if x not in d] def not_in_list_raw(l1, l2): """ Return a list of elements from l1 that are not in l2. """ return [x for x in l1 if x not in l2] def time_raw(l1, l2): f = not_in_list_raw return time_it(f, l1, l2) def time_zip(l1, l2): f = not_in_list_zip return time_it(f, l1, l2) def time_it(f, l1, l2): start = time.time() d1 = f(l1, l2) d2 = f(l2, l1) end = time.time() return d1, d2, end - start def show_time(s, t): print "%s: %1.3f seconds." % (s, t) def main(): import os, sys prog = os.path.basename(sys.argv[0]) usage = "Usage: %s count share." % prog default_count = 10 default_share = 2 try: count = int(sys.argv[1]) except IndexError: count = default_count try: share = int(sys.argv[2]) except IndexError: share = default_share l1 = range(0, count) l2 = range(count - share, 2 * count - share) raw_d1, raw_d2, raw_t = time_raw(l1, l2) zip_d1, zip_d2, zip_t = time_zip(l1, l2) show_time("raw", raw_t) show_time("zip", zip_t) if __name__ == "__main__": main()
|
Mon, 18 Oct 2004 21:45:38 GMT |
|
 |
Alex Martell #6 / 8
|
 NEWBIE QUESTIION: Comparing Lists in Python
Quote:
> [Alex Martelli] >> The way to get O(N) behavior is to construct auxiliary dictionaries: >> dict1 = dict(zip(list1,list1)) >> returnList2 = [x for x in list2 if x not in dict1] >> "x not in C" takes O(N) time if container C is a sequence with N >> elements, but (roughly) O(1) time if C is a dictionary. Building a >> dictionary of N items is roughly O(N). > Alex, thanks for the zip tip. I don't usually reach for that particular > hammer--this usage broadens my horizons.
You're welcome! Quote: > I whipped together the program below to time the difference between a raw > compare and a compare with an auxiliary dictionary (in the program I > contrast these two approaches as raw and zip). It seems that with a small > list size (10-100), the performance boost of the zip is negligible?
*yes*! The performance boost is actually with the *dict* -- the zip is just part of the O(N) price you _pay_ to get the big-O-boost. It being a big-O-boost, it starts showing up for "large enough" lists -- and only measurement can practically determine what's "large enough" in one particular case. Quote: > $ time_not_in.py 100 10 > raw: 0.004 seconds. > zip: 0.001 seconds.
looks like 300% better in this case -- not too bad after all, although describing it as "negligible" may generally be correct. Still, if I could speed up every program by a "negligible" four times I might not mind too much:-). Quote: > $ time_not_in.py 1000 100 > raw: 0.394 seconds. > zip: 0.010 seconds.
Here, the speedup of about 40 times (4000% or so) I would not describe as "negligible" any more. Matter of taste, I guess. Quote: > $ time_not_in.py 10000 1000 > raw: 44.949 seconds. > zip: 0.176 seconds.
I'm glad we seem to agree that a speedup by a few hundred times (tens of thousands percent) is NOT negligible:-). Alex
|
Mon, 18 Oct 2004 23:26:23 GMT |
|
 |
Mike Brenne #7 / 8
|
 NEWBIE QUESTIION: Comparing Lists in Python
In determining to compare two lists (for list_minus, list_intersect, or whatever), the question arise whether to look up the shorter list in a dictionary constructed from the long list or to look up the longer list in a dictionary constructed from the shorter list. As the list gets longer (hundreds of thousands or millions), the latter starts becoming much faster, so a general utility should construct the dictionary from the shorter list and look up the longer list in it. I assume that the time to construct a dictionary is non-linear.
|
Tue, 19 Oct 2004 04:02:35 GMT |
|
|
|