Integer and String Dictionary Keys for Fast Access 
Author Message
 Integer and String Dictionary Keys for Fast Access

Hi,

I'd like each element of a dictionary to have two keys, one a string and the
other an integer. Each of course would be unique relative to the other keys
and associated with a single dictionary value.

The advantage is that key lookups by integer are no doubt much faster than
by strings. At least that's what I find with GAUSS, a matrix programming
language. GAUSS let's users refer to a dictionary value by its string key or
integer key. Referring to the value by its integer key significantly speeds
execution in optimization routines.

Any suggestions?

Thanks in advance.
Lance



Tue, 22 Mar 2005 07:18:44 GMT  
 Integer and String Dictionary Keys for Fast Access

Quote:
> Hi,

> I'd like each element of a dictionary to have two keys, one a string and
> the other an integer. Each of course would be unique relative to the other
> keys and associated with a single dictionary value.

> The advantage is that key lookups by integer are no doubt much faster than
> by strings. At least that's what I find with GAUSS, a matrix programming
> language. GAUSS let's users refer to a dictionary value by its string key
> or integer key. Referring to the value by its integer key significantly
> speeds execution in optimization routines.

> Any suggestions?

> Thanks in advance.
> Lance

test it out in python, you may be surprised.  I don't think the type really
matters for dictionary look ups.


Tue, 22 Mar 2005 07:38:42 GMT  
 Integer and String Dictionary Keys for Fast Access

Quote:
> > The advantage is that key lookups by integer are no doubt much faster than
> > by strings.

> test it out in python, you may be surprised.  I don't think the type really
> matters for dictionary look ups.

Timings:
0.286319971085  # d[1]
0.23636496067   # d[1000]
0.231885075569  # d['a']
0.233772993088  # d['a' * 10000]
0.216398954391  # l[1]
0.232298016548  # l[1000]

so not only is it about the same speed to get a string or int key from a
dict, it's about the same speed to get an int index from a list.  This is
independant of the length of the key.

of course, within functions, locals are accessed by simple indexing of a C
array with C ints which should be a bit faster still.

It's possible that in the future this could be extended to methods of
new-style classes which use __slots__ for instance variables.  When no
assignment to the first local is seen, then LOAD_FAST self / LOAD_ATTR a
could be optimized into LOAD_SLOT a.  Besides needing whole-function
analysis (but isn't this already provided to know eg that exec is not
called?) the problem here is that the slot number for the attribute 'a'
depends on the class, but you are free to write
    class C:
        pass
    def f(self, ...): pass
    C.f = f
There are probably other wrinkles that I've not considered that make this
harder than it sounds...

Jeff

import time

def test(d, k):
    for i in range(100000):
        d[k]

keys = (1, 1000, 'a', 'a' * 10000)
d = dict([(k, None) for k in keys])

for key in keys:
    t0 = time.time()
    test(d, key)
    t1 = time.time()

    print t1-t0

l = [None] * 1001

for key in (1, 1000):
    t0 = time.time()
    test(l, key)
    t1 = time.time()

    print t1-t0



Tue, 22 Mar 2005 08:12:38 GMT  
 Integer and String Dictionary Keys for Fast Access
Yes... I suppose it doesn't matter if everything is a pointer.

Lance




Quote:
> Hi,

> I'd like each element of a dictionary to have two keys, one a string and
> the other an integer. Each of course would be unique relative to the other
> keys and associated with a single dictionary value.

> The advantage is that key lookups by integer are no doubt much faster than
> by strings. At least that's what I find with GAUSS, a matrix programming
> language. GAUSS let's users refer to a dictionary value by its string key
> or integer key. Referring to the value by its integer key significantly
> speeds execution in optimization routines.

> Any suggestions?

> Thanks in advance.
> Lance

test it out in python, you may be surprised.  I don't think the type really
matters for dictionary look ups.


Tue, 22 Mar 2005 09:29:14 GMT  
 Integer and String Dictionary Keys for Fast Access
There aren't any pointers in python (that I know of), just references.

Anyway, I would guess (I haven't looked at the source) the speed of the
dictionary lookup has nothing to do with the fact that variables are
references vs. values; it has more to do with the hashing method.

Maybe GUASS has no hashing at all and just does a string compare to every
key until it finds a match (or maybe something not quite that silly, but
still not particularly good).

My guess is that Python's implementation uses the __hash__() value of
objects (this being one of the fundamental reasons why __hash__() is a
member of the base object, just as Java has the hashCode() method in it's
Object and C++... er... well... uh, forget that analogy), which results in a
quick lookup, unless a particular object overrides and does a poor job of
implementing this method; in particular, since strings are immutable, the
hash only needs to be computed once (not every time it is used), so we would
expect lookups of strings in a hash table to be pretty close to as fast as
lookup of ints.

- m


Quote:
> Yes... I suppose it doesn't matter if everything is a pointer.
> test it out in python, you may be surprised.  I don't think the type
really
> matters for dictionary look ups.



Tue, 22 Mar 2005 14:23:54 GMT  
 Integer and String Dictionary Keys for Fast Access

Quote:

> so not only is it about the same speed to get a string or int key
> from a dict, it's about the same speed to get an int index from a
> list.  This is independant of the length of the key.

Hashings ints is actually very slightly faster than strings, and
string hashing does take linear time relative to size, but I can't
prove this imperically nor does it matter.  I just ripped out the
string hashing code and ran it 100k times on 10k strings and it took
4.6 seconds.  So since the growth (both time and space) of the string
hash is linear, that means it takes about 4.6e-12 seconds per
character.  I doubt you'll ever notice that cost in practice.  Even
the simplest operations like integer addition end up being nearly as
slow as string hashing of practical values.

Also in response to another post in this thread (sorry, my newsreader
is down so I'm having to respond from the archive), cPython does not
presently cache the hash values, but again this doesn't really matter.

Quote:
>>> import time

>>> def ttest1(d, i):

...   t = time.time()
...   for i in xrange(100000): d[i]
...   print time.time()-t
Quote:

>>> def ttest2():

...   t = time.time()
...   for i in xrange(100000): 1+1
...   print time.time()-t
Quote:

>>> d = {}
>>> for w in open('/usr/dict/words').readlines():
...   d[w] = None
>>> len(d)
25143

>>> ttest1(d, 'retrieve')
0.354193091393

>>> ttest2()

0.298590898514

--

"[Windows NT is] not about 'Where do you want to go today?'; it's more like
'Where am I allowed to go today?'" -- Mike Mitchell, NT Systems Administrator



Tue, 22 Mar 2005 22:12:32 GMT  
 Integer and String Dictionary Keys for Fast Access
in fact python has automatic interned/hashed strings (look also for 'intern'
in
the help files). so they are same speed as integers for dict lookup as long
as you do not compose the strings freshly.

2 keys on one object?

in static or simple cases simply 2 dicts with (refs on the) same objects do
this. weakref.WeakValueDictionary is helpful some times.

For symmetric kill automation you could start like this:
-----
from weakref import ref

_JointDict__common={}
class JointDict(dict):
    def __setitem__(self, key, obj):
        __common[obj]=obj
        dict.__setitem__(self,key, weakref.ref(obj))
    def __getitem__(self, key):
        o=dict.__getitem__(self,key)
        return __common[o()]
    def killeverywhere(obj):
        del __common[obj]
    killeverywhere=staticmethod(killeverywhere)
    def __contains__(self, key):
        return dict.__getitem__(self,key)() in __common
    #def __iter__, get, ...

class X: a=1

d1=JointDict()
d2=JointDict()
d1[4]=d2["robert"]=x=X()
d1.killeverywhere(x)
print "robert" in d2    #->0

-----

Robert



Fri, 25 Mar 2005 03:10:28 GMT  
 
 [ 7 post ] 

 Relevant Pages 

1. Get dictionary-keys used to format a string

2. string formatting with missing dictionary key

3. fast way of turning a list of integers<256 into a string

4. Extracting list of keys from 2-key dictionary

5. Accessing Dictionary values from within the Dictionary

6. Dictionary to string and back to dictionary??

7. Fast (and I mean fast) DB access from lisp

8. Accessing dicts key-by-key

9. convert integer string to date string

10. Fast conversion of hex string to character string

11. Searching dictionaries for large integer values

12. Fast Dictionary Searching

 

 
Powered by phpBB® Forum Software