Tobias #1 / 13
 Binary tree optimisation
Quote:
> I have written some code to produce a balanced, sorted, binary tree from a
> list of strings, and associated data objects, but it is too slow. The string
> is used as the sort key, and a pointer to the data object is kept. I need to
> add approximately 2000 items into the tree (at least, many thousands in some
> cases) which will be supplied in what is essentially a random order.
> My problem is that it can take several seconds to run on a PII, and I would
> rather it ran faster.
> My code is below. Any ideas on optimisation?
> Thanks,
> Anthony
> http://www.*-*-*.com/
A (binary) tree is a structure for searching in dynamic data. You can't build up a tree by comparisons faster than n*log n, where n is the number of objects you insert, so optimisations in the source can only make you faster by a constant factor.

Do you have any additional information about your data? Why do you need a tree? Does your data structure fit into your RAM, or do you need to store parts of it on hard disk? Is it possible to generate the tree once and then store it on hard disk? Why do you have to build up the tree so often?

Bye, Tobias
Sun, 22 Jul 2001 03:00:00 GMT
Ben Pfaff #2 / 13
 Binary tree optimisation
Quote:
> I have written some code to produce a balanced, sorted, binary tree from a
> list of strings, and associated data objects, but it is too slow. The string
> is used as the sort key, and a pointer to the data object is kept. I need to
> add approximately 2000 items into the tree (at least, many thousands in some
> cases) which will be supplied in what is essentially a random order.
> My problem is that it can take several seconds to run on a PII, and I would
> rather it ran faster.
> My code is below. Any ideas on optimisation?

Generally, if your code runs too slowly, it means that you aren't using the proper algorithm. Although I haven't looked at your code in particular, I've written balanced (AVL) tree insertion routines in the past, and you should be getting much better performance than that. Look over your code carefully for bugs, and make sure that you're using an O(log n) insertion algorithm.
--
(supporter of the campaign for grumpiness where grumpiness is due in c.l.c)
Please: do not email me copies of your posts to comp.lang.c
        do not ask me C questions via email; post them instead
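For scale, here is a minimal sketch (not your code, and deliberately without rebalancing) of a string-keyed binary search tree insert. With keys arriving in random order the expected depth is O(log n), so a build of ~2000 items costs roughly 2000 * log2(2000), about 22,000 comparisons in total, which should take far less than a second even on a PII; a production version would add AVL or red-black rebalancing so the bound also holds for sorted input.

/* Sketch only: unbalanced BST keyed on strings; keys are assumed to
 * outlive the tree (no copy is made here). */
#include <stdlib.h>
#include <string.h>

struct node {
    const char  *key;            /* sort key (the string)        */
    void        *data;           /* pointer to the data object   */
    struct node *left, *right;
};

/* Insert key/data and return the (possibly new) subtree root. */
static struct node *insert(struct node *root, const char *key, void *data)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            abort();             /* out of memory */
        n->key = key;
        n->data = data;
        n->left = n->right = NULL;
        return n;
    }

    int cmp = strcmp(key, root->key);
    if (cmp < 0)
        root->left = insert(root->left, key, data);
    else if (cmp > 0)
        root->right = insert(root->right, key, data);
    /* cmp == 0: duplicate key, keep the existing entry */
    return root;
}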
Sun, 22 Jul 2001 03:00:00 GMT
Stephan Schulz #3 / 13
 Binary tree optimisation
Quote:
>Someone did say that there was source code for binary tree manipulation
>available for download. Does anyone know where?
>Thanks,
>Anthony
My CLIB (at http://wwwjessen.informatik.tu-muenchen.de/~schulz/WORK/eprover.html) has fully debugged AVL trees over strings. However, I found splay trees to be much leaner and faster (I have not yet replaced the string trees, because they are used only in non-critical cases by my programs). The latest version of CLIB also has examples of splay trees (over other key types). You can also look at the original author's web page at http://www.cs.cmu.edu/afs/cs.cmu.edu/user/sleator/www/home.html - he has some C code for download.

You should indeed be able to insert a couple of thousand strings into the tree in sub-second time on current hardware - are you certain that your bottleneck is there?

Stephan
-------------------------- It can be done! ---------------------------------
----------------------------------------------------------------------------
Mon, 23 Jul 2001 03:00:00 GMT
Will Ro #4 / 13
 Binary tree optimisation
[...]
: OK, the application for this is building a public symbol list for ALINK, my
: free linker (see web page). Libraries often have thousands of public symbols
: (VC++5 libc has 1790), and complex applications can define many more, and at
: the moment ALINK does a (slow) linear search down an unsorted list. In order
: to accelerate the searching, I wanted to use a binary tree. This does indeed
: accelerate the searching, but now the bottleneck is the building of the list,
: especially when loading a library, so I wanted to know if anyone knew of a
: faster algorithm than the one I used, or whether there was a better
: implementation of this method.

Well, the structure needs to be fast at looking up records and adding records, and doesn't need to delete records (?). A linear list with binary search will give you optimum look-up; what about using a merge sort when you add new elements? Not optimal, but close. Your current code also seems to make a lot of calls to malloc() and free(), which aren't the fastest things around.

Will
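A minimal sketch of that sorted-array idea, assuming symbols arrive in batches (e.g. per library) and are looked up afterwards, and using qsort per batch rather than an explicit merge for brevity; the names (symtab, sym_add, sym_sort, sym_find) are only illustrative, not ALINK's actual ones:

#include <stdlib.h>
#include <string.h>

struct sym {
    const char *name;            /* sort/search key    */
    void       *data;            /* associated object  */
};

struct symtab {
    struct sym *v;
    size_t      len, cap;
};

static int sym_cmp(const void *a, const void *b)
{
    return strcmp(((const struct sym *)a)->name,
                  ((const struct sym *)b)->name);
}

/* Append unsorted; one realloc per doubling, not one malloc per symbol. */
static void sym_add(struct symtab *t, const char *name, void *data)
{
    if (t->len == t->cap) {
        t->cap = t->cap ? t->cap * 2 : 256;
        t->v = realloc(t->v, t->cap * sizeof *t->v);
        if (t->v == NULL)
            abort();
    }
    t->v[t->len].name = name;
    t->v[t->len].data = data;
    t->len++;
}

/* Sort once after each batch of additions (e.g. after loading a library). */
static void sym_sort(struct symtab *t)
{
    qsort(t->v, t->len, sizeof *t->v, sym_cmp);
}

/* O(log n) lookup in the sorted array. */
static struct sym *sym_find(const struct symtab *t, const char *name)
{
    struct sym key = { name, NULL };
    return bsearch(&key, t->v, t->len, sizeof *t->v, sym_cmp);
}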
Mon, 23 Jul 2001 03:00:00 GMT
Bill Silverstei #5 / 13
 Binary tree optimisation
Quote:
>[...]
>: OK, the application for this is building a public symbol list for ALINK, my
>: free linker (see web page). Libraries often have thousands of public symbols
>: (VC++5 libc has 1790), and complex applications can define many more, and at
>: the moment ALINK does a (slow) linear search down an unsorted list. In order
>: to accelerate the searching, I wanted to use a binary tree. This does indeed
>: accelerate the searching, but now the bottleneck is the building of the list,
>: especially when loading a library, so I wanted to know if anyone knew of a
>: faster algorithm than the one I used, or whether there was a better
>: implementation of this method.
>
>Well, the structure needs to be fast at looking up records and adding records,
>and doesn't need to delete records (?). A linear list with binary search
>will give you optimum look-up; what about using a merge sort when you
>add new elements? Not optimal, but close. Your current code also seems to
>make a lot of calls to malloc() and free(), which aren't the fastest
>things around.
>Will
I had written a compressor that used AVL trees. When I allocated the memory for the word to add and then freed it if the word was already in the tree, it appeared hung. I changed it to allocate the memory only if the word was not in the tree already, and it screamed.

Saying that malloc and free are not the fastest things around is like saying that hemlock tea is not the healthiest thing to drink for a cold.
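Roughly, the change is from "allocate, search, free on a duplicate" to "search first, allocate only on a miss". A sketch, where tree_find() and tree_insert() stand for whatever AVL routines the program already has (the names are illustrative, not from any real library):

#include <stdlib.h>
#include <string.h>

struct node;                                                    /* opaque tree node */
struct node *tree_find(struct node *root, const char *word);    /* assumed to exist */
struct node *tree_insert(struct node *root, char *word);        /* assumed to exist */

/* Add a word, copying it only if it is genuinely new. */
struct node *add_word(struct node *root, const char *word)
{
    if (tree_find(root, word) != NULL)
        return root;             /* already present: no malloc, no free */

    char *copy = malloc(strlen(word) + 1);
    if (copy == NULL)
        abort();
    strcpy(copy, word);
    return tree_insert(root, copy);
}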
Mon, 23 Jul 2001 03:00:00 GMT
Will Ro #6 / 13
 Binary tree optimisation
: >[...]
: >: OK, the application for this is building a public symbol list for ALINK, my
: >: free linker (see web page). Libraries often have thousands of public symbols
: >: (VC++5 libc has 1790), and complex applications can define many more, and at
: >: the moment ALINK does a (slow) linear search down an unsorted list. In order
: >: to accelerate the searching, I wanted to use a binary tree. This does indeed
: >: accelerate the searching, but now the bottleneck is the building of the list,
: >: especially when loading a library, so I wanted to know if anyone knew of a
: >: faster algorithm than the one I used, or whether there was a better
: >: implementation of this method.
: >
: >Well, the structure needs to be fast at looking up records and adding records,
: >and doesn't need to delete records (?). A linear list with binary search
: >will give you optimum look-up; what about using a merge sort when you
: >add new elements? Not optimal, but close. Your current code also seems to
: >make a lot of calls to malloc() and free(), which aren't the fastest
: >things around.

: I had written a compressor that used AVL trees. When I allocated the
: memory for the word to add and then freed it if the word was already in
: the tree, it appeared hung. I changed it to allocate the memory only if
: the word was not in the tree already, and it screamed.

: Saying that malloc and free are not the fastest things around is like
: saying that hemlock tea is not the healthiest thing to drink for a
: cold.

I fed the code through a profiler, and with 100 elements got:

    free()      1729 ms    3006 calls
    malloc()    1253 ms    2204 calls
    add...()      47 ms    3524 calls
    rebal...()    31 ms    9274 calls

so it's pretty clear where the time's going. The running time was about 3 seconds in total; by contrast, sorting 10,000 elements by heapsort took 124 ms and by shellsort 88 ms, on the same hardware, OS and compiler.

Will
Mon, 23 Jul 2001 03:00:00 GMT
Markus Hamme #7 / 13
 Binary tree optimisation
If you don't need sorted output, use hashing! How many entries do you expect to be stored in memory? Make a reasonable assumption, then calculate a table size. Use chaining for collision resolution; which chaining method to choose depends on the distribution of the strings (many equal keys, almost-equal keys, ...). If you expect fewer than about 10 entries per hash value, use a singly linked list, otherwise balanced trees. According to the literature the table size should be prime to obtain a good distribution of bucket values, but a power of 2 is better for reducing the hash value to the table size, since (i % size) == (i & (size-1)) iff size = 2^j for some j.

HTH
Markus
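Something along these lines; the hash function and the names are only illustrative (any decent string hash will do for symbol names), and the table size is a power of 2 so the mask trick applies:

#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 4096                    /* power of 2, sized to the expected load */

struct entry {
    const char   *key;
    void         *data;
    struct entry *next;                    /* collision chain (singly linked list) */
};

static struct entry *table[TABLE_SIZE];

static unsigned long hash(const char *s)   /* djb2-style string hash */
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static struct entry *lookup(const char *key)
{
    struct entry *e = table[hash(key) & (TABLE_SIZE - 1)];
    for (; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e;
    return NULL;
}

/* Insert only if not already present (cf. the allocate-on-miss advice above). */
static struct entry *insert_entry(const char *key, void *data)
{
    struct entry *e = lookup(key);
    if (e != NULL)
        return e;

    e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->key = key;
    e->data = data;

    unsigned long i = hash(key) & (TABLE_SIZE - 1);
    e->next = table[i];
    table[i] = e;
    return e;
}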
Wed, 25 Jul 2001 03:00:00 GMT