Binary tree optimisation 
Author Message
 Binary tree optimisation

Quote:

> I have written some code to produce a balanced, sorted, binary tree from a
> list of strings, and associated data objects, but it is too slow. The string
> is used as the sort key, and a pointer to the data object is kept. I need to
> add approximately 2000 items into the tree (at least, many thousands in some
> cases) which will be supplied in what is essentially a random order.

> My problem is that it can take several seconds to run on a PII, and I would
> rather it ran faster.

> My code is below. Any ideas on optimisation?

> Thanks,

> Anthony


> http://www.*-*-*.com/

A (Binary)Tree is a structure for searching in dynamic data.
You can't build up a tree faster than n*log n where n is the
number of objects you insert.

-> optimisations on the source can only make you faster by a constant factor

Do you have any additional information about your data?
Why do you need a tree?
Does your data-structure fit into your RAM or do you need
to store parts of it on harddisk?
Is it possible to generate the tree once and then store it on harddisk?
Why do you have to build up the tree so often?

Bye, Tobias



Sun, 22 Jul 2001 03:00:00 GMT  
 Binary tree optimisation

   I have written some code to produce a balanced, sorted, binary tree from a
   list of strings, and associated data objects, but it is too slow. The string
   is used as the sort key, and a pointer to the data object is kept. I need to
   add approximately 2000 items into the tree (at least, many thousands in some
   cases) which will be supplied in what is essentially a random order.

   My problem is that it can take several seconds to run on a PII, and I would
   rather it ran faster.

   My code is below. Any ideas on optimisation?

Generally if your code runs too slow, it means that you aren't using
the proper algorithm.  Although I haven't looked at your code in
particular, I've written balanced (AVL) tree insertion routines in the
past, and you should be getting much better performance than that.
Look over your code carefully for bugs, and make sure that you're
using a O(log n) algorithm.
--
(supporter of the campaign for grumpiness where grumpiness is due in c.l.c)

Please: do not email me copies of your posts to comp.lang.c
        do not ask me C questions via email; post them instead



Sun, 22 Jul 2001 03:00:00 GMT  
 Binary tree optimisation

Quote:



>Someone did say that there was source code for binary tree manipulation
>available for download. Does anyone know where?

>Thanks,

>Anthony

My CLIB (at
http://wwwjessen.informatik.tu-muenchen.de/~schulz/WORK/eprover.html)
has fully debugged AVL trees over strings. However, I found splay
trees to be much leaner and faster (I have not yet replaced the string
trees because they are used only in non-critical cases by my
programs). The latest version of CLIB also has examples of splay trees
(over other key types). You can also look at the original authors web
page at
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/sleator/www/home.html - he
has some C code for download.

You should indeed be able to insert a couple of 1000 strings into the
tree in sub-seconds on current hardware - are you certain that your
bottleneck is there?

Stephan

-------------------------- It can be done! ---------------------------------

----------------------------------------------------------------------------



Mon, 23 Jul 2001 03:00:00 GMT  
 Binary tree optimisation
[...]
: OK, the application for this is building a public symbol list for ALINK, my
: free linker (see web page). Libraries often have thousands of public symbols
: (VC++5 libc has 1790), and complex applications can define many more, and at
: the moment ALINK does a (slow) linear search down an unsorted list. In order
: to accelerate the searching, I wanted to use a binary tree. This does indeed
: accelerate the searching, but now the bottleneck is the building of the list,
: especially when loading a library, so I wanted to know if anyone knew of a
: faster algorithm than the one I used, or whether there was a better
: implementation of this method.

Well, the structure needs to be fast looking up records, and adding records,
and doesn't need to delete records (?).   A linear list with binary search
will give you optimum look-up; what about using a merge sort when you
add new elements?  Not optimal, but close.  Your current code seems to
make a lot of calls to malloc() and free(), which aren't the fastest
things around.

Will



Mon, 23 Jul 2001 03:00:00 GMT  
 Binary tree optimisation

Quote:

>[...]
>: OK, the application for this is building a public symbol list for ALINK, my
>: free linker (see web page). Libraries often have thousands of public symbols
>: (VC++5 libc has 1790), and complex applications can define many more, and at
>: the moment ALINK does a (slow) linear search down an unsorted list. In order
>: to accelerate the searching, I wanted to use a binary tree. This does indeed
>: accelerate the searching, but now the bottleneck is the building of the list,
>: especially when loading a library, so I wanted to know if anyone knew of a
>: faster algorithm than the one I used, or whether there was a better
>: implementation of this method.

>Well, the structure needs to be fast looking up records, and adding records,
>and doesn't need to delete records (?).   A linear list with binary search
>will give you optimum look-up; what about using a merge sort when you
>add new elements?  Not optimal, but close.  Your current code seems to
>make a lot of calls to malloc() and free(), which aren't the fastest
>things around.

>Will


I had written an compressor that used avl trees. When I allocated the
memory for the word to add and then free if if found, it appeared hung.
I changed it to allocate the memory only if it was not in the tree
already and it screamed.

Saying that malloc and free are not the fastest things around is like
saying that  hemlock tea is not the healthiest thing to drink for a
cold.



Mon, 23 Jul 2001 03:00:00 GMT  
 Binary tree optimisation

: >[...]
: >: OK, the application for this is building a public symbol list for ALINK, my
: >: free linker (see web page). Libraries often have thousands of public symbols
: >: (VC++5 libc has 1790), and complex applications can define many more, and at
: >: the moment ALINK does a (slow) linear search down an unsorted list. In order
: >: to accelerate the searching, I wanted to use a binary tree. This does indeed
: >: accelerate the searching, but now the bottleneck is the building of the list,
: >: especially when loading a library, so I wanted to know if anyone knew of a
: >: faster algorithm than the one I used, or whether there was a better
: >: implementation of this method.
: >
: >Well, the structure needs to be fast looking up records, and adding records,
: >and doesn't need to delete records (?).   A linear list with binary search
: >will give you optimum look-up; what about using a merge sort when you
: >add new elements?  Not optimal, but close.  Your current code seems to
: >make a lot of calls to malloc() and free(), which aren't the fastest
: >things around.

: I had written an compressor that used avl trees. When I allocated the
: memory for the word to add and then free if if found, it appeared hung.
: I changed it to allocate the memory only if it was not in the tree
: already and it screamed.

: Saying that malloc and free are not the fastest things around is like
: saying that  hemlock tea is not the healthiest thing to drink for a
: cold.

I fed the code through a profiler, and with 100 elements got:

  free() 1729 ms  3006 calls
  malloc() 1253 ms 2204 calls
  add...()  47 ms 3524 calls
  rebal...() 31 ms 9274 calls

so it's pretty clear where the time's going.  The running time was
about 3 seconds total; by contrast, sorting 10,000 elements by heapsort
was 124 ms and by shellsort 88 ms, with the same hardware, os and
compiler.

Will



Mon, 23 Jul 2001 03:00:00 GMT  
 Binary tree optimisation
If you don't need a sorted output use hashing!

How many entries do you expect to be stored in memory?
Make a reasonable assumption then calculate a table size.
Use chaining for collision resolution. It depends on the
distribution of strings, ie. many same keys, almost same keys ...
what chaining method to choose. If you expect less than 10
equal hash values use sll otherwise balanced trees.

The table size should be prime according to literature
to obtain a good distribution of bucket values.
A power of 2 is better for restricting the hash value to the
table size (i % size) == (i & (size-1)) iff size = 2^j for some j.

HTH Markus



Wed, 25 Jul 2001 03:00:00 GMT  
 
 [ 13 post ] 

 Relevant Pages 

1. AVL tree (Binary Balanced tree) using C or C++

2. binary tree & symbol table modifications

3. Need Tutor for Binary Trees in C

4. Binary Search Trees

5. Binary Search Trees

6. binary tree - insert node problem

7. Binary Trees

8. Binary Tree Class?

9. Binary search tree

10. newbie's binary tree problem

11. rebalancing binary trees

12. How do I even out my binary tree ???

 

 
Powered by phpBB® Forum Software