SUMMARY: writing "not slow" smalltalk (long) 
Author Message
 SUMMARY: writing "not slow" smalltalk (long)

First, an update. Using the acquired wisdom of the net, we have retuned
our application with some pretty wonderful results. There are lots of
ways to show this, but based on my home machine (NeXT, 32Mb, ParcPlace
VW) we showed the following improvements (same test case shown):

Original June code:             42 seconds
1st speed fix:                  27 seconds
2nd speed fix :                 20 seconds
3rd speed fix :                 15 seconds

All this for 3-4 hours effort, most spent trying to figure out the
profiler and what it is was telling us. Same code on our Sparc20/50 (32
Mb, ParcPlace VW) runs in under 3.5 seconds with some copious output.
In particular, we are now showing that "TextCollector show:" is taking
something like 25% of the run time. That is, our code is running in
comparable time to how fast VW can pump out the results. We're pretty
happy with that!

How did we do that? The main net advice is the use of the Profiler from
the Advanced Programmers Kit (APOK). We hadn't used it much before but
will from now on. We found a number of "holes" in our code that we fixed.

The other oft offered advice is on "growing" collections. We found some
overhead in that as well and remedied it.

Below are excerpts (with their authors) of some of the advice I have
received. Some did not apply to our application but they are still
pearls to be remembered. Shouldn't this kind of stuff be in the FAQ?


Problem: Collection>>grow (has to copy the whole collection with each
normally this can only be detected with the profiler as this is
transparent to average the programmer.
Try higher initial sizes, use streams on fixed sized Collections and
return contents. Try to avoid #collect: etc. on large collections.


Problem: enumerations (collect, select, reject, detect)
Using enumerations, especially interleaved ones, is one of the nicest
things, but tends to get exponential in computational costs.
Check out what you are looking for in these collections and avoid
interleaved enumerations. Try using dictionaries or sets for repeated


Problem: Number coercing
Smalltalk uses a mixture of double dispatching and coercion with
numbers. This can make up a significant amount of computation time.
Avoid coercion by consistent usage of initial values or consequent
rounding or truncating. Round values before passing them to
GraphicsContext methods. Change the SmallInteger method /, so no
fraction are generated. Normalization of  fractions can be so expensive,
that you can watch lines beeing drawn on your screen.


Problem: Using perform (or: constructing symbols)
Using the "anObject perform: #symbol" is very nice, but can become
expensive, when you construct your symbol from a string. #asSymbol is
quite expensive and tends to fill the symbol space with garbage(they are
never removed).
Try to avoid constructing symbols, rather use a dispatch table
(dictionary), if there are not to many entries. Execute, I'm not quite
shure; "Symbol initTable" ore something to get rid of unused Symbols.


I suspect three areas, none of which I would touch until I did some
intensive profiling (and understood the results! :-)

1) As Andy Choi pointed out to you, adding to OrderedCollections *can*
be slow, but need not be. If a collection never needs to grow, make it
an Array. (Do the same thing if it grows to a certain point, then
becomes essentially read-only.) If it *rarely* needs to grow, make it
an Array anyway, and copy it each time, or consider streaming it. As an
Array, read access will be an order of magnitude faster. If it really
is very dynamic, consider something else, like an IdentitySet.

2) Dictionary probes can be very expensive. If your keys are Strings,
consider making them Symbols, and use an IdentityDictionary.

3) Consider writing a pattern matching primitive if that is eating up
your time.

4) Create fewer objects. For example, don't stuff a bunch of things in
an OrderedCollection just to pass them as arguments, rather keep them
in instance variables, where they can be retrieved through messages.
Massive, unnecessary object creation is a BIG time sink that bites
twice: once when you create, then againg when the GC has to come along
and clean up after you!


It's a sampling technique, so it is necessary to make certain you have
enough samples, or else you end up with aliasing artifacts. Make sure
you loop LOTS of times in the profile. Be aware that lengthy primitives
may "pile up" the time on either side of a sample.

In general, the bottom few lines should show most of the time in
primitives, or at least most of the time should be in base methods. If
not, you have a target for tuning.


By the way, there was a discussion on the net about two months
ago when someone said that when he replaced getter/settter message
sends with ivar access (i.e. wherever they were used to fetch a value
of ivar of the receiver of the current message) his program ran two
times faster.


Growing can be painful.  Recently here in there were some
simple Set benchmarks (adding Integers to Sets) floating around.
Turns out that about 50% of the time was actually spent growing the
set (and it only had to grow it once!).  Correct initial sizing should
be a win.  Note that PPS' implementation of new: for things like Set
and Dictionary is bogus.  That is, it is, new: xx is supposed to make
a collection big enough to hold xx elements.  Since Sets and Dicts are
hashed, you must actually have xx * 1.25 slots to hold xx elements.
They have all the protocols etc to do this correctly but the messed up
the implementation.  


I also had a situation where adding elements to collections was a major
cycle hog. In my particular application I:
   - instrumented the objects to collect statistics on typical collection
      sizes and growth patterns
   - created a subclass of the appropriate collection class
   - implemented new to create instances with larger than the superclass
       default number of slots; the 80/20 rule was applicable here
   - implemented a "predictive" grow method peculiar to this application


Also if you have something like
instance method1.
instance method2.
instance method3.


Anyway, I find that the more I use Smalltalk, the more convinced I am
that the proper use of the Collection classes is the biggest key. In
Digitalk the Set-derived classes (Set, Dictionary, and friends) all use
hashing to optimize (?) lookup times. The IndexableCollection subclasses
use simple arrays as their underlying data structure. Obvious point:
know what you're going to use the collection for and get it into the
optimal representation as soon as possible. (Ultra-obvious point, Don't
use anything derived off Set if you plan to iterate more than look-up)

One Digitalk caveat...Their hashing algorithm stinks so bad that your groovy lookup tables quickly degenerate into linear searches if they get too
big (>64 elements!) you can fix this by munging a few constants in the
base image, or you can reimplement the set classes using a
LinkedCollection sort of structure.

I did this once inside a prime generator program (I wrote it to try and
find better hash constants to use in the solution of the above problem).
I used a VERY simple-minded sieve of erasthostenes program, storing
known primes in a set. It took forever to to generate the primes < 2^31.
I switched the code over to using a PD Btree implementation of a Set and
saw over a thousand-fold spped increase. Admittedly, this is an example
that exploits the pathological behavior in Set derivatives, but I
learned an important lesson from it.


Wed, 11 Dec 1996 01:44:03 GMT  
 [ 1 post ] 

 Relevant Pages 

1. Smalltalk is not a "programming language"?

2. Tips:Writing Smalltalk to be "not slow"

3. string.join(["Tk 4.2p2", "Python 1.4", "Win32", "free"], "for")

4. "Pretty" Language Summary

5. Match "ab" in "abc", but not in "abd"

6. "use" slow on remote computer -- Win95

7. Slow "recording" on winXP Command

8. OOP in the "real world" (SUMMARY)

9. Calling Mario Suedholt (summary re "Skeletons")

10. "The memory could not be "written". " error

11. Summary: "Optimization opportunities unexploited: examples sought"

12. "esh" technical summary


Powered by phpBB® Forum Software