Question: Arrays vs OrderedCollections 
Author Message
 Question: Arrays vs OrderedCollections

I usually use arrays to carry data in Smalltalk. But somebody told me
that the application will be very slow if I use too many arrays. I
should use OrderedCollection instead. I wonder if it is true. I
feel arrays are much easier to use than OrderedCollections. Can
somebody give me some tips about Smalltalk application tune up,
please ?

Thanks in advance.

David



Mon, 05 Feb 1996 00:18:56 GMT  
 Question: Arrays vs OrderedCollections


Quote:
>I usually use arrays to carry data in Smalltalk. But somebody told me
>that the application will be very slow if I use too many arrays. I
>should use OrderedCollection instead. I wonder if it is true. I
>feel arrays are much easier to use than OrderedCollections. Can
>somebody give me some tips about Smalltalk application tune up,
>please ?

I don't know about the memory aspect, but I know I prefer to use
OrderedCollections because they grow automatically.  With Arrays you have
to know how many objects you want in it before you start (and if you
over-fill it, you get an error message) whereas if an OrderedCollection
gets full, it makes itself bigger to accommodate more objects.

--
Matthew Darwin



Mon, 05 Feb 1996 07:43:51 GMT  
 Question: Arrays vs OrderedCollections
|> I usually use arrays to carry data in Smalltalk. But somebody told me
|> that the application will be very slow if I use too many arrays. I
|> should use OrderedCollection instead. I wonder if it is true. I
|> feel arrays are much easier to use than OrderedCollections. Can
|> somebody give me some tips about Smalltalk application tune up,
|> please ?

Both Arrays and OrderedCollections are "variable subclasses", meaning that
they have numbered instance variables.  An OrderedCollection wastes some
of these inst vars in order to allow fast growing at either the start or end.
However, beyond this minor difference, they are treated in the same way by
the garbage collector.  Creating very large collections of any kind can
give the garbage collector a hard time, but creating many small collections
should be no problem.

As Matthew Darwin pointed out, OrderedCollections are generally easier to
use than Arrays because you need not know in advance how big they are.
While Arrays are more familiar to most programmers with experience in other
languages, once you learn how to use OrderedCollections, you won't want
to go back.

One other point: I have seen code where an Array is used as a record,
where the each element has a particular meaning.  It is better to use
a new class to represent such an object.
--
   ,__o                                                  ,__o

(*)/'(*)                                              (*)/'(*)



Mon, 05 Feb 1996 22:40:47 GMT  
 Question: Arrays vs OrderedCollections
|> I usually use arrays to carry data in Smalltalk. But somebody told me
|> that the application will be very slow if I use too many arrays. I
|> should use OrderedCollection instead. I wonder if it is true. I
|> feel arrays are much easier to use than OrderedCollections. Can
|> somebody give me some tips about Smalltalk application tune up,
|> please ?
|>

Not true. In fact, Arrays are slightly faster, because they don't have
to implement adding and removing: thus, their representation is more
static. However, if you are growing your arrays manually (by doing
copyWith:), then this is definitely slower than just add:ing to the
collection.

Using a lot of arrays where you really should be using a domain-specific
object is sometimes a sign of poorly structured code. If you're using
an array to represent any kind of data other than an ordered group of
values, you may be overusing them.

--
-------------------



Mon, 05 Feb 1996 22:40:09 GMT  
 Question: Arrays vs OrderedCollections

Quote:

>>I usually use arrays to carry data in Smalltalk. But somebody told me
>>that the application will be very slow if I use too many arrays.

>I don't know about the memory aspect, but I know I prefer to use
>OrderedCollections because they grow automatically.

Smalltalk-80/V:

If you can possibly pre-allocate, Arrays are better, since their access
methods are primitive, whereas the access methods for
OrderedCollections run in Smalltalk. (To the extent you *can*
pre-allocate, you've got a better design.)

If your data *need* to be dynamically sized, OrderedCollections are
better, but they are overused, and a little forethought can often
replace them with Arrays.

Collectives that *rarely need* to be dynamically sized can be Arrays,
fed by a method that does #copyWith: and #copyWithout:. I'd guess a
access-grow ration of as little as 10:1 might justify this approach.

Note that if you isolate your collectives inside some object, then you
can change implementation at a whim, and see which works best for any
scenario. This is Real Smalltalk at its best! Example, rather than
having lots of objects adding/removing your collective, assign
ownership of that collective to a single object, and don't allow anyone
else to add/remove from it.

SmalltalkAgents:

It has no distinction between pre-allocated and dynamic collectives,
and all Lists grow dynamically. (All lists also automagically answer
streaming protocol, also!) I don't know how they do it, but if it
doesn't impact performance, I say great!

 Jan Steinman, Bytesmiths
 2002 Parkside Court, West Linn, OR 97068-2767 USA, +1 503 657 7703
 Friedlistrasse 19, CH-3006, Bern, Switzerland, +41 31 999 3946



Tue, 06 Feb 1996 14:33:21 GMT  
 Question: Arrays vs OrderedCollections

Quote:
>SmalltalkAgents:
>It has no distinction between pre-allocated and dynamic collectives,
>and all Lists grow dynamically. (All lists also automagically answer
>streaming protocol, also!) I don't know how they do it, but if it
>doesn't impact performance, I say great!
> Jan Steinman, Bytesmiths
> 2002 Parkside Court, West Linn, OR 97068-2767 USA, +1 503 657 7703
> Friedlistrasse 19, CH-3006, Bern, Switzerland, +41 31 999 3946

Regarding SmalltalkAgents.

(1) It does have a class called List that performs all the functions
    of Array and OrderedCollection and also obeys the stream protocols.
    NOTE: Since String is a subclass of List, Strings also obey the
    stream protocols which makes them very fast and efficient.

(2) The List implementation is signifigantly more efficient than an
    ordered collection because it stores its contents inside of itself.
    SmalltalkAgents objects are resizeable so the List merely resizes
    itself to add an object.  This means that a list wastes no memory
    and create no new objects when it grows.  Furthermore, all objects
    support a "pad" factor that enables them to grow in chunks.  This
    means that if the object needs to grow 1 oop slot, it will actually
    take the space from its pad area.  If there is not enough space in the
    pad area it will internally  grow by its "pad" factor.  The GC can
    recover pad area from any and all objects as it needs to for space
    reclamation.  Because of this, all objects and especially collections
    are very memory efficient. By the way, as you might expect lists
    support many of the basic LISP operations for list processing. We
    also added a dynamic list creation operator in the form of
    { statement. statement...}. It builds a List object from the
    accumulated results of each of its statements.

(3) All objects are composed of three logical parts.
    [header/properties] [named and indexed slots] [structured storage]
    The slots and the storage are both independently resizeable and
    thus allow an object to grow or shrink over time as required.
    This means that all objects can actually behave as lists if they
    add the basic messages to support list/collection protocols.
    The structured storage can reside in any memory space and does not
    need to be in the Smalltalk heap.  The structured storage can also
    be an indirected address (i.e. a handle).  The programmer model
    for accessing structured storage remains uniform. Thus
    foo basicCharacterAt: 1. or foo unsignedLongAt: 0. functions
    identically for all memory spaces and/or indirected addresses.
    SmalltalkAgents also supports structure templates to enable the
    structured storage to be accessed just like C structures or Pascal
    records.  In fact, it is often used to hold native host structures
    that are shared with the host C and Pascal routines.
    The size of a character access from the structured storage can be
    1/2/3 bytes depending on the properties settings of the object.
    If a string object is composed of 1 byte characters and a 2 byte
    (Unicode) character is added, the string will automatically rebuild
    its internal structure to hold 2 byte characters.  From the programmer
    level this is all transparent.

(4) SmalltalkAgents provides scoped name spaces known as Library instances.
    They function similarly SystemDictionary but they form a hierarchy
    that nests in the same way the directories do on a file system. The
    partial tree looks similar to the following:

        Environment
        ....DigitalkCompatibility
        ....Macintosh
        ....Motif
        ....OpenLook
        ....OS2
        ....PPSCompatibility
        ....TCPIP
        ....UserLib
        ........DemoLib
        ....Workbench
        ........GUIBuilder

    Each library can contain any set of classes.  Thus the PPSCompatibility
    can contain a definition for FileStream that is different from the
    definition that is contained in Environment or DigitalkCompatibility.

    Libraries can also be included as Pools in a class definition to enable
    all their members to be visible to a class.  A library member can also

    SmalltalkAgents also supports the notion of "aliases" to classes and
    libraries.  Since every class is defined in a library and it has a
    given name.  Any public variable (global) that references that class
    is an alias unless the library and the name are identical.  Thus
    to make an OrderedCollection or Array class we can put an alias in
    PPSCompatibility that points to the List class defined in Environment.

    There is also the notion of a DefaultLibrary which is the target for
    evaluation in listeners (Transcript fields).  The Environment also
    supports the notion a TopLevelLibraries which are used to flatten
    the hierarchy.  TopLevelLibraries are searched in order when a public
    variable cannot be resolved elsewhere.  Any library can be added to
    top level libraries.  Libraries inherit definitions from their parent
    library.  Thus all libraries can see and/or shadow the definitions
    contained the Environment (root) library.

    A developer will typically create a new library and build all their
    classes in the library.  They will also store and objects they may
    need such as lookup tables, pictures, sounds,movies, etc.  Since
    SmalltalkAgents allows any cluster of objects to be extracted or
    imported from the environment in real-time, it becomes a simple
    matter to extract an entire library and send it for importing into
    someone elses environment as a file or over a network link (it can
    also be easily stored in a database).

(5) With regard to performance of Lists.  They are extremely efficient
    because they are leveraging on a "base" functionality that all objects
    have.

(6) There is no artificial distinction for byte var, pointer var, etc.
    class types in SmalltalkAgents.  All classes are treated uniformly
    because the object model is uniform and thus makes all classes
    real first class citizens.  Any class can be made a subclass of
    any other class.  This is intuitive when you realize that every
    objects can have named variables, indexed variables, and structured
    binary storage.

(7) In another "c.l.s. thread", someone asked about scheduling capability
    for SmalltalkAgents pre-emptive threads.  SmalltalkAgents provides
    pre-emptive threads as platform independent entities defined at the
    Smalltalk level.  SmalltalkAgents class library is fully designed
    for re-entrancy, including the compiler.  This means that a thread
    can fork off listening on a TCPStream, TelnetStream, ADSPStream, etc.
    and compile and execute Smalltalk statements sent over that stream
    while other threads are also performing compilations and operations.  
    The ability to schedule threads and handle interrupts/signals is
    controlled at the Smalltalk level and thus enables developers to
    define their own schemes and priority mechanisms.  Threads have their
    own graphic contexts, priorities, groups, process id's, and parent
    processes, etc.

(8) The SmalltalkAgents environment is interrupt driven and supports
    a wide range of signals.  The signals can be selectively masked
    (including all of them) to enable the building of atomic methods.

   "For example"
    mask := thread disableSignals.
    ... atomic statements...
    thread enableSignals: mask.

(9) Threads fully support asynchronous messaging to enable inter-thread
    communication.  A thread which is asleep will wake up whenever it
    receives a message.  Threads typically sleep waiting for signals
    to wake them up by posting messages.  It is also possible to process
    signals (such as network or database i/o) completely at
    signal/interrupt time.

   "For example this posts a message on aThreadObject's queue"
    aThreadObject send: selector to: receiver.

(10) Block closure is fully supported and defineable by the programmer.
    foo := [:a | ].  "Defines a block whose variable 'a' has method scope
                      which is the way Digitalk works."
    foo := [::b |].  "Defines a block whose variable 'b' has block scope
                      which is the way PPS works."

   "We also allow -- ponder this one for a while..."
    foo := [:a ::b :Q | a+b+Q].
   ^foo values: {Time now totalSeconds. Metaclass instanceCount}.

   "CLUES:  {}.  Creates a dynamic list from the statements it contains
            values: injects values from a list into the blocks.
            blocks can have an unlimited number of arguments.
            If a block is evaluated with insufficient arguments,
            the missing arguments are set to nil.
            Q is a public (scratch) variable defined in UserLib
            that will be injected (i.e. set to a value) of nil."

(11) SmalltalkAgents also supports unbound methods.  i.e. methods
     that are not defined in the context of any class.  Such methods
     can be applied to any object and since they are unbound they
     are nameless methods (functions).

     method := SmalltalkCompiler compileUnbound:
               'a: a b: b
                   ^a+b'.
     method executeWithParameters: #(3 4).  "Returns 7"

(12) And then there are compiler directives (like pragmas) for methods...

     foo
         | "method temps" |

         %% "Begin Directives"

             Method Author: 'David Simmons'.
             Method Created: 'Wed 08/25/1993 11:24:36 PM'.        

             Private Method.
             Attribute Method.

             Method Protocol: 'accessing'.

             On Exception Do: [:exeception | ...].

         %% "End Directives"

         ... body statements here ...

Hmm, well...  I could go on and on
...

read more »



Mon, 12 Feb 1996 11:37:04 GMT  
 Question: Arrays vs OrderedCollections

Quote:

>Regarding SmalltalkAgents.
>(2) The List implementation is signifigantly more efficient than an
>    ordered collection because it stores its contents inside of itself.
>    SmalltalkAgents objects are resizeable so the List merely resizes
>    itself to add an object.  This means that a list wastes no memory
>    and create no new objects when it grows.  

This does not make sense to me.  All you are saying is that
extra levels of indirection and the extra state needed to store things
like "current location" are supported by the VM instead of being
implemented in Smalltalk.  This does NOT mean that the implementation
is necessarily more efficient.  In fact, OrderedCollection stores its
contents "inside of itself".

I do not want to go through the entire implementation of OrderedCollection
because  you probably know it already.  But almost everything you say
about List is true of OrderedCollection, too, except that the GC does
not recover pad area.  My understanding is that SmalltalkAgents is
significantly slower than the older implementation of Smalltalk.  Thus,
your claims that this is a more efficient technique need a little
more justification before I'll believe them.  Perhaps the reason why
it is better for SmalltalkAgents to implement things in the VM instead
of Smalltalk is because Smalltalk is slower than it should be.

Quote:
>(3) All objects are composed of three logical parts.
>    [header/properties] [named and indexed slots] [structured storage]
>    The slots and the storage are both independently resizeable and
>    thus allow an object to grow or shrink over time as required.

In my opinion, it is better to make the VM view of an object be as
simple as possible.  Most objects are never resized.  You are better
in the long run to handle those objects that are resized at a higher
level (i.e. in Smalltalk) and make the VM (especially the GC) just
as fast as possible.  In fact, I do not like the fact that in Smalltalk
every object can have indexed slots.  In my opinion, there should only
be a few classes with indexed slotes (Array, ByteArray, String, etc)
and classes like OrderedCollection should have an instance variable
"contents" that contains an Array.  This is more or less what is happening
at the VM level anyway.  I'd prefer to make it explicit and make the
VM simpler.

It is possible that I am full of hot air and that you can get a faster
system by making the VM more complicated.  I've never seen any experiements
that shed light on this one way or the other.  But my intuition leans
toward a simpler VM.

Quote:
>(4) SmalltalkAgents provides scoped name spaces known as Library instances.

I wish the other Smalltalk systems would partition the name space.
This is not very hard to do and is very valuable for building large
systems.

Quote:
>(6) There is no artificial distinction for byte var, pointer var, etc.
>    class types in SmalltalkAgents.

This distinction is unimportant in practice.  Application programmers
never make variable sized classes and lots of Smalltalk-80 programmers
don't even know about this distinction.  Like I said above, I'd prefer
that this distinction be even stronger.

Quote:
>    Any class can be made a subclass of any other class.

?????

Quote:
>(11) SmalltalkAgents also supports unbound methods.  i.e. methods
>     that are not defined in the context of any class.  Such methods
>     can be applied to any object and since they are unbound they
>     are nameless methods (functions).
>     method := SmalltalkCompiler compileUnbound:
>               'a: a b: b
>                   ^a+b'.
>     method executeWithParameters: #(3 4).  "Returns 7"

When I need to do something like this, I do

     aBlock := Compiler evaluate: '[:a :b | a+b]'.
     aBlock value: 3 value: 4.

I probably didn't get the protocol for Compiler right, but you should get
the idea.

Pragmas are interesting, having syntax for lists seems like a good idea,
lots of things about SmalltalkAgents seem worth doing.  However, I don't
buy your comments about making every object complicated.  Most objects
don't need to be complicated, and you should only have to pay the price
for those that are.

Ralph Johnson - University of Illinois at Urbana-Champaign



Tue, 13 Feb 1996 22:18:54 GMT  
 Question: Arrays vs OrderedCollections

writes... (in a thread comparing SmalltalkAgents and Smalltalk)

Quote:

>...every object can have indexed slots.  In my opinion, there should only
>be a few classes with indexed slotes (Array, ByteArray, String, etc)
>and classes like OrderedCollection should have an instance variable
>"contents" that contains an Array...

I agree with your contention here.  It's unfortunate that the various
venders of smalltalk cannot standardize their implementations of Ordered-
Collection.  As you mention (but not quoted here), Smalltalk OrderedCollection
contains its contents "inside of itself", but of the various smalltalks I
have used, this is only true of ST-80 (Parcplace Objectworks, Visualworks,
etc.).  However, Digitalk does use the implementation of storing the
contents of the OrderedCollection inside an instance variable which is an
array.  This to me seems a much cleaner implementation that adheres to the
OOD principles much better than PP's implementation.

Someone else (no slight intended) writes ...
:>>(11) SmalltalkAgents also supports unbound methods.  i.e. methods
:>>     that are not defined in the context of any class.  Such methods
:>>     can be applied to any object and since they are unbound they
:>>     are nameless methods (functions).
:>
:>>     method := SmalltalkCompiler compileUnbound:
:>>               'a: a b: b
:>>                   ^a+b'.
:>>     method executeWithParameters: #(3 4).  "Returns 7"

Ralph responds:

Quote:
>When I need to do something like this, I do

>     aBlock := Compiler evaluate: '[:a :b | a+b]'.
>     aBlock value: 3 value: 4.

>I probably didn't get the protocol for Compiler right, but you should get
>the idea.

Depending on the version of Smalltalk used(?), I don't believe the
'Compiler evaluate:' is even necessary.  I know it's not in ST-80.
Your goal can be accomplished simply by

        aBlock := ([:a :b | a + b ]) value: 3 value: 4.
        ^aBlock value

I use this form all the time for 'hypergraphics' to determine the
action to be taken by the system when the user passes the cursor over
a 'hot region' on the screen.

===================================================================

CDSI - ITS Development       ||  voice  (210) 536-3579
San Antonio, Texas           ||====================================
==============================/
To be sure of hitting the target, shoot first and, whatever you hit,
call it the target.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++



Wed, 14 Feb 1996 01:08:00 GMT  
 Question: Arrays vs OrderedCollections
Hi Ralph,

Quote:
>>Regarding SmalltalkAgents.

>>(2) The List implementation is signifigantly more efficient than an
>>    ordered collection because it stores its contents inside of itself.
>>    SmalltalkAgents objects are resizeable so the List merely resizes
>>    itself to add an object.  This means that a list wastes no memory
>>    and create no new objects when it grows.  

>This does not make sense to me.  All you are saying is that
>extra levels of indirection and the extra state needed to store things
>like "current location" are supported by the VM instead of being
>implemented in Smalltalk.  This does NOT mean that the implementation
>is necessarily more efficient.  In fact, OrderedCollection stores its
>contents "inside of itself".

Hmm.  Maybe the point wasn't clear.  For all operations that an
OrderedCollection or an Array object can perform, the List class functions
identically with no overhead.  A #basicAt:/#at: is the same cost whether
it is an Array or an OrderedCollection (except in Digitalk which is slower).
A #basicAt:/#at: in SmalltalkAgents has no extra overhead incurred because
of the richer (uniform) object model.
NOTE: When a List grows or shrinks it doesn't create any new objects and
doesn't need to do any copying of oops.  Its resizing doesn't create as much
memory garbage.  Like the PPS implementation, the access speed is faster than
Digitalks because it is not indirected.

Lists are a combined Array and OrderedCollection.  Arrays are often
used for literal constants such #(foo bar). Yet, the property of being
a constant is not enforced (it is in SmalltalkAgents).  The only real
reason for a distinction between an Array and a OrderedCollection is
because most Smalltalk VM's don't support resizing of a given object.
(i.e. A new object must be instantiated and the elements copied and then
the old object must be told to #become: the new object).  This really
means that Array and OrderedCollection are a result of the constraints
imposed by the VM/object model design and are not based on good factoring.
A general problem in OOP is the proliferation of classes. Poor Smalltalk
architectures can result in such proliferation (another reason not to
use MVC).  Excess or redundant classes complicate maintenance issues and
makes the class libraries larger and thus more difficult to learn.  They
can also hurt performance since they may introduce additional #isKindOf:
tests or uneccessarily deep levels of object nesting.

PPS: Grow by making a new OrderedCollection and then copying objects from
self to the new collection and then #become: the new collection.

Digitalk: Embed an Array instance inside of the OrderedCollection and use an
index to maintain the current size/location of the next slot in the Array.
Grow by creating a new Array and copying objects from the old to the new.
Then set the collection attribute to the new Array.

SmalltalkAgents: add: >>is simply>> #basicAt: self basicSize+1 put: value.
Or if you prefer>>
        self basicSize: self basicSize+1.
        self basicAt: self basicSize put: value.

By the way, because threads are like any other object, they are also
resizeable.  Thus SmalltalkAgents threads provide unlimited recursion
(as a programmer you can specifiy a maximum recursion depth by setting
one of the threads attributes).  The general memory model of
SmalltalkAgents also allows the entire development environment to run
in 4MB. Applications typically take far less.  They also used a shared kernel
which means that if you have additional applications the additional
applications only take 50-100kb of memory each.  You can also make objects
of any size up to the total available memory.  (i.e. String new: 10000000.  
Is fine if you have 10MB free).

Quote:

>I do not want to go through the entire implementation of OrderedCollection
>because  you probably know it already.  But almost everything you say
>about List is true of OrderedCollection, too, except that the GC does
>not recover pad area.  My understanding is that SmalltalkAgents is
>significantly slower than the older implementation of Smalltalk.  

Whatever gave you this idea? (i.e. Where did you get that information?)
SmalltalkAgents is faster overall than even the most recent (or any older
versions of both Digitalk and PPS.  If you're interested in comparisons made
by third parties you can contact QKS.  You can try a copy for 30 days
free and see for yourself. You can also come see SmalltalkAgents at OOPSLA.

Quote:
>Thus,
>your claims that this is a more efficient technique need a little
>more justification before I'll believe them.  Perhaps the reason why
>it is better for SmalltalkAgents to implement things in the VM instead
>of Smalltalk is because Smalltalk is slower than it should be.

I don't quite understand the distinctions you are making here.  Smalltalk and
the VM are one and the same thing.  Smalltalk is the language, the VM is the p-code/native code cpu that executes the Smalltalk.  You're correct that
other implementations of Smalltalk are slower than they need to be. Probably
because of some poor code in the Smalltalk core classes and also because the
VM designs are not as efficient as they could be (i.e. not up to date with
newer hardware capabilities and evolutions in software design).
One reason why we built SmalltalkAgents was because we wanted a faster
Smalltalk implementation (and we succeeded).

Quote:

>>(3) All objects are composed of three logical parts.
>>    [header/properties] [named and indexed slots] [structured storage]
>>    The slots and the storage are both independently resizeable and
>>    thus allow an object to grow or shrink over time as required.
>In my opinion, it is better to make the VM view of an object be as
>simple as possible.  Most objects are never resized.  You are better
>in the long run to handle those objects that are resized at a higher
>level (i.e. in Smalltalk) and make the VM (especially the GC) just
>as fast as possible.  In fact, I do not like the fact that in Smalltalk
>every object can have indexed slots.  In my opinion, there should only
>be a few classes with indexed slotes (Array, ByteArray, String, etc)
>and classes like OrderedCollection should have an instance variable
>"contents" that contains an Array.  This is more or less what is happening
>at the VM level anyway.  I'd prefer to make it explicit and make the
>VM simpler.

That's interesting... Why is it better to have "hardcoded" design limitations
in a language that is supposed to be so expressive and allow polymorphism?
You probably should look at how classes like CompiledMethod are implemented
in PPS and Digitalk to gain a better understanding of why an object should
be capable of (optionally) having all three parts.  All Smalltalks currently
support 2 of the three parts now.  The arbitrary design constraints imposed
by the various "class" types make it more difficult to properly factor the
class libraries. A byte var can't be subclassed by anything other than a
byte var class.  A pointer var can't have named variables. etcetera...
Why are such constraints a good thing?

Quote:
>It is possible that I am full of hot air and that you can get a faster
>system by making the VM more complicated.  I've never seen any experiements
>that shed light on this one way or the other.  But my intuition leans
>toward a simpler VM.

Our VM is designed for RISC cpu architectures.  It is designed to provide
direct sharing of memory spaces with languages other than Smalltalk.  We
can directly share things like C structures with the native operating system.
This dramatically improves performance when interfacing to external code and
the native OS.  It also simplifies design because no data shadowing or kludgy
memory (wrapper) accessor architectures are needed.  We can call or be called
by almost any external language and we don't have to coerce or monkey with
data structures. Our implementations run in 8/24 bit color at the same
(or better) speeds that PPS and Digitalk run in black and white.  We can
perform interrupt driven TCP/IP communications at the same speed as C code.
Dealing with things like SOM (System Object Model) become trivial for any
class.

Quote:

>>(4) SmalltalkAgents provides scoped name spaces known as Library instances.

>I wish the other Smalltalk systems would partition the name space.
>This is not very hard to do and is very valuable for building large
>systems.

My understanding is the PPS will be providing this capability in one of
their upcoming releases.

Quote:

>>(6) There is no artificial distinction for byte var, pointer var, etc.
>>    class types in SmalltalkAgents.
>This distinction is unimportant in practice.  Application programmers
>never make variable sized classes and lots of Smalltalk-80 programmers
>don't even know about this distinction.  Like I said above, I'd prefer
>that this distinction be even stronger.

>>    Any class can be made a subclass of any other class.

>?????

Try giving a class with named variables a subclass that has indexed
variables or vice versa.  Same with byte vars etcetera.  As I said above,
there are artificial constraints that these class typing limitations
impose.  They can lead to an excess of classes (which makes the maintenance
and understanding of code more difficult). They lead to an unecessary
embedding of objects within objects (which makes performance slower).
They also force class hierarchy designs to be poorly factored (because
obvious inheritance structures have to be kludged via embedded objects
and attribute accessors).

- Show quoted text -

Quote:

>>(11) SmalltalkAgents also supports unbound methods.  i.e. methods
>>     that are not defined in the context of any class.  Such methods
>>     can be applied to any object and since they are unbound they
>>     are nameless methods (functions).

>>     method := SmalltalkCompiler compileUnbound:
>>               'a: a b: b
>>                   ^a+b'.
>>     method executeWithParameters: #(3 4).  "Returns 7"

>When I need

...

read more »



Wed, 14 Feb 1996 10:40:09 GMT  
 Question: Arrays vs OrderedCollections

Quote:

>writes... (in a thread comparing SmalltalkAgents and Smalltalk)

>>...every object can have indexed slots.  In my opinion, there should only
>>be a few classes with indexed slotes (Array, ByteArray, String, etc)
>>and classes like OrderedCollection should have an instance variable
>>"contents" that contains an Array...

>I agree with your contention here.  It's unfortunate that the various
>venders of smalltalk cannot standardize their implementations of Ordered-
>Collection.  

The implementations are standardized (although there is no official
standard yet -- QKS is a member of the X3J20 ANSI standards committee)
STA behavior (API) supports all the OrderedCollection messages and
functionality.  The fact that we have a List class that also has an
alias name of OrderedCollection doesn't affect any code or any subclass
behavior.  The only reason you might conceivably have a problem with any
of the vendor implementations is if you are violating encapsulation (a
general no,no) by accessing an OrderedCollection instances variables
via some low level object accessor mechanism (rather than using an
attribute accessor method in the OrderedCollection class).

In SmalltalkAgents:

        foo := OrderedCollection new.
        foo add: 7; add: 8; add: 9.
       ^foo at: 1
        "or"
        foo := OrderedCollection with: 7 with: 8 with: 9.
       ^foo size

Behaves exactly as it should/would in any other Smalltalk.

Quote:
>As you mention (but not quoted here), Smalltalk OrderedCollection
>contains its contents "inside of itself", but of the various smalltalks I
>have used, this is only true of ST-80 (Parcplace Objectworks, Visualworks,
>etc.).  However, Digitalk does use the implementation of storing the
>contents of the OrderedCollection inside an instance variable which is an
>array.  This to me seems a much cleaner implementation that adheres to the
>OOD principles much better than PP's implementation.

I am baffled as to why it is cleaner to have a core (Smalltalk) object
implementing a basic attribute accessing functionality via an
additional object?  It is demonstrably less efficient, more obtuse to
understand, and in fact is an artifact of the lack of resizeable objects
and/or efficient #become: mechanisms in Digitalk's implementations.

i.e. foo := OrderedCollection with: 7.
     foo at: 1.

     (inside of the Digitalk OrderedCollection #at: method it might be)
     at: index
        (index between: 1 and: currentSize) ifTrue:
        [
               ^foo collection at: index "Seems confusing to me?"
        ].
        self error: 'index out of range'.

     (inside of the PPS OrderedCollection #at: method it might be)
     at: index
        (index between: 1 and: (self basicAt: 1)) ifTrue:
        [
               ^foo basicAt: index+1 "Seems a little less confusing to me?"
        ].
        self error: 'index out of range'.

     as opposed to: (simply the primitive #at: in SmalltalkAgents)
        "or if you want to explicitly declare it --
         STA's is smart, so it will optimize at runtime anyway
         when it analyzes the bytecodes the first time."
     at: index
        ^self basicAt: index

PPS: Grow by making a new OrderedCollection and then copy objects from
self to the new collection and then #become: the new collection. Use
one slot as an indexing attribute to keep track of current size.

Digitalk: Embed an Array instance inside of the OrderedCollection and use an
index to maintain the current size/location of the next slot in the Array.
Grow by creating a new Array and copying objects from the old to the new.
Then set the collection attribute to the new Array.

SmalltalkAgents: (more efficiently and simply
                  -- remember STA has GC recoverable padding
                     which you can specify in the class description)
add: value
        "STA's is smart, so it will optimize at runtime
         when it analyzes the bytecodes the first time."
        self basicAt: self basicSize+1 put: value
"or"
        | index "<= index declaration may be optional
                    depending on compiler directives and flags." |

        self basicSize: (index := self basicSize+1).
        self basicAt: index put: value.

Quote:

>Someone else (no slight intended) writes ...
>:>>(11) SmalltalkAgents also supports unbound methods.  i.e. methods
>:>>     that are not defined in the context of any class.  Such methods
>:>>     can be applied to any object and since they are unbound they
>:>>     are nameless methods (functions).
>:>
>:>>     method := SmalltalkCompiler compileUnbound:
>:>>               'a: a b: b
>:>>                   ^a+b'.
>:>>     method executeWithParameters: #(3 4).  "Returns 7"

>Ralph responds:

>>When I need to do something like this, I do

>>     aBlock := Compiler evaluate: '[:a :b | a+b]'.
>>     aBlock value: 3 value: 4.

>>I probably didn't get the protocol for Compiler right, but you should get
>>the idea.

>Depending on the version of Smalltalk used(?), I don't believe the
>'Compiler evaluate:' is even necessary.  I know it's not in ST-80.
>Your goal can be accomplished simply by

>       aBlock := ([:a :b | a + b ]) value: 3 value: 4.
>       ^aBlock value

>I use this form all the time for 'hypergraphics' to determine the
>action to be taken by the system when the user passes the cursor over
>a 'hot region' on the screen.

(* Unbound methods are also one mechanism that can be used to provide
   instance methods *).

Sorry for the simple example. I can make a more complex one that would
be rather difficult to handle properly with a block.  
I was trying to illustrate a point...

        method := SmalltalkCompiler compileUnbound:
'name: name date: date time: time operation: operation otherInfo: info

        | a b c |

        %%
                on exception do: [:exception | ...].
        %%

       " various operations here -- no limit on variables or parameters "
'.

You can do the same thing with blocks in SmalltalkAgents.  What you may not
realize is that (in all Smalltalks) a block ALWAYS carries a home context.
The home context is the method that it was created in.  If you keep a block
around, you keep its home context. This means that you may also be keeping
around an entire nest of other objects that the garbage collector can't get
rid of until the block is no longer referenced.  This leads to an excess of
objects.  

It is also a "mysterious" cause of objects not getting GC'ed even though you think they are not referenced anywhere.  Why do you think that PPS images get
so big?  Using blocks this way can also lead to unintentional side effects
since they have a context and may set the values of the blocks home context
receiver (i.e. self in the method that the block was instantiated in).

In PPS and Digitalk, blocks have a limited number of (injection --
value:value:...) parameters they can accept.  Digitalk blocks don't have
closure so you are always modifying method temporaries in Digitalk.

        | a |

        foo := [:a | ...].
        foobar := [:b | ...].  "b is actually an undeclared method temp"

        a := 12.
        foo value: 4.

       ^a "Returns 4 -- presumably an unintentional side effect resulting
                        from Digitalk's lack of block closure.  This
                        would be even more nefarious if this method had
                        been passed the block to another method that
                        somewhere down the send chain had sent it the
                        value: message."

In PPS all block variables are scoped to the block.  This means that you
never have the inadvertant side-effects shown above.  However, all blocks in
PPS that have variables require extra contexts.  In other words, they are
called "dirty blocks" because the objects they touch upon interefere with
incremental GC.  Thus objects operated on through block variables in PPS
hang around longer and clutter up the memory space(s) -- leading again to
excess garbage and more frequent compaction operations.  In addition, PPS
blocks with variables (closure) will actually create a new block object
(context) every time they are sent the message value (i.e. even more
memory stuff).

Digitalk compiles its "evaluations" into a method in UndefinedObject
and thus it has to update/flush the method cache.  It also means that while
the "evaluation" is occuring, the compiler may not be able to evaluate
another expression since it needs to create it in the same method table slot
of UndefinedObject.  (This is a major issue when you have "real" pre-emptive
asynchronous threads).  In any event, the compilers in both Digitalk and PPS
are not re-entrant and thus (if they did have pre-emptive threads) they
couldn't evaluate statements or compile code on the different threads
without creating serious/fatal re-entrancy problems.  STA does have
pre-emptive (asychronous) threads as part of the Smalltalk language and
class library (i.e. they are not a host/platform dependent functionality).

Unbound methods can have any number of parameters.  They are not bound to
any class so they don't carry any unecessary context information (i.e.
object baggage that interferes with GC operations).  They don't cause any
class method table overwriting conflicts because they are nameless.  They
can be exported/imported (like all objects in SmalltalkAgents) from/into the
Environment in real-time.  When exported they don't need to carry any excess
baggage such as class definitions.  Since they have no context, they provide
complete closure (if you know LISP you understand the importance of this).
Side-effects of accidentally modifying class pool variables, class instance
variables, or instance variables are not possible with unbound methods.
Unbound methods can also have full first class exception handlers, which
blocks don't have (although a blocks home ...

read more »



Wed, 14 Feb 1996 11:39:33 GMT  
 Question: Arrays vs OrderedCollections
Just to note that I really appreciate the detailed explanations that David
Simmonds of QKS has been making here about bothe STA and the reasons/
issues it got that way. On most things I agree with him, on others
I differ, but its nice to hear details from someone who's been there.

Ivan



Wed, 14 Feb 1996 06:09:39 GMT  
 Question: Arrays vs OrderedCollections

Quote:
>You can do the same thing with blocks in SmalltalkAgents.  What you may not
>realize is that (in all Smalltalks) a block ALWAYS carries a home context.
>The home context is the method that it was created in.  If you keep a block
>around, you keep its home context, this means that you may also be keeping
>around an entire nest of other objects that the garbage collector can't get
>rid of until the block is no longer referenced.  This leads to an excess of
>objects.  

I really don't think this is true. PPS has 'clean' blocks, 'copying' blocks,
and 'full' blocks. Of these, only full blocks reference the context they
were created in. Full blocks are only created if you do a return, or (I think)
reference 'super'. Even assigning to an instance variable only causes self
to be copied into the BlockClosure object.

For instance

actionBlockFor: aName

        ^[:event | self registerEvent: event from: aName].

returns a copying block (self and aName are copied), that doesn't reference
the creating context. If the block references nothing other than its arguments,
it is even more efficient. This is a major advantage to PPS's system of
closures: since in Digitalk all variables are scoped within the method, all
blocks have to reference their creator context.

Quote:
>By the way, because threads are like any other object, they are also
>resizeable.  Thus SmalltalkAgents threads provide unlimited recursion
>(as a programmer you can specifiy a maximum recursion depth by setting
>one of the threads attributes.

Are you claming that stacks in some other Smalltalk cannot grow arbitarily?

Quote:
>PPS blocks with variables (closure) will actually create
>a new block object (context) every time they are sent the message value
>(i.e. even more memory stuff).

Arguably true. However, PPS does this within StackSpace, which operates
differently from the object spaces, and doesn't do GC in the same way.
In fact, if you are just doing regular message sends and returns, it
works just as efficiently as a C-language stack. Only when you do something
special with contexts (such as create a full block) are contexts converted
from the stack-frame format into objects in object space.

Quote:
>In any event, the compilers in both Digitalk and PPS
>are not re-entrant and thus (if they did have pre-emptive threads) they
>couldn't evaluate statements or compile code on the different threads
>without creating serious/fatal re-entrancy problems.

I'm not sure this is real important. Most programs hardly ever
actually invoke the compiler.

Quote:
>Unbound methods can have any number of parameters.  They are not bound to any
>class so they don't carry any unecessary context information (i.e. object
>baggage that interferes with GC operations).  They don't cause any class method table overwriting conflicts because they are nameless.  They can be
>exported/imported (like all objects in SmalltalkAgents) from/into the
>Environment in real-time.  When exported they don't need to carry any excess
>baggage such as class definitions.  Since they have no context, they provide
>complete closure (if you know LISP you understand the importance of this).  Side-effects of accidentally modifying class pool variables, class instance
>variables, or instance variables are not possible with unbound methods.
>Unbound methods can also have full first class exception handlers, which
>blocks don't have (although a blocks home context/method may).

Are you claiming that I can't do:

        myBlock := [
                Object errorSignal
                        handle: [:ex | ex returnWith: nil].
                        do: [...]
        ].

LISP doesn't really provide any strong form of complete closure: setq can modify
just about anything you might not want it to. You're not going to get a purely
applicative language here.

Quote:
>>(6) There is no artificial distinction for byte var, pointer var, etc.
>>    class types in SmalltalkAgents.
>This distinction is unimportant in practice.  Application programmers
>never make variable sized classes and lots of Smalltalk-80 programmers
>don't even know about this distinction.  Like I said above, I'd prefer
>that this distinction be even stronger.

I'd agree that I rarely want to make my own indexed class.

Out of curiosity, how do you handle SmallIntegers? Can I subclass it?

How do you effiently have the same structure for an object that stores
pointers/words/bytes in its indexed slots? You must know somehow in
advance, because you have to know how big the slots are. You certainly
don't want to make all the slots 32-bits, if you're going to be storing
bytes in them. (Byte objects need to be efficient, because there are
a lot of strings and bitmaps that are byte-oriented).

Quote:
>Our VM is designed for RISC cpu architectures.

Hmmm. How?

Quote:
>(another reason not to use MVC)

You're flaming MVC? Perhaps you should elaborate on this, as I think a lot
of us readers are immediately skeptical of people criticizing a paradigm
that has so much going for it.

Quote:
>OrderedCollection or an Array object can perform, the List class functions
>identically with no overhead.
...
>Lists are a combined Array and OrderedCollection.  Arrays are often
>used for literal constants such #(foo bar). Yet, the property of being
>a constant is not enforced (it is in SmalltalkAgents).  The only real
>reason for a distinction between an Array and a OrderedCollection is
>because most Smalltalk VM's don't support resizing of a given object.
>(i.e. A new object must be instantiated and the elements copied and then
>the old object must be told to #become: the new object).  This really
>means that Array and OrderedCollection are a result of the constraints
>imposed by the VM/object model design and are not based on good factoring.
>A general problem in OOP is the proliferation of classes. Poor Smalltalk
>architectures can result in such proliferation (another reason not to
>use MVC).  Excess or redundant classes complicate maintenance issues and
>makes the class libraries larger and thus more difficult to learn.  They
>can also hurt performance since they may introduce additional #isKindOf:
>tests or uneccessarily deep levels of object nesting.

Here's a few benchmarks that suggest that PPS's implementation of
OrderedCollections is essentially as fast as their implementation
of Arrays. (PPS4.1 on HP9000/720, unloaded)

|array orderedCollection|
array := (1 to: 100) asArray.
orderedCollection := (1 to: 100) asOrderedCollection.
Transcript cr;cr;cr;show: [array do: [:x | x+1 ]] speed.
Transcript cr;show: [orderedCollection do: [:x | x+1 ]] speed.

outputs:

576.592 microseconds per iteration
1734.33 iterations per second

566.904 microseconds per iteration
1763.97 iterations per second

and:

|array orderedCollection|
array := (1 to: 1000) asArray.
orderedCollection := (1 to: 1000) asOrderedCollection.
Transcript cr;cr;cr;show: [array inject: 1 into: [:a :b | a+b ]] speed.
Transcript cr;show: [orderedCollection inject: 1 into: [:a :b | a+b ]] speed.

outputs:

10610.6 microseconds per iteration
94.2456 iterations per second

10957.7 microseconds per iteration
91.2596 iterations per second

So, you really can't say that there's any particular performance difference.

(Many thanks to Bruce Samuelson for the speed goodie. It does a GC before
running the blocks, so performance is very consistent (+/- 3% on successive
runs), most of which I assume is due to the filling of the HP's large (1M)
code cache.).

I'd be interested in knowing more about how this is implemented.
If it is really true that you can:
        (a) grow these data structures arbitrarily
        (b) without occasionally creating a larger copy and
            abandoning the old one
and
        (c) access its slots in O(1) time,
then you have invented a fundamentally new data structure that could
revolutionize the study of algorithms.

Assuming that you do have some sort of N-way tree implementation, how
much overhead is there? PPS OrderedCollection generally grows by
factors of 2: thus there is an average 33% wasted slots. How does yours
compare?

I guess what it comes down to is: is it worth whatever few percent (if that?)
overall performance increase to have your collection classes incompatible
with the standard? Expecially if the new Smalltalk standard (XJ1173?) comes
out in favour of the conventional implementation, you folks are going to
have a tough time.

--
-------------------



Thu, 15 Feb 1996 07:24:19 GMT  
 Question: Arrays vs OrderedCollections

Quote:
>I really don't think this is true. PPS has 'clean' blocks, 'copying' blocks,
>and 'full' blocks. Of these, only full blocks reference the context they
>were created in. Full blocks are only created if you do a return, or (I think)
>reference 'super'. Even assigning to an instance variable only causes self
>to be copied into the BlockClosure object.

>For instance

>actionBlockFor: aName

>       ^[:event | self registerEvent: event from: aName].

>returns a copying block (self and aName are copied), that doesn't reference
>the creating context. If the block references nothing other than its arguments,
>it is even more efficient. This is a major advantage to PPS's system of
>closures: since in Digitalk all variables are scoped within the method, all
>blocks have to reference their creator context.

====================
As you read on, you'll find I was probably wrong about block contexts,
and in fact I agree with your responses. :-)
====================

What happens in the following case?

foo: param

        | local block |

        block := [:inject | local := inject].

        param doSomething: block.

        local ifTrue: [^block] ifFalse: [^nil].
--------------
What happens in the following case?

foo: param

        | local block |

        block := [:inject | inject doThing: local].

        local := 'Foobar'.   "Is local nil or 'Foobar' in block?
                              In order to be 'Foobar' it must have
                              the method context and so it must be
                              a Full block"
        param doSomething: block.

        local ifTrue: [^block] ifFalse: [^nil].
--------------

Based on what you have said and the way in which I know STA blocks
function, it probably means that the following rules probably apply:

(1) All blocks have an attribute for the method they are defined in.
(2) All blocks know their starting byte code in the method they are
    defined in.
(3) All blocks know the number of parameters they require.
(4) If a block references an outer blocks variables it needs to
    be able to find the outer blocks.  The outer blocks need to
    store their referenced variables in a separate context.
(5) If a block references any instance variables it needs self.
(6) If a block references any method temps it needs the method context.
(7) If a block does a non-local (^) return it needs the method context.
(8) If a block references any global variables it needs the literal
    association.
(9) If a block contains any other literals (Array constants, Strings,
    LargeIntegers) it needs those literals.
(10)Presumably it obtains those literals from the method in which the
    block is defined in.
(11)The method contains the receivers class.

So a CLEAN block is:
a) No child blocks referencing this blocks variables (so no block
   local context needs to be made each value...)
b) No method temps referenced (so no method context required)
c) No non-local return (so no method context required)

So a COPYING block is:
a) A child block references one or more of this blocks variables.
   (So a block context is required to hold the block values for
    the child block(s)).
b) No method temps referenced (so no method context required)
c) No non-local return (so no method context required)

So a DIRTY block is:
a) Any block that is not CLEAN or COPYING

A CLEAN block then has:
* It probably carries around "self", unless we change
  the rules about blocks that reference class instance vars and
  instance variables.
* It carries around the compiled method and all its literals.

A COPYING block then has:
* It probably carries around "self", unless we change
  the rules about blocks that reference class instance vars and
  instance variables.
* It carries around the compiled method and all its literals.
* A local context (on the call chain) while it is performing value.

A DIRTY block then has:
* It probably carries around "self", unless we change
  the rules about blocks that reference class instance vars and
  instance variables.
* It carries around the compiled method and all its literals.
* A local context (on the call chain) while it is performing value...
* The method context in which it was invoked.  If it only has the
  context for the purposes of a non-local return then the context
  can be de-referenced as soon as the method exits/returns.

=============
Thank you for clarifying my understanding of the details of PPS blocks.

MY PREVIOUS STATEMENTS ABOUT BLOCKS ALWAYS CARRYING AROUND A CONTEXT
WERE NOT CORRECT (I AM MORE FAMILIAR WITH THE ISSUES IN DIGITALK ST).
=============

STA blocks are effectively the same as PPS but have some additional
features. They can be compiled to behave like Digitalk blocks or
PPS blocks.  They can have an unlimited (255) number of parameters.

The primary reason we added unbound methods was to solve the following
(global -- class>>instance method) contention problem.

Time 1 -- thread A: evaluate: '4+3'
Time 2 -- thread A: compilation of temporary method X[A] completed.
Time 3 -- thread A: (swap in thread B -- remember pre-emptive threads)
Time 4 -- thread B: evaluate: 'Time now'.
Time 5 -- thread B: compilation of temporary method X[B] completed.
Time 6 -- thread B: (swap in thread A -- remember pre-emptive threads)
Time 7 -- thread A: ^nil X (operates on X[B] not X[A])
... We therefore have a synchronization problem

It could be solved by:
Time 1 -- thread A: evaluate: '4+3'
Time 2 -- thread A: internalEval: '[4+3]'. (convert to the form)
Time 3 -- thread A: compilation of temporary method X[A] completed.
Time 4 -- thread A: disable signals/interrupts
                    set X[A] in UndefinedObject class.
                    block := nil X.
                    enable signals/interrupts.
Time 3 -- thread A: (swap in thread B -- remember pre-emptive threads)
Time 1 -- thread B: evaluate: 'Time now'
Time 2 -- thread B: internalEval: '[Time now]'. (convert to the form)
Time 3 -- thread B: compilation of temporary method X[B] completed.
Time 4 -- thread A: disable signals/interrupts
                    set X[B] in UndefinedObject class.
                    block := nil X.
                    enable signals/interrupts.
Time 3 -- thread A: (swap in thread A -- remember pre-emptive threads)
Time 7 -- thread A: ^block value (i.e. no problem)

So tradeoffs?  
Having had to follow this out to its logical conclusion, unbound methods
save some compile time, they save some disabled interrupt/signal time,
and they save the instantiation of a block.  In addition to the block
that is instantiated we end up keeping around the method and the extra objects
we need for the block (which is not signifigantly more than what the
unbound method keeps around).

1) We can live with briefly disabled signals/interrupts.
2) Only difference is we have the block object itself, its class,
   and the "self" object which is nil.
3) Flush the cache for the "X" method.  Install the X method in the
   UndefinedObject class.  (A speed loss, perhaps not signifigant
   compared to compilation time?).
4) The X method and all the globals and literals it references hang
   around even after the evaluation completes -- because the method
   is permanently stored in UndefinedObject.  A minor but possibly
   annoying situation -- could be remedied by removing the method
   from UndefinedObject immediately after execution.

Unbound methods aren't required.  Their usage is relevant to
the general user only in the sense that they may lead to speedier
evaluation with slightly less object instantiation/gc.

Quote:

>>By the way, because threads are like any other object, they are also
>>resizeable.  Thus SmalltalkAgents threads provide unlimited recursion
>>(as a programmer you can specifiy a maximum recursion depth by setting
>>one of the threads attributes.

>Are you claming that stacks in some other Smalltalk cannot grow arbitarily?

YES.  Specifically, some Digitalk platforms have a 32kb limit (it may apply
to all of them).

============
On a separate note, I was speaking with someone recently about benchmarks
for PPS and they indicated that it was important to give PPS at least 8MB
for benchmarks to perform properly.  Since STA can run the
entire development system in 4MB that seemed large? As I understand
it, PPS has separate memory spaces and thus has problems making large
objects -- i.e. it cannot always effectively utilize the space.  STA is
using a single physical memory space.  The logical separation of memory
spaces is handled by linking of blocks into appropriate semi-spaces.  The
result is that there is a single free space. When a block is freed, it goes
back into the global free pool. This makes it possible to make an object
almost as large as the total available space.  Performance is generally not affected by memory space unless the free space falls below some 15-20%/200kb whichever is less. Whether this is a good design or not, I will not speculate, other than to say that we can run in less memory and we are less affected
by available free memory space.
============

- Show quoted text -

Quote:
>>PPS blocks with variables (closure) will actually create
>>a new block object (context) every time they are sent the message value
>>(i.e. even more memory stuff).

>Arguably true. However, PPS does this within StackSpace, which operates
>differently from the object spaces, and doesn't do GC in the same way.
>In fact, if you are just doing regular message sends and returns, it
>works just as efficiently as a C-language stack. Only when you do something
>special with contexts (such as create a full block) are contexts converted
>from the stack-frame format into objects in object space.

>>In any event, the compilers in both Digitalk and PPS
>>are not re-entrant and thus (if they did have pre-emptive threads) they
>>couldn't evaluate statements or compile code on the different threads
>>without creating serious/fatal re-entrancy problems.

>I'm not sure this is real important.

...

read more »



Thu, 15 Feb 1996 21:43:23 GMT  
 Question: Arrays vs OrderedCollections

Quote:
>(4) If a block references an outer blocks variables it needs to
>    be able to find the outer blocks.  The outer blocks need to
>    store their referenced variables in a separate context.

I think this is where PPS is more clever than that. BlockClosure has
an attribute called copiedValues. If you do:
 - a read access to a local variable:           it gets copied

 - a read/write access to an instance variable: self gets copied

 - a write access to a local variable:          only in this case
                                                does the new block
                                                point to the creator
                                                context

Note the semantics of copying: if you change the value of an local variable
after creating a block that references it, the properties of the block
do not get changed. This is the real meaning of 'closure'. That is, once
you create the block, it's closed.

For example:
Object subclass: #TestThing
        instanceVariableNames: 'iv1 iv2 '
        classVariableNames: ''
        poolDictionaries: ''
        category: 'TrevorScratch'

TestThing>>putBlockIV1
        ^[:value | iv1 := value].

'TestThing new putBlockIV1' returns [edited]:

a BlockClosure
        method:
                a CompiledBlock
                        bytes: #[36 203 1 16 17 202 0 101 ];
                        outerMethod: (TestThing>>#putBlockIV1.
        outerContext: nil;
        copiedValues: <the TestThing instance>.

Those byte codes mean:

literals: ()

1 <CB 01> push 1 copied values            "This is self"
3 <10> push local 0                       "push :value"
4 <11> push local 1                       "use index 1 into copiedValues"
5 <CA 00> store inst 0 of top             "store into iv1"
7 <65> return
'

Thus, the block does not have a pointer to the creater context. It has its
own CompiledBlock, which PPS compiles directly to its own machine code
on execution. As far as I can tell, the outerMethod IV of CompiledBlock
is only used to show you the method in the de{*filter*}.

Quote:
>Fewer classes involved in the menu structure should result in simpler and
>cleaner encapsulation and design/maintenance.

Hmmm. Possibly this is true in some cases, but I would hardly consider this
a commandment of OO design. I could, for instance, have my entire program
be one object. Is this good?

Quote:
>SmalltalkAgents
>provides a per/instance property for the size of character accesses into the
>structured storage of the instance.  This is why there only needs to
>be one class of String, and one class of Symbol even though we support
>8/16/24 bit characters.  The programmer model is always to view characters
>as unsigned 24 bit values.  The actual storage space used in an object
>will correspond to the largest character size used within the object.

This is nice. Hear, Hear.

Quote:
>When the object grows larger than available space in the pad area,
>the memory manager clones the object to be the required size plus
>some pad factor. The clone is performed with a fast memory copy loop
>(remember RISC).  The old block that was too small is immediately
>added to the free blocks list.

Adding to the free blocks list immediately is perhaps an advantage over
the PPS implementation of simply leaving the old version free for GC.
However, most implementations prefer not to even use a free list, because
of the overhead of scanning the list to find a block of an appropriate
size for the allocation. Rather, PPS simply blasts objects into NewSpace
(one after another, so the allocation is just incrementing the end-of-space
pointer). Allocation is nearly free. Then, it cleans up the short-lived
corpses a short time later.

The idea of the unused slots being available for GC is interesting. I'm
not sure whether I like the idea or not. Potentially, in a generational
scavenging system, your Lists would typically be scavenged into a new
space, and shrunk down to their minimum size quite soon after creation.
This _could_ result in having to grow the object much more often.

On the other hand, there is possibly a win in terms of working in
small memories.

Can you describe for us c.l.s types STA's GC system? Is is a generational
scavenging approach? Can you explain why you use free list allocation?

Quote:
>It is more than a few percent.  OrderedCollections and Arrays are "core"
>elements that are used everywhere so their performance is tantamount to
>the overall performance of a Smalltalk.

I'm still not sure. The vast majority of collections in PPS are Arrays, which
are equally as fast as Lists.

In response to your benchmark suggestions:

Transcript cr;show:  [a := (1 to: 5000) asArray] speed.
Transcript cr;show:  [oc := (1 to: 5000) asOrderedCollection] speed.

106259.0 microseconds per iteration
9.41094 iterations per second

90410.3 microseconds per iteration
11.0607 iterations per second

(oddly enough!)

In the other tests using at: and at:put:, OCs are about twice as slow
as Arrays: about 4 uS for Array>>at:, and 8 for OrderedCollection>>at:

I'm not trying to be critical of STA. Also, I am not in any way affiliated
with ParcPlace (I'm just a reasonably satisfied user). But, when you are
making comparisons with a product that I and others know quite well,
you have to be very sure of your facts.

--
-------------------



Fri, 16 Feb 1996 05:24:21 GMT  
 Question: Arrays vs OrderedCollections

Quote:
>I think this is where PPS is more clever than that.

I'm not sure "clever" best describes the design decisions you were outlining.
It is actually "easier" to implement block closure the way you are about
to describe, but we don't think it is "proper" closure -- which is why
we tried to go the extra mile.

I will say however, that having had to re-look at this issue in "gory"
detail I am very inclined to think that block parameters should ONLY have
block scope. The potential for mistakes, bad side effects, and misleading
code outweigh any real benefits of allowing parameters to be globals (as
STA currently does) or method temporaries (as STA and Digitalk allow).
It has been close to two years since I last looked into the issues.  In
that time I have gained a great deal more insight into the nature of
large code bases (i.e. the entire class library set) and related
implementation issues. The added performance cost for allowing block
parameters to be anything other than block scoped are not worth it. So we
will probably transition our implementation to only allow block scoping
of a blocks parameters.

**
  Since we already have the more flexible block parameter architecture,
  we will probably disable it by default.  To utilize the functionality
  the user will have to explicitly specify it in the method directives.
  Hopefully we will "evolve" it out of our system by version 2.0.
**

Quote:
>BlockClosure has
>an attribute called copiedValues. If you do:
> - a read access to a local variable:          it gets copied

> - a read/write access to an instance variable: self gets copied

> - a write access to a local variable:         only in this case
>                                               does the new block
>                                               point to the creator
>                                               context

>Note the semantics of copying: if you change the value of an local variable
>after creating a block that references it, the properties of the block
>do not get changed. This is the real meaning of 'closure'. That is, once
>you create the block, it's closed.

Hmm... As I mentioned above, this seems like a design/conceptual error to
me. It can lead to some serious inconsistencies that I believe would
result in erroneous design/code flaws as the following trivialized
examples try to illustrate.

For example:
------------------------
foo

        | block local |

        block := [local].

        local := 8.

       ^block           "When evaluated, block should return 8 not nil"
------------------------
foo

        | block |

        block := [ivar]."ivar is some instance variable"

        ivar := 8.

       ^block           "When evaluated, block should return 8 not nil"
------------------------
"OR"
foo

        | block |

        block := [Q].   "Q is some global"

        Q := 7.

       ^block           "When evaluated, block should return 7 not nil"
------------------------
"OR"
foo

        | block |

        [:x |
                block := [x].
                x := 6.
        ] value: nil.

       ^block           "When evaluated, block should return 6 not nil
                         -- or it is inconsistent with the other
                            rules for scoping!
                         -- it also puts a severe limit on the usefulness
                            of block variables."
------------------------
"OR"
foo

        | block |

        [:x |

                block := [x].   "From a language binding point of view
                                 it should not matter the order in which
                                 the block is declared!"
                x := 6.
               ^block
        ] value: nil.   "When evaluated, block should return 6 not nil
                         -- or it is inconsistent with the other
                            rules for scoping!
                         -- it also puts a severe limit on the usefulness
                            of block variables."
------------------------
"OR"
foo

        | block |

        [:x |

                x := 6.
                block := [x].
        ] value: nil.

       ^block           "When evaluated, block should return 6 not nil
                         -- or it is inconsistent with the other
                            rules for scoping!
                         -- it also puts a severe limit on the usefulness
                            of block variables."
------------------------
"OR"
foo

        | block |

        [:x |

                x := 6.
                block := [x].   "From a language binding point of view
                                 it should not matter the order in which
                                 the block is declared!"
               ^block
        ] value: nil.   "When evaluated, block should return 6 not nil
                         -- or it is inconsistent with the other
                            rules for scoping!
                         -- it also puts a severe limit on the usefulness
                            of block variables."
------------------------
All the above blocks should have behaved consistently.  If the scope of a
variable changes its behavior that makes for a "dangerous" scenario where
you must code differently depending on variable type and implementation.
This is a "language" issue and should be addressed/resolved by the
standards committee.

==============================

Quote:
>Adding to the free blocks list immediately is perhaps an advantage over
>the PPS implementation of simply leaving the old version free for GC.
>However, most implementations prefer not to even use a free list, because
>of the overhead of scanning the list to find a block of an appropriate
>size for the allocation. Rather, PPS simply blasts objects into NewSpace
>(one after another, so the allocation is just incrementing the end-of-space
>pointer). Allocation is nearly free. Then, it cleans up the short-lived
>corpses a short time later.

Really? I believe they have multiple heaps/memory spaces.  I think that they
migrate objects from one space to another over time.  I suspect they must
garbage collect the EDEN space frequently.  I would guess that they allocate
larger objects in memory spaces other than EDEN. If my speculations are
correct, then I think you might be oversimplifying the issues in
PPS memory management.  Allocation, GC, and attribute read/write operations
are all intertwined.  In my experience, oversimplifying one area usually
leads to complexities or poor performance (including excessive memory
requirements) in other areas.

Quote:

>The idea of the unused slots being available for GC is interesting. I'm
>not sure whether I like the idea or not. Potentially, in a generational
>scavenging system, your Lists would typically be scavenged into a new
>space, and shrunk down to their minimum size quite soon after creation.
>This _could_ result in having to grow the object much more often.

The "quite soon" is in human terms not in execution terms.  Objects
such as strings, lists, and other collections tend to be grown/shrunk in
bursts.  The GC mechanism only takes pad space away when it cannot satisfy
memory requests any other way.  As soon as the object demands more space
the padding is re-allocated.

Quote:

>On the other hand, there is possibly a win in terms of working in
>small memories.

It should simplify the programming model since the class designer
can count of uniform mechanisms for every object.  It should also
perform more efficiently since the resizing mechanisms are a fundamental
operation in the VM rather than residing at the ST level (you can still
do it in ST but you don't have to) -- and the VM is handcrafted for
maximum performance on any given platform.

Quote:

>Can you describe for us c.l.s types STA's GC system? Is is a generational
>scavenging approach? Can you explain why you use free list allocation?

SmalltalkAgents has four garbage collection schemes running
simultaneously, which is why GC operations are "transparent".  The free
block list is only checked to a few levels and it is fast (remember RISC
-- sorry but I have to keep pointing it out).  If a free block cannot be
found in a heuristically determined amount of "work" then it is allocated
from the end of free space (i.e. like NewSpace).  This enables STA to
function effectively with lower memory because the need to compact the
heap space can be deferred and processed incrementally since it isn't the
only free memory source. Also, with an object table approach you can do
very fast GC and oop mutation but you run the danger of the OT growing
large and not being compactable.  We believe we have also solved that
problem with our GC schemes.  I apologize, but for "real-business"
proprietary reasons I cannot really go much further into the GC
architecture.

Quote:
>(oddly enough!)

>In the other tests using at: and at:put:, OCs are about twice as slow
>as Arrays: about 4 uS for Array>>at:, and 8 for OrderedCollection>>at:

The OC versus Array times are the kind of differences I expected.

I am sure that as time goes on, benchmarks or other testing will reveal
areas where STA performs less well than other Smalltalks or similar
languages.  Our focus is on overall performance of STA based applications
AND productivity for development AND maintainability of application code.
As with anything, compromises are required and performance of the
Smalltalk language is not the only issue -- as long as the performance
decisions are an intentional aspect of the design, and as long as the
performance is "reasonable".

Quote:

>I'm not trying to be critical of STA. Also, I am not in any way affiliated
>with ParcPlace (I'm just a reasonably satisfied user). But, when you are
>making comparisons with a product that I and others know quite well,
>you have to be very sure of your facts.

As far as being critical of STA, go ahead.  As far as correcting my
errors, go ahead with that too -- how else will we/QKS ...

read more »



Fri, 16 Feb 1996 13:11:19 GMT  
 
 [ 20 post ]  Go to page: [1] [2]

 Relevant Pages 

1. newbie question on OrderedCollections

2. Multidimensional array vs. array of array

3. Question about array ops on arrays of types of arrays of ...(ack)

4. automatic arrays vs. allocated arrays

5. Subroutine arguments: arrays vs array slices

6. overhead of tracing arrays vs. array elements

7. OrderedCollections

8. Help with OrderedCollections

9. GAS vs M$ vs Borland; formats questions

10. web vs standalone (VRML vs Inventor): question

11. Newbie question: tcl vs perl and tkMotif vs tkX

12. stdcall vs c vs cdecl vs pascal vs whosyerdaddy

 

 
Powered by phpBB® Forum Software