Collection of collections, how to collect? 
Author Message
 Collection of collections, how to collect?

I have an object that represents the relationship between two parties called
EfiTreaty.  Any single party can have multiple of them.  Each EfiTreaty
contains within it a collection of contracts between two parties.  Not
surprisingly, they're called EfiContract(s).

When I need to look at all of a party's (EfiParty) contracts I do something
like:

contracts
        ^self treaties inject: (Set new) into: [ :sum :each | sum addAll:
(each contracts values); yourself ]

but it seems really slow (no I haven't profiled it yet, it runs in Gemstone).
I suspect this may be a less than efficient way to create a collection of all
the treaty's contracts.  To make matters worse, each EfiTreaty has three other
collections just like contracts that I need to do the same thing on.

For background, "self treaties" returns a collection of an EfiParty's
EfiTreatys.

--
.tom



Mon, 01 Dec 2003 09:57:14 GMT  
 Collection of collections, how to collect?
Tom,

I do something like:

contracts
        ^self treaties inject: (Set new) into: [ :sum :each | sum addAll:
(each contracts values); yourself ]

but it seems really slow
<<

You don't explain what "values" is.

Have you defined a (good) "hash" method for whatever class of object "(each
contract values)" returns?

Cheers, Paul



Mon, 01 Dec 2003 11:39:21 GMT  
 Collection of collections, how to collect?
Thomas,

Quote:
> When I need to look at all of a party's (EfiParty) contracts I do
something
> like:

> contracts
>         ^self treaties inject: (Set new) into: [ :sum :each | sum addAll:
> (each contracts values); yourself ]

> but it seems really slow (no I haven't profiled it yet, it runs in

Gemstone).

There is a profiler in GemStone -- see the ProfMonitor class.  You can also
use the old "System>>millisecondsToRun:" as you are looking to find faster
incantations.

Possibly related:

I have profiled this type of code in VisualWorks and generally have found it
slower that the old ugly 'C'-type array incantations.  For example:

    | anOrderedCollection |

    anOrderedCollection := OrderedCollection new: 100.

    self treatites do: [ :each |
        anOrderedCollection addAll: each contracts values ].

    ^anOrderedCollection asSet

Often runs much faster.  But your mileage will vary!

Regards,

Randy



Mon, 01 Dec 2003 21:00:44 GMT  
 Collection of collections, how to collect?
#values is a method sent to dictionaries so that only the value (an
EfiContract) is returned rather than an Association.

--
.tom



Mon, 01 Dec 2003 20:56:12 GMT  
 Collection of collections, how to collect?

Quote:

> I have an object that represents the relationship between two parties called
> EfiTreaty.  Any single party can have multiple of them.  Each EfiTreaty
> contains within it a collection of contracts between two parties.  Not
> surprisingly, they're called EfiContract(s).

> When I need to look at all of a party's (EfiParty) contracts I do something
> like:

> contracts
>         ^self treaties inject: (Set new) into: [ :sum :each | sum addAll:
> (each contracts values); yourself ]

> but it seems really slow (no I haven't profiled it yet, it runs in Gemstone).
> I suspect this may be a less than efficient way to create a collection of all
> the treaty's contracts.  To make matters worse, each EfiTreaty has three other
> collections just like contracts that I need to do the same thing on.

> For background, "self treaties" returns a collection of an EfiParty's
> EfiTreatys.

Why not let a party know what his contracts are? Then

EfiParty>>contracts
        ^contracts

EfiParty>>addTreaty: aTreaty
        treaties add: aTreaty.
        contracts addAll: aTreaty contracts

EfiParty>>removeTreaty: aTreaty
        treaties remove: aTreaty
        self recomputeContracts "this becomes your method above"

This is basically a caching mechanism. Without being more familiar with
your app, I can't give any more concrete suggestions. Questions that
come to mind:

What is the typical number of treaties per party?
What is the typical number of contracts per treaty?
To what degree do the contracts of one treaty intersect with another?
What's more likely, accessing contracts or adding/removing treaties?

--
Travis Griggs (a.k.a. Lord of the Fries)
Member: 3rd Boolean State Software Collective
Key Technology
"It had better be a pretty good meeting to be better than no meeting at
all"-- Boyd K. Packer



Mon, 01 Dec 2003 23:40:22 GMT  
 Collection of collections, how to collect?
here's the optimal GemStone code IMO, with several comments after.

contracts

    | return |
    return := IdentitySet new.
    1 to: self treaties size do:
        [:indx |
        (self treaties at: indx) contracts keysAndValuesDo:
            [:key :value |
            return add: value]].
    ^return

Here are my comments.

1. GemStone code has to be efficient.  One must sacrifice object 'purity' for
server code,
because performance is paramount in server code.
2. IdentitySet is more efficient than Set, but it may not be appropriate in your
case.  Only
you know the semantics of your model and can decide.
3. #to:do: is more efficient than #do:, and much more so than #inject:into:.  (see
comment 1)
4. The code above minimize temporary object creation (see the implementation of
#values
for instance).
5. I realize that encapsulation is broken, this goes back to my comment 1, one
then decides
on the implementation.
6. You're right that maintaining a separate collection can be a good design
decision.

Normand

Quote:

> I have an object that represents the relationship between two parties called
> EfiTreaty.  Any single party can have multiple of them.  Each EfiTreaty
> contains within it a collection of contracts between two parties.  Not
> surprisingly, they're called EfiContract(s).

> When I need to look at all of a party's (EfiParty) contracts I do something
> like:

> contracts
>         ^self treaties inject: (Set new) into: [ :sum :each | sum addAll:
> (each contracts values); yourself ]

> but it seems really slow (no I haven't profiled it yet, it runs in Gemstone).
> I suspect this may be a less than efficient way to create a collection of all
> the treaty's contracts.  To make matters worse, each EfiTreaty has three other
> collections just like contracts that I need to do the same thing on.

> For background, "self treaties" returns a collection of an EfiParty's
> EfiTreatys.

> --
> .tom



Mon, 01 Dec 2003 23:55:05 GMT  
 Collection of collections, how to collect?
I think you need to re-arrange the way you navigate your objects.
The way you've got it now, if I understand correctly, the contracts
do not know the parties involved. So, you always have to traverse
the treaty to the contract, which will forever limit the speed.

Besides, I think you have a more serious problem with the design
if you only have one treaty collection in the party objects.
The reason is that you'll not be able to distinguish which role
a party plays in a contract -- are they the "party of the first part"
or the "party of the second part"? You need two collections, one
for the treaties/contracts for which the party is the "party of
the first part", and the other for the "second part". Then you have
to keep the collections in-sync. If you really need to have these
collections around (for performance?) then that's what you've got
to do. But, it's much simpler to have the treaty/contract objects
hold the two parties involved, and use #select: over the collection
of all treaty/contract objects to find those involving the party.

Think of it this way. The contract is a many-to-many relationship
between two parties. The normal way to handle a many-to-many is
to use a cross-reference table -- that's what the contract object is,
but with additional attributes.
I think if you had a many-to-many between objectA and objectB, you
may have seen the need for the cross-reference object, but the fact
that objectA and objectB are both party objects may have got you
down this path. I scratched my head on this stuff a few years back.

--yanni
Zhao Technology Inc.

Quote:

> I have an object that represents the relationship between two parties called
> EfiTreaty.  Any single party can have multiple of them.  Each EfiTreaty
> contains within it a collection of contracts between two parties.  Not
> surprisingly, they're called EfiContract(s).

> When I need to look at all of a party's (EfiParty) contracts I do something
> like:

> contracts
>         ^self treaties inject: (Set new) into: [ :sum :each | sum addAll:
> (each contracts values); yourself ]

> but it seems really slow (no I haven't profiled it yet, it runs in Gemstone).
> I suspect this may be a less than efficient way to create a collection of all
> the treaty's contracts.  To make matters worse, each EfiTreaty has three other
> collections just like contracts that I need to do the same thing on.

> For background, "self treaties" returns a collection of an EfiParty's
> EfiTreatys.

> --
> .tom



Tue, 02 Dec 2003 00:24:37 GMT  
 Collection of collections, how to collect?
Not sure if an IdentitySet is required. Filtering duplicates can be
expensive, especially if you know ahead of time that there won't be any :-)

It might make sense to make two passes if the subcollections are big:

1. totalSize := treaties inject: 0 into: [:total :subcollection | total +
subcollection size ].
2. return := Array new: totalSize.

The second step would pregrow the result collection to avoid reallocation
overhead.

Steve


Quote:
> here's the optimal GemStone code IMO, with several comments after.

> contracts

>     | return |
>     return := IdentitySet new.
>     1 to: self treaties size do:
>         [:indx |
>         (self treaties at: indx) contracts keysAndValuesDo:
>             [:key :value |
>             return add: value]].
>     ^return



Tue, 02 Dec 2003 01:23:46 GMT  
 Collection of collections, how to collect?
I suggested IdentitySet as the original code was using a Set, thus filtering
duplicates based on equality.  An IdentitySet is much more efficient, and is
usually appropriate as equality if not redefined is identity based.

Growth of IdentitySets is handled very well by GemStone, so no need to
pre-grow.

Normand

Quote:

> Not sure if an IdentitySet is required. Filtering duplicates can be
> expensive, especially if you know ahead of time that there won't be any :-)

> It might make sense to make two passes if the subcollections are big:

> 1. totalSize := treaties inject: 0 into: [:total :subcollection | total +
> subcollection size ].
> 2. return := Array new: totalSize.

> The second step would pregrow the result collection to avoid reallocation
> overhead.

> Steve



> > here's the optimal GemStone code IMO, with several comments after.

> > contracts

> >     | return |
> >     return := IdentitySet new.
> >     1 to: self treaties size do:
> >         [:indx |
> >         (self treaties at: indx) contracts keysAndValuesDo:
> >             [:key :value |
> >             return add: value]].
> >     ^return



Tue, 02 Dec 2003 03:15:43 GMT  
 Collection of collections, how to collect?

Quote:


> > I have an object that represents the relationship between two parties called
> > EfiTreaty.  Any single party can have multiple of them.  Each EfiTreaty
> > contains within it a collection of contracts between two parties.  Not
> > surprisingly, they're called EfiContract(s).

> > When I need to look at all of a party's (EfiParty) contracts I do something
> > like:

> > contracts
> >         ^self treaties inject: (Set new) into: [ :sum :each | sum addAll:
> > (each contracts values); yourself ]

> > but it seems really slow (no I haven't profiled it yet, it runs in Gemstone).
> > I suspect this may be a less than efficient way to create a collection of all
> > the treaty's contracts.  To make matters worse, each EfiTreaty has three other
> > collections just like contracts that I need to do the same thing on.

> > For background, "self treaties" returns a collection of an EfiParty's
> > EfiTreatys.

> Why not let a party know what his contracts are? Then

> EfiParty>>contracts
>         ^contracts

That's the question, I suppose.  A party's contracts are an aggregation of the
contracts across all the treaties.  As I think about it, maybe I shouldn't be
trying to create a single collection of all contracts across the treaties, since I
rarely want just that.  What I really want is to do something with each of them
which maybe (should) be implemented as a message to each of the treaties to iterate
over their own set of contracts.

Quote:
> EfiParty>>addTreaty: aTreaty
>         treaties add: aTreaty.
>         contracts addAll: aTreaty contracts

> EfiParty>>removeTreaty: aTreaty
>         treaties remove: aTreaty
>         self recomputeContracts "this becomes your method above"

> This is basically a caching mechanism. Without being more familiar with
> your app, I can't give any more concrete suggestions. Questions that
> come to mind:

> What is the typical number of treaties per party?

For a large customer 2-3000.

Quote:
> What is the typical number of contracts per treaty?

They accumulate at the rate of about 1-3/day

Quote:
> To what degree do the contracts of one treaty intersect with another?

They intersect only when one of the parties wants to examine all the contracts
*they* participate in.

Quote:
> What's more likely, accessing contracts or adding/removing treaties?

accessing contracts.  Treaties would never be removed, though they may eventually
be archived if the relationship between them is terminated.

Quote:
> --
> Travis Griggs (a.k.a. Lord of the Fries)
> Member: 3rd Boolean State Software Collective
> Key Technology
> "It had better be a pretty good meeting to be better than no meeting at
> all"-- Boyd K. Packer

--
.tom


Tue, 02 Dec 2003 11:07:01 GMT  
 Collection of collections, how to collect?
I know that none of them will be duplicates.  Maybe all I need is a Bag then
for this purpose?  An OrderedCollection, perhaps?

I did make two passes at it.  The first to discover the size of the contracts
dictionaries so I could preallocate the new collection.  Then I iterated
through the contracts adding them to the new collection.  This ran *much*
faster.  I'll try the bag idea next...

--
.tom



Tue, 02 Dec 2003 11:10:26 GMT  
 Collection of collections, how to collect?

Quote:

> I think you need to re-arrange the way you navigate your objects.
> The way you've got it now, if I understand correctly, the contracts
> do not know the parties involved. So, you always have to traverse
> the treaty to the contract, which will forever limit the speed.

Each contract has #buyer and #seller methods which return the parties involved.
It's redundant information since I can get that information from the treaty, but
they are created independent of the treaty then added to it later.

Quote:
> Besides, I think you have a more serious problem with the design
> if you only have one treaty collection in the party objects.
> The reason is that you'll not be able to distinguish which role
> a party plays in a contract -- are they the "party of the first part"
> or the "party of the second part"? You need two collections, one
> for the treaties/contracts for which the party is the "party of
> the first part", and the other for the "second part". Then you have
> to keep the collections in-sync. If you really need to have these
> collections around (for performance?) then that's what you've got
> to do. But, it's much simpler to have the treaty/contract objects
> hold the two parties involved, and use #select: over the collection
> of all treaty/contract objects to find those involving the party.

Very perceptive.  In reality, the parties are actually role objects.  Supposing
ACME was an EfiParty, there may be several roles it plays in treaties.  In these
cases, there's EfiBuyer, EfiSeller, EfiFunder, EfiBroker....  The treaties are are
the relationship between these roles.  So between anEfiBuyer and anEfiSeller, there
would exist anEfTreaty containing all the documents that have been exchanged
between the two.

Quote:
> Think of it this way. The contract is a many-to-many relationship
> between two parties. The normal way to handle a many-to-many is
> to use a cross-reference table -- that's what the contract object is,
> but with additional attributes.
> I think if you had a many-to-many between objectA and objectB, you
> may have seen the need for the cross-reference object, but the fact
> that objectA and objectB are both party objects may have got you
> down this path. I scratched my head on this stuff a few years back.

> --yanni
> Zhao Technology Inc.


> > I have an object that represents the relationship between two parties called
> > EfiTreaty.  Any single party can have multiple of them.  Each EfiTreaty
> > contains within it a collection of contracts between two parties.  Not
> > surprisingly, they're called EfiContract(s).

> > When I need to look at all of a party's (EfiParty) contracts I do something
> > like:

> > contracts
> >         ^self treaties inject: (Set new) into: [ :sum :each | sum addAll:
> > (each contracts values); yourself ]

> > but it seems really slow (no I haven't profiled it yet, it runs in Gemstone).
> > I suspect this may be a less than efficient way to create a collection of all
> > the treaty's contracts.  To make matters worse, each EfiTreaty has three other
> > collections just like contracts that I need to do the same thing on.

> > For background, "self treaties" returns a collection of an EfiParty's
> > EfiTreatys.

> > --
> > .tom

--
.tom


Tue, 02 Dec 2003 11:15:34 GMT  
 Collection of collections, how to collect?
Quote:


> > I think you need to re-arrange the way you navigate your objects.
> > The way you've got it now, if I understand correctly, the contracts
> > do not know the parties involved. So, you always have to traverse
> > the treaty to the contract, which will forever limit the speed.

> Each contract has #buyer and #seller methods which return the parties involved.
> It's redundant information since I can get that information from the treaty, but
> they are created independent of the treaty then added to it later.

[stuff deleted]

IMO, this collection of collections approach is a bad idea because
my gut tells me they're going to get out of sync at some point.
Since the contracts point to the buyer and seller, there are
now two ways that *define* the buyer and seller: by navigating
the collections, and by the buyer/seller instance variable.
All that has to happen is that the buyer/seller gets set,
without the corresponding changes being made to the collections,
and you've got database integrity problems. You may get it
coded correctly now, but down the road, who knows what might
happen. The developer needs to know too much to get it right.
For example, what if the buyer/seller change, and the collection
modifications were done in the course of separate transactions.
Then you'll have a tough time tracking down the cases where the
corruption occurs. So, violating "Say it once, and only once"
is going to lead to trouble.

I'd started on the following suggestion for my previous post,
but found I didn't know enough about your applicatiion (and I still
don't). But here it is anyway.

Add a class variable to your EfiContract class, called AllContracts,
and initialize it to an appropriate collection instance (Set or Bag).
Ensure that all contracts get added to AllContracts, by modifying #new
(or some other "constructor"). Now, you can simply query the AllContracts
collection to select the contracts based on the buyer/seller party.
To get acceptable performance, you have to avoid iterating through
all the contract objects. Look in the chapter called
"Indexed Associative Access" in the GemStone Programming Guide
to see how to do it. I was looking at an old copy of the doc's,
but hopefully the latest doc. still has the chapter.

--yanni
Zhao Technology Inc.



Tue, 02 Dec 2003 23:22:12 GMT  
 
 [ 13 post ] 

 Relevant Pages 

1. collection inside a collection: how to selectively collect?

2. ways to collect data from two collections of the same size

3. VAST: BIG collection

4. extracting sql query result from ordered collection

5. Creating collections - dynamic list operator {...}

6. Creating collections - dynamic list operator - braces

7. Ordered Collection doesn't grow.

8. Making a sorted collection to a SortedCollection

9. How do I create a List Model for my custom collection class

10. ListModel or Ordered Collection

11. Collections/Dictionaries

 

 
Powered by phpBB® Forum Software