JOIN Optimization
Author Message
JOIN Optimization

I think it must be too late in the day.  I have a query that was running in
about 9 seconds this morning.  As far as I can tell, running the same query
on similar (but not exactly the same) data now takes minutes to run.  To
create the same outcome with SCANing through the table takes moments.  Can
anyone see the stupid mistake I must be making???

The final result will be more in depth, but for debugging and example
purposes I kept taking out fields and result calculations to narrow it down
and still see the problem with the fairly bare-bones example below.  I'm
matching the StartNum to the RCR record that has First8Lo <= StartNum <=
First8Hi.

* "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
* This table has roughly 100,000 records and basically defines
non-overlapping
* ranges from First8Lo to First8Hi that collectively cover a of values
* from 11111111 to 99999999.
* There are regurlar indexes on First8Lo and First8Hi
*
* "Recs1" Definition (CallDate C (8), StartNum N (8))
* This table has 5000 records with various char representations of a date
* and StartNum values that are within the range of 11111111-99999999.
* There is a regular index on StartNum.

* This query now takes 9 minutes to produce a table of results:
SELE CallDate, LocationID ;
FROM Recs1 LEFT JOIN RCR ON ;
Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
INTO TABLE Temp

* The same results are produced in .19 seconds from the following code
SET NEAR ON
SELECT Recs1
COPY FIELD CallDate NEXT 0 TO Temp
USE Temp EXCLUSIVE IN 0
ALTER TABLE Temp ADD COLUMN LocationID I NULL
SET ORDER TO First8Hi IN RCR
SCAN
SCATTER FIELD CallDate MEMVAR
=SEEK( Recs1.StartNum, "RCF" )
IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
m.LocationID = RCR.LocationID
ELSE
m.LocationID = .NULL.
ENDIF
INSERT INTO Temp FROM MEMVAR
ENDSCAN

Am I missing something???

TIA,

John

Mon, 23 Jun 2003 07:28:58 GMT
JOIN Optimization
John,

Joining on BETWEEN is not the usual way to do things.
and what is not.

Try using

FROM Recs1 LEFT JOIN RCR ON ;
Recs1.StartNum >= BETWEEN RCR.First8Lo AND ;
Recs1.StartNum <= RCR.First8Hi ;

When it was running in 9 seconds were you using BETWEEN? I bet you were
using <=, >=, or =.

--

Cindy Winegarden

Microsoft Certified Professional, Visual FoxPro
Microsoft Visual FoxPro MVP

| I think it must be too late in the day.  I have a query that was running
in
| about 9 seconds this morning.  As far as I can tell, running the same
query
| on similar (but not exactly the same) data now takes minutes to run.  To
| create the same outcome with SCANing through the table takes moments.  Can
| anyone see the stupid mistake I must be making???
|
| The final result will be more in depth, but for debugging and example
| purposes I kept taking out fields and result calculations to narrow it
down
| and still see the problem with the fairly bare-bones example below.  I'm
| matching the StartNum to the RCR record that has First8Lo <= StartNum <=
| First8Hi.
|
| * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
| * This table has roughly 100,000 records and basically defines
| non-overlapping
| * ranges from First8Lo to First8Hi that collectively cover a of values
| * from 11111111 to 99999999.
| * There are regurlar indexes on First8Lo and First8Hi
| *
| * "Recs1" Definition (CallDate C (8), StartNum N (8))
| * This table has 5000 records with various char representations of a date
| * and StartNum values that are within the range of 11111111-99999999.
| * There is a regular index on StartNum.
|
| * This query now takes 9 minutes to produce a table of results:
| SELE CallDate, LocationID ;
|  FROM Recs1 LEFT JOIN RCR ON ;
|   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
|  INTO TABLE Temp
|
|
| * The same results are produced in .19 seconds from the following code
| SET NEAR ON
| SELECT Recs1
| COPY FIELD CallDate NEXT 0 TO Temp
| USE Temp EXCLUSIVE IN 0
| ALTER TABLE Temp ADD COLUMN LocationID I NULL
| SET ORDER TO First8Hi IN RCR
| SCAN
|    SCATTER FIELD CallDate MEMVAR
|    =SEEK( Recs1.StartNum, "RCF" )
|    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
|       m.LocationID = RCR.LocationID
|    ELSE
|       m.LocationID = .NULL.
|    ENDIF
|    INSERT INTO Temp FROM MEMVAR
| ENDSCAN
|
| Am I missing something???
|
| TIA,
|
| John
|
|

Mon, 23 Jun 2003 09:09:04 GMT
JOIN Optimization
Hi
BETWEEN is optimizable:
Quote:
<<A basic optimizable expression has the following characteristics:
a.. eIndex exactly matches the expression on which an index is
constructed.
b.. eExpr is any expression and can include variables and fields from
other unrelated tables.
c.. relOp is one of the following relational operators: <, >, =, <=, >=,
<>, #, ==, or !=. You can also use the ISNULL( ), BETWEEN( ), or INLIST( )
functions (or their SQL equivalents such as IS NULL, and so on).
You can use BETWEEN( ) or INLIST( ) in the following two forms:

eIndex BETWEEN(eIndex, eExpr, eExpr)

-or-

eExpr INLIST(eIndex, eExpr) >>

-Anders

Quote:
> John,

> Joining on BETWEEN is not the usual way to do things.
> and what is not.

> Try using

> FROM Recs1 LEFT JOIN RCR ON ;
>  Recs1.StartNum >= BETWEEN RCR.First8Lo AND ;
> Recs1.StartNum <= RCR.First8Hi ;

> When it was running in 9 seconds were you using BETWEEN? I bet you were
> using <=, >=, or =.

> --

> Cindy Winegarden

> Microsoft Certified Professional, Visual FoxPro
> Microsoft Visual FoxPro MVP

> | I think it must be too late in the day.  I have a query that was running
> in
> | about 9 seconds this morning.  As far as I can tell, running the same
> query
> | on similar (but not exactly the same) data now takes minutes to run.  To
> | create the same outcome with SCANing through the table takes moments.
Can
> | anyone see the stupid mistake I must be making???
> |
> | The final result will be more in depth, but for debugging and example
> | purposes I kept taking out fields and result calculations to narrow it
> down
> | and still see the problem with the fairly bare-bones example below.  I'm
> | matching the StartNum to the RCR record that has First8Lo <= StartNum <=
> | First8Hi.
> |
> | * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
> | * This table has roughly 100,000 records and basically defines
> | non-overlapping
> | * ranges from First8Lo to First8Hi that collectively cover a of values
> | * from 11111111 to 99999999.
> | * There are regurlar indexes on First8Lo and First8Hi
> | *
> | * "Recs1" Definition (CallDate C (8), StartNum N (8))
> | * This table has 5000 records with various char representations of a
date
> | * and StartNum values that are within the range of 11111111-99999999.
> | * There is a regular index on StartNum.
> |
> | * This query now takes 9 minutes to produce a table of results:
> | SELE CallDate, LocationID ;
> |  FROM Recs1 LEFT JOIN RCR ON ;
> |   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
> |  INTO TABLE Temp
> |
> |
> | * The same results are produced in .19 seconds from the following code
> | SET NEAR ON
> | SELECT Recs1
> | COPY FIELD CallDate NEXT 0 TO Temp
> | USE Temp EXCLUSIVE IN 0
> | ALTER TABLE Temp ADD COLUMN LocationID I NULL
> | SET ORDER TO First8Hi IN RCR
> | SCAN
> |    SCATTER FIELD CallDate MEMVAR
> |    =SEEK( Recs1.StartNum, "RCF" )
> |    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
> |       m.LocationID = RCR.LocationID
> |    ELSE
> |       m.LocationID = .NULL.
> |    ENDIF
> |    INSERT INTO Temp FROM MEMVAR
> | ENDSCAN
> |
> | Am I missing something???
> |
> | TIA,
> |
> | John
> |
> |

Mon, 23 Jun 2003 10:08:40 GMT
JOIN Optimization
Anders,

Thanks for the correction.

Why, then, did he go from 9 seconds to several minutes?

--

Cindy Winegarden

Microsoft Certified Professional, Visual FoxPro
Microsoft Visual FoxPro MVP

| Hi
| BETWEEN is optimizable:
| Quote:
| <<A basic optimizable expression has the following characteristics:
|   a.. eIndex exactly matches the expression on which an index is
| constructed.
|   b.. eExpr is any expression and can include variables and fields from
| other unrelated tables.
|   c.. relOp is one of the following relational operators: <, >, =, <=, >=,
| <>, #, ==, or !=. You can also use the ISNULL( ), BETWEEN( ), or INLIST( )
| functions (or their SQL equivalents such as IS NULL, and so on).
| You can use BETWEEN( ) or INLIST( ) in the following two forms:
|
| eIndex BETWEEN(eIndex, eExpr, eExpr)
|
| -or-
|
| eExpr INLIST(eIndex, eExpr) >>
|
|
| -Anders
|
|

| > John,
| >
| > Joining on BETWEEN is not the usual way to do things.
optimizable
| > and what is not.
| >
| > Try using
| >
| > FROM Recs1 LEFT JOIN RCR ON ;
| >  Recs1.StartNum >= BETWEEN RCR.First8Lo AND ;
| > Recs1.StartNum <= RCR.First8Hi ;
| >
| > When it was running in 9 seconds were you using BETWEEN? I bet you were
| > using <=, >=, or =.
| >
| >
| > --
| >
| >
| > Cindy Winegarden
| >
| > Microsoft Certified Professional, Visual FoxPro
| > Microsoft Visual FoxPro MVP
| >

| >
| >
| >

| > | I think it must be too late in the day.  I have a query that was
running
| > in
| > | about 9 seconds this morning.  As far as I can tell, running the same
| > query
| > | on similar (but not exactly the same) data now takes minutes to run.
To
| > | create the same outcome with SCANing through the table takes moments.
| Can
| > | anyone see the stupid mistake I must be making???
| > |
| > | The final result will be more in depth, but for debugging and example
| > | purposes I kept taking out fields and result calculations to narrow it
| > down
| > | and still see the problem with the fairly bare-bones example below.
I'm
| > | matching the StartNum to the RCR record that has First8Lo <= StartNum
<=
| > | First8Hi.
| > |
| > | * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
| > | * This table has roughly 100,000 records and basically defines
| > | non-overlapping
| > | * ranges from First8Lo to First8Hi that collectively cover a of values
| > | * from 11111111 to 99999999.
| > | * There are regurlar indexes on First8Lo and First8Hi
| > | *
| > | * "Recs1" Definition (CallDate C (8), StartNum N (8))
| > | * This table has 5000 records with various char representations of a
| date
| > | * and StartNum values that are within the range of 11111111-99999999.
| > | * There is a regular index on StartNum.
| > |
| > | * This query now takes 9 minutes to produce a table of results:
| > | SELE CallDate, LocationID ;
| > |  FROM Recs1 LEFT JOIN RCR ON ;
| > |   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
| > |  INTO TABLE Temp
| > |
| > |
| > | * The same results are produced in .19 seconds from the following code
| > | SET NEAR ON
| > | SELECT Recs1
| > | COPY FIELD CallDate NEXT 0 TO Temp
| > | USE Temp EXCLUSIVE IN 0
| > | ALTER TABLE Temp ADD COLUMN LocationID I NULL
| > | SET ORDER TO First8Hi IN RCR
| > | SCAN
| > |    SCATTER FIELD CallDate MEMVAR
| > |    =SEEK( Recs1.StartNum, "RCF" )
| > |    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
| > |       m.LocationID = RCR.LocationID
| > |    ELSE
| > |       m.LocationID = .NULL.
| > |    ENDIF
| > |    INSERT INTO Temp FROM MEMVAR
| > | ENDSCAN
| > |
| > | Am I missing something???
| > |
| > | TIA,
| > |
| > | John
| > |
| > |
| >
| >
|
|

Mon, 23 Jun 2003 17:43:25 GMT
JOIN Optimization

Quote:
> I think it must be too late in the day.  I have a query that was running
in
> about 9 seconds this morning.  As far as I can tell, running the same
query
> on similar (but not exactly the same) data now takes minutes to run.  To
> create the same outcome with SCANing through the table takes moments.  Can
> anyone see the stupid mistake I must be making???

> The final result will be more in depth, but for debugging and example
> purposes I kept taking out fields and result calculations to narrow it
down
> and still see the problem with the fairly bare-bones example below.  I'm
> matching the StartNum to the RCR record that has First8Lo <= StartNum <=
> First8Hi.

> * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
> * This table has roughly 100,000 records and basically defines
> non-overlapping
> * ranges from First8Lo to First8Hi that collectively cover a of values
> * from 11111111 to 99999999.
> * There are regurlar indexes on First8Lo and First8Hi
> *
> * "Recs1" Definition (CallDate C (8), StartNum N (8))
> * This table has 5000 records with various char representations of a date
> * and StartNum values that are within the range of 11111111-99999999.
> * There is a regular index on StartNum.

> * This query now takes 9 minutes to produce a table of results:
> SELE CallDate, LocationID ;
>  FROM Recs1 LEFT JOIN RCR ON ;
>   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
>  INTO TABLE Temp

> * The same results are produced in .19 seconds from the following code
> SET NEAR ON
> SELECT Recs1
> COPY FIELD CallDate NEXT 0 TO Temp
> USE Temp EXCLUSIVE IN 0
> ALTER TABLE Temp ADD COLUMN LocationID I NULL
> SET ORDER TO First8Hi IN RCR
> SCAN
>    SCATTER FIELD CallDate MEMVAR
>    =SEEK( Recs1.StartNum, "RCF" )
>    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
>       m.LocationID = RCR.LocationID
>    ELSE
>       m.LocationID = .NULL.
>    ENDIF
>    INSERT INTO Temp FROM MEMVAR
> ENDSCAN

> Am I missing something???

Why INSERT? That should be GATHER MEMVAR or GATHER MEMVAR FIELDS locationid,
shouldn't it?
Or insert into a cursor Temp.
This routine and your query are not quite equivalent. The xbase code will
always create exactly the 5000 rows there are in Recs1, whether there are
one or more rows in Rcr that match, while the query will create as many rows
as it finds matches. That could in principle be 5000 x 100000 rows.

-Anders

Mon, 23 Jun 2003 22:44:04 GMT
JOIN Optimization
Hi Cindy,

Thanks.  I had tried that too with the same results.  As this was pretty
much for a one-time utility this is more of a philosophical question, but
still confuses me!

- John

Quote:
> John,

> Joining on BETWEEN is not the usual way to do things.
> and what is not.

> Try using

> FROM Recs1 LEFT JOIN RCR ON ;
>  Recs1.StartNum >= BETWEEN RCR.First8Lo AND ;
> Recs1.StartNum <= RCR.First8Hi ;

> When it was running in 9 seconds were you using BETWEEN? I bet you were
> using <=, >=, or =.

> --

> Cindy Winegarden

> Microsoft Certified Professional, Visual FoxPro
> Microsoft Visual FoxPro MVP

> | I think it must be too late in the day.  I have a query that was running
> in
> | about 9 seconds this morning.  As far as I can tell, running the same
> query
> | on similar (but not exactly the same) data now takes minutes to run.  To
> | create the same outcome with SCANing through the table takes moments.
Can
> | anyone see the stupid mistake I must be making???
> |
> | The final result will be more in depth, but for debugging and example
> | purposes I kept taking out fields and result calculations to narrow it
> down
> | and still see the problem with the fairly bare-bones example below.  I'm
> | matching the StartNum to the RCR record that has First8Lo <= StartNum <=
> | First8Hi.
> |
> | * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
> | * This table has roughly 100,000 records and basically defines
> | non-overlapping
> | * ranges from First8Lo to First8Hi that collectively cover a of values
> | * from 11111111 to 99999999.
> | * There are regurlar indexes on First8Lo and First8Hi
> | *
> | * "Recs1" Definition (CallDate C (8), StartNum N (8))
> | * This table has 5000 records with various char representations of a
date
> | * and StartNum values that are within the range of 11111111-99999999.
> | * There is a regular index on StartNum.
> |
> | * This query now takes 9 minutes to produce a table of results:
> | SELE CallDate, LocationID ;
> |  FROM Recs1 LEFT JOIN RCR ON ;
> |   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
> |  INTO TABLE Temp
> |
> |
> | * The same results are produced in .19 seconds from the following code
> | SET NEAR ON
> | SELECT Recs1
> | COPY FIELD CallDate NEXT 0 TO Temp
> | USE Temp EXCLUSIVE IN 0
> | ALTER TABLE Temp ADD COLUMN LocationID I NULL
> | SET ORDER TO First8Hi IN RCR
> | SCAN
> |    SCATTER FIELD CallDate MEMVAR
> |    =SEEK( Recs1.StartNum, "RCF" )
> |    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
> |       m.LocationID = RCR.LocationID
> |    ELSE
> |       m.LocationID = .NULL.
> |    ENDIF
> |    INSERT INTO Temp FROM MEMVAR
> | ENDSCAN
> |
> | Am I missing something???
> |
> | TIA,
> |
> | John
> |
> |

Mon, 23 Jun 2003 23:34:38 GMT
JOIN Optimization
Hi Anders,

Thanks, and my apologies for some sloppiness in my terminology on the
original post...

Quote:
> Why INSERT? That should be GATHER MEMVAR or GATHER MEMVAR FIELDS
locationid,
> shouldn't it?
> > COPY FIELD CallDate NEXT 0 TO Temp

Temp starts out blank so the INSERT is doing the job of adding and filling
the record.

Quote:
> This routine and your query are not quite equivalent. The xbase code will
> always create exactly the 5000 rows there are in Recs1, whether there are
> one or more rows in Rcr that match, while the query will create as many
rows
> as it finds matches. That could in principle be 5000 x 100000 rows.

Agreed.  I apologize for overstating equivalency, (the xbase was
quick-and-dirty as this specific case is a one-time need).  I said they were
equivalent based on knowing some aspects of the data having no
duplication/overlap in the ranges--producing at most one match per record in
same code comparison on different datasets holds no guarantee to produce the
same results.

(Please note I don't generally rely on clean data.  This is one of those
rare times I can count on the data being as expected, this isn't going into
production, and I'll not have to deal with pesky users somehow messing up my
perfect data <g>.)

Thanks,

John

Mon, 23 Jun 2003 23:50:49 GMT
JOIN Optimization

Quote:
> Hi Anders,

> Thanks, and my apologies for some sloppiness in my terminology on the
> original post...

> > Why INSERT? That should be GATHER MEMVAR or GATHER MEMVAR FIELDS
> locationid,
> > shouldn't it?

> > > COPY FIELD CallDate NEXT 0 TO Temp

Sorry. I didn't notice the NEXT 0 scope.

-Anders

Tue, 24 Jun 2003 01:05:50 GMT

 Page 1 of 1 [ 8 post ]

Relevant Pages