JOIN Optimization 
Author Message
 JOIN Optimization

I think it must be too late in the day.  I have a query that was running in
about 9 seconds this morning.  As far as I can tell, running the same query
on similar (but not exactly the same) data now takes minutes to run.  To
create the same outcome with SCANing through the table takes moments.  Can
anyone see the stupid mistake I must be making???

The final result will be more in depth, but for debugging and example
purposes I kept taking out fields and result calculations to narrow it down
and still see the problem with the fairly bare-bones example below.  I'm
matching the StartNum to the RCR record that has First8Lo <= StartNum <=
First8Hi.

* "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
* This table has roughly 100,000 records and basically defines
non-overlapping
* ranges from First8Lo to First8Hi that collectively cover a of values
* from 11111111 to 99999999.
* There are regurlar indexes on First8Lo and First8Hi
*
* "Recs1" Definition (CallDate C (8), StartNum N (8))
* This table has 5000 records with various char representations of a date
* and StartNum values that are within the range of 11111111-99999999.
* There is a regular index on StartNum.

* This query now takes 9 minutes to produce a table of results:
SELE CallDate, LocationID ;
 FROM Recs1 LEFT JOIN RCR ON ;
  Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
 INTO TABLE Temp

* The same results are produced in .19 seconds from the following code
SET NEAR ON
SELECT Recs1
COPY FIELD CallDate NEXT 0 TO Temp
USE Temp EXCLUSIVE IN 0
ALTER TABLE Temp ADD COLUMN LocationID I NULL
SET ORDER TO First8Hi IN RCR
SCAN
   SCATTER FIELD CallDate MEMVAR
   =SEEK( Recs1.StartNum, "RCF" )
   IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
      m.LocationID = RCR.LocationID
   ELSE
      m.LocationID = .NULL.
   ENDIF
   INSERT INTO Temp FROM MEMVAR
ENDSCAN

Am I missing something???

TIA,

John



Mon, 23 Jun 2003 07:28:58 GMT  
 JOIN Optimization
John,

Joining on BETWEEN is not the usual way to do things.
Please read Chapter 15 in the Programmer's Guide about what is optimizable
and what is not.

Try using

FROM Recs1 LEFT JOIN RCR ON ;
 Recs1.StartNum >= BETWEEN RCR.First8Lo AND ;
Recs1.StartNum <= RCR.First8Hi ;

When it was running in 9 seconds were you using BETWEEN? I bet you were
using <=, >=, or =.

--

Cindy Winegarden

Microsoft Certified Professional, Visual FoxPro
Microsoft Visual FoxPro MVP



| I think it must be too late in the day.  I have a query that was running
in
| about 9 seconds this morning.  As far as I can tell, running the same
query
| on similar (but not exactly the same) data now takes minutes to run.  To
| create the same outcome with SCANing through the table takes moments.  Can
| anyone see the stupid mistake I must be making???
|
| The final result will be more in depth, but for debugging and example
| purposes I kept taking out fields and result calculations to narrow it
down
| and still see the problem with the fairly bare-bones example below.  I'm
| matching the StartNum to the RCR record that has First8Lo <= StartNum <=
| First8Hi.
|
| * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
| * This table has roughly 100,000 records and basically defines
| non-overlapping
| * ranges from First8Lo to First8Hi that collectively cover a of values
| * from 11111111 to 99999999.
| * There are regurlar indexes on First8Lo and First8Hi
| *
| * "Recs1" Definition (CallDate C (8), StartNum N (8))
| * This table has 5000 records with various char representations of a date
| * and StartNum values that are within the range of 11111111-99999999.
| * There is a regular index on StartNum.
|
| * This query now takes 9 minutes to produce a table of results:
| SELE CallDate, LocationID ;
|  FROM Recs1 LEFT JOIN RCR ON ;
|   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
|  INTO TABLE Temp
|
|
| * The same results are produced in .19 seconds from the following code
| SET NEAR ON
| SELECT Recs1
| COPY FIELD CallDate NEXT 0 TO Temp
| USE Temp EXCLUSIVE IN 0
| ALTER TABLE Temp ADD COLUMN LocationID I NULL
| SET ORDER TO First8Hi IN RCR
| SCAN
|    SCATTER FIELD CallDate MEMVAR
|    =SEEK( Recs1.StartNum, "RCF" )
|    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
|       m.LocationID = RCR.LocationID
|    ELSE
|       m.LocationID = .NULL.
|    ENDIF
|    INSERT INTO Temp FROM MEMVAR
| ENDSCAN
|
| Am I missing something???
|
| TIA,
|
| John
|
|



Mon, 23 Jun 2003 09:09:04 GMT  
 JOIN Optimization
Hi
BETWEEN is optimizable:
Quote:
<<A basic optimizable expression has the following characteristics:
  a.. eIndex exactly matches the expression on which an index is
constructed.
  b.. eExpr is any expression and can include variables and fields from
other unrelated tables.
  c.. relOp is one of the following relational operators: <, >, =, <=, >=,
<>, #, ==, or !=. You can also use the ISNULL( ), BETWEEN( ), or INLIST( )
functions (or their SQL equivalents such as IS NULL, and so on).
You can use BETWEEN( ) or INLIST( ) in the following two forms:

eIndex BETWEEN(eIndex, eExpr, eExpr)

-or-

eExpr INLIST(eIndex, eExpr) >>

-Anders


Quote:
> John,

> Joining on BETWEEN is not the usual way to do things.
> Please read Chapter 15 in the Programmer's Guide about what is optimizable
> and what is not.

> Try using

> FROM Recs1 LEFT JOIN RCR ON ;
>  Recs1.StartNum >= BETWEEN RCR.First8Lo AND ;
> Recs1.StartNum <= RCR.First8Hi ;

> When it was running in 9 seconds were you using BETWEEN? I bet you were
> using <=, >=, or =.

> --

> Cindy Winegarden

> Microsoft Certified Professional, Visual FoxPro
> Microsoft Visual FoxPro MVP




> | I think it must be too late in the day.  I have a query that was running
> in
> | about 9 seconds this morning.  As far as I can tell, running the same
> query
> | on similar (but not exactly the same) data now takes minutes to run.  To
> | create the same outcome with SCANing through the table takes moments.
Can
> | anyone see the stupid mistake I must be making???
> |
> | The final result will be more in depth, but for debugging and example
> | purposes I kept taking out fields and result calculations to narrow it
> down
> | and still see the problem with the fairly bare-bones example below.  I'm
> | matching the StartNum to the RCR record that has First8Lo <= StartNum <=
> | First8Hi.
> |
> | * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
> | * This table has roughly 100,000 records and basically defines
> | non-overlapping
> | * ranges from First8Lo to First8Hi that collectively cover a of values
> | * from 11111111 to 99999999.
> | * There are regurlar indexes on First8Lo and First8Hi
> | *
> | * "Recs1" Definition (CallDate C (8), StartNum N (8))
> | * This table has 5000 records with various char representations of a
date
> | * and StartNum values that are within the range of 11111111-99999999.
> | * There is a regular index on StartNum.
> |
> | * This query now takes 9 minutes to produce a table of results:
> | SELE CallDate, LocationID ;
> |  FROM Recs1 LEFT JOIN RCR ON ;
> |   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
> |  INTO TABLE Temp
> |
> |
> | * The same results are produced in .19 seconds from the following code
> | SET NEAR ON
> | SELECT Recs1
> | COPY FIELD CallDate NEXT 0 TO Temp
> | USE Temp EXCLUSIVE IN 0
> | ALTER TABLE Temp ADD COLUMN LocationID I NULL
> | SET ORDER TO First8Hi IN RCR
> | SCAN
> |    SCATTER FIELD CallDate MEMVAR
> |    =SEEK( Recs1.StartNum, "RCF" )
> |    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
> |       m.LocationID = RCR.LocationID
> |    ELSE
> |       m.LocationID = .NULL.
> |    ENDIF
> |    INSERT INTO Temp FROM MEMVAR
> | ENDSCAN
> |
> | Am I missing something???
> |
> | TIA,
> |
> | John
> |
> |



Mon, 23 Jun 2003 10:08:40 GMT  
 JOIN Optimization
Anders,

Thanks for the correction.

Why, then, did he go from 9 seconds to several minutes?

--

Cindy Winegarden

Microsoft Certified Professional, Visual FoxPro
Microsoft Visual FoxPro MVP



| Hi
| BETWEEN is optimizable:
| Quote:
| <<A basic optimizable expression has the following characteristics:
|   a.. eIndex exactly matches the expression on which an index is
| constructed.
|   b.. eExpr is any expression and can include variables and fields from
| other unrelated tables.
|   c.. relOp is one of the following relational operators: <, >, =, <=, >=,
| <>, #, ==, or !=. You can also use the ISNULL( ), BETWEEN( ), or INLIST( )
| functions (or their SQL equivalents such as IS NULL, and so on).
| You can use BETWEEN( ) or INLIST( ) in the following two forms:
|
| eIndex BETWEEN(eIndex, eExpr, eExpr)
|
| -or-
|
| eExpr INLIST(eIndex, eExpr) >>
|
|
| -Anders
|
|


| > John,
| >
| > Joining on BETWEEN is not the usual way to do things.
| > Please read Chapter 15 in the Programmer's Guide about what is
optimizable
| > and what is not.
| >
| > Try using
| >
| > FROM Recs1 LEFT JOIN RCR ON ;
| >  Recs1.StartNum >= BETWEEN RCR.First8Lo AND ;
| > Recs1.StartNum <= RCR.First8Hi ;
| >
| > When it was running in 9 seconds were you using BETWEEN? I bet you were
| > using <=, >=, or =.
| >
| >
| > --
| >
| >
| > Cindy Winegarden
| >
| > Microsoft Certified Professional, Visual FoxPro
| > Microsoft Visual FoxPro MVP
| >

| >
| >
| >


| > | I think it must be too late in the day.  I have a query that was
running
| > in
| > | about 9 seconds this morning.  As far as I can tell, running the same
| > query
| > | on similar (but not exactly the same) data now takes minutes to run.
To
| > | create the same outcome with SCANing through the table takes moments.
| Can
| > | anyone see the stupid mistake I must be making???
| > |
| > | The final result will be more in depth, but for debugging and example
| > | purposes I kept taking out fields and result calculations to narrow it
| > down
| > | and still see the problem with the fairly bare-bones example below.
I'm
| > | matching the StartNum to the RCR record that has First8Lo <= StartNum
<=
| > | First8Hi.
| > |
| > | * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
| > | * This table has roughly 100,000 records and basically defines
| > | non-overlapping
| > | * ranges from First8Lo to First8Hi that collectively cover a of values
| > | * from 11111111 to 99999999.
| > | * There are regurlar indexes on First8Lo and First8Hi
| > | *
| > | * "Recs1" Definition (CallDate C (8), StartNum N (8))
| > | * This table has 5000 records with various char representations of a
| date
| > | * and StartNum values that are within the range of 11111111-99999999.
| > | * There is a regular index on StartNum.
| > |
| > | * This query now takes 9 minutes to produce a table of results:
| > | SELE CallDate, LocationID ;
| > |  FROM Recs1 LEFT JOIN RCR ON ;
| > |   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
| > |  INTO TABLE Temp
| > |
| > |
| > | * The same results are produced in .19 seconds from the following code
| > | SET NEAR ON
| > | SELECT Recs1
| > | COPY FIELD CallDate NEXT 0 TO Temp
| > | USE Temp EXCLUSIVE IN 0
| > | ALTER TABLE Temp ADD COLUMN LocationID I NULL
| > | SET ORDER TO First8Hi IN RCR
| > | SCAN
| > |    SCATTER FIELD CallDate MEMVAR
| > |    =SEEK( Recs1.StartNum, "RCF" )
| > |    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
| > |       m.LocationID = RCR.LocationID
| > |    ELSE
| > |       m.LocationID = .NULL.
| > |    ENDIF
| > |    INSERT INTO Temp FROM MEMVAR
| > | ENDSCAN
| > |
| > | Am I missing something???
| > |
| > | TIA,
| > |
| > | John
| > |
| > |
| >
| >
|
|



Mon, 23 Jun 2003 17:43:25 GMT  
 JOIN Optimization


Quote:
> I think it must be too late in the day.  I have a query that was running
in
> about 9 seconds this morning.  As far as I can tell, running the same
query
> on similar (but not exactly the same) data now takes minutes to run.  To
> create the same outcome with SCANing through the table takes moments.  Can
> anyone see the stupid mistake I must be making???

> The final result will be more in depth, but for debugging and example
> purposes I kept taking out fields and result calculations to narrow it
down
> and still see the problem with the fairly bare-bones example below.  I'm
> matching the StartNum to the RCR record that has First8Lo <= StartNum <=
> First8Hi.

> * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
> * This table has roughly 100,000 records and basically defines
> non-overlapping
> * ranges from First8Lo to First8Hi that collectively cover a of values
> * from 11111111 to 99999999.
> * There are regurlar indexes on First8Lo and First8Hi
> *
> * "Recs1" Definition (CallDate C (8), StartNum N (8))
> * This table has 5000 records with various char representations of a date
> * and StartNum values that are within the range of 11111111-99999999.
> * There is a regular index on StartNum.

> * This query now takes 9 minutes to produce a table of results:
> SELE CallDate, LocationID ;
>  FROM Recs1 LEFT JOIN RCR ON ;
>   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
>  INTO TABLE Temp

> * The same results are produced in .19 seconds from the following code
> SET NEAR ON
> SELECT Recs1
> COPY FIELD CallDate NEXT 0 TO Temp
> USE Temp EXCLUSIVE IN 0
> ALTER TABLE Temp ADD COLUMN LocationID I NULL
> SET ORDER TO First8Hi IN RCR
> SCAN
>    SCATTER FIELD CallDate MEMVAR
>    =SEEK( Recs1.StartNum, "RCF" )
>    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
>       m.LocationID = RCR.LocationID
>    ELSE
>       m.LocationID = .NULL.
>    ENDIF
>    INSERT INTO Temp FROM MEMVAR
> ENDSCAN

> Am I missing something???

Why INSERT? That should be GATHER MEMVAR or GATHER MEMVAR FIELDS locationid,
shouldn't it?
Or insert into a cursor Temp.
This routine and your query are not quite equivalent. The xbase code will
always create exactly the 5000 rows there are in Recs1, whether there are
one or more rows in Rcr that match, while the query will create as many rows
as it finds matches. That could in principle be 5000 x 100000 rows.

-Anders



Mon, 23 Jun 2003 22:44:04 GMT  
 JOIN Optimization
Hi Cindy,

Thanks.  I had tried that too with the same results.  As this was pretty
much for a one-time utility this is more of a philosophical question, but
still confuses me!

- John


Quote:
> John,

> Joining on BETWEEN is not the usual way to do things.
> Please read Chapter 15 in the Programmer's Guide about what is optimizable
> and what is not.

> Try using

> FROM Recs1 LEFT JOIN RCR ON ;
>  Recs1.StartNum >= BETWEEN RCR.First8Lo AND ;
> Recs1.StartNum <= RCR.First8Hi ;

> When it was running in 9 seconds were you using BETWEEN? I bet you were
> using <=, >=, or =.

> --

> Cindy Winegarden

> Microsoft Certified Professional, Visual FoxPro
> Microsoft Visual FoxPro MVP




> | I think it must be too late in the day.  I have a query that was running
> in
> | about 9 seconds this morning.  As far as I can tell, running the same
> query
> | on similar (but not exactly the same) data now takes minutes to run.  To
> | create the same outcome with SCANing through the table takes moments.
Can
> | anyone see the stupid mistake I must be making???
> |
> | The final result will be more in depth, but for debugging and example
> | purposes I kept taking out fields and result calculations to narrow it
> down
> | and still see the problem with the fairly bare-bones example below.  I'm
> | matching the StartNum to the RCR record that has First8Lo <= StartNum <=
> | First8Hi.
> |
> | * "RCR" Definition (First8Lo N (8), First8Hi N (8), LocationID I)
> | * This table has roughly 100,000 records and basically defines
> | non-overlapping
> | * ranges from First8Lo to First8Hi that collectively cover a of values
> | * from 11111111 to 99999999.
> | * There are regurlar indexes on First8Lo and First8Hi
> | *
> | * "Recs1" Definition (CallDate C (8), StartNum N (8))
> | * This table has 5000 records with various char representations of a
date
> | * and StartNum values that are within the range of 11111111-99999999.
> | * There is a regular index on StartNum.
> |
> | * This query now takes 9 minutes to produce a table of results:
> | SELE CallDate, LocationID ;
> |  FROM Recs1 LEFT JOIN RCR ON ;
> |   Recs1.StartNum BETWEEN RCR.First8Lo AND RCR.First8Hi ;
> |  INTO TABLE Temp
> |
> |
> | * The same results are produced in .19 seconds from the following code
> | SET NEAR ON
> | SELECT Recs1
> | COPY FIELD CallDate NEXT 0 TO Temp
> | USE Temp EXCLUSIVE IN 0
> | ALTER TABLE Temp ADD COLUMN LocationID I NULL
> | SET ORDER TO First8Hi IN RCR
> | SCAN
> |    SCATTER FIELD CallDate MEMVAR
> |    =SEEK( Recs1.StartNum, "RCF" )
> |    IF BETWEEN( Recs1.StartNum, RCR.First8Lo, RCR.First8Hi )
> |       m.LocationID = RCR.LocationID
> |    ELSE
> |       m.LocationID = .NULL.
> |    ENDIF
> |    INSERT INTO Temp FROM MEMVAR
> | ENDSCAN
> |
> | Am I missing something???
> |
> | TIA,
> |
> | John
> |
> |



Mon, 23 Jun 2003 23:34:38 GMT  
 JOIN Optimization
Hi Anders,

Thanks, and my apologies for some sloppiness in my terminology on the
original post...

Quote:
> Why INSERT? That should be GATHER MEMVAR or GATHER MEMVAR FIELDS
locationid,
> shouldn't it?
> > COPY FIELD CallDate NEXT 0 TO Temp

Temp starts out blank so the INSERT is doing the job of adding and filling
the record.

Quote:
> This routine and your query are not quite equivalent. The xbase code will
> always create exactly the 5000 rows there are in Recs1, whether there are
> one or more rows in Rcr that match, while the query will create as many
rows
> as it finds matches. That could in principle be 5000 x 100000 rows.

Agreed.  I apologize for overstating equivalency, (the xbase was
quick-and-dirty as this specific case is a one-time need).  I said they were
equivalent based on knowing some aspects of the data having no
duplication/overlap in the ranges--producing at most one match per record in
Recs1.  So I knew about this outcome, but you're absolutely correct--the
same code comparison on different datasets holds no guarantee to produce the
same results.

(Please note I don't generally rely on clean data.  This is one of those
rare times I can count on the data being as expected, this isn't going into
production, and I'll not have to deal with pesky users somehow messing up my
perfect data <g>.)

Thanks,

John



Mon, 23 Jun 2003 23:50:49 GMT  
 JOIN Optimization


Quote:
> Hi Anders,

> Thanks, and my apologies for some sloppiness in my terminology on the
> original post...

> > Why INSERT? That should be GATHER MEMVAR or GATHER MEMVAR FIELDS
> locationid,
> > shouldn't it?

> > > COPY FIELD CallDate NEXT 0 TO Temp

Sorry. I didn't notice the NEXT 0 scope.

-Anders



Tue, 24 Jun 2003 01:05:50 GMT  
 
 [ 8 post ] 

 Relevant Pages 

1. SQL Joins and Rushmore Optimization

2. Left outer join=left join?

3. Optimization question

4. optimization

5. Optimization

6. FoxPro 2.6 for DOS, code optimization

7. Rushmore optimization ?

8. Query Optimization

9. Help with optimization

10. VFP Rushmore optimization not kicking in

11. Search optimization

12. SQL statement optimization

 

 
Powered by phpBB® Forum Software