Обсуждение: When are index scans used over seq scans?

Поиск
Список
Период
Сортировка

When are index scans used over seq scans?

От
Richard van den Berg
Дата:
We have a table with 1M rows that contain sessions with a start and
finish timestamps. When joining this table with a 10k table with rounded
timestamps, explain shows me sequential scans are used, and the join
takes about 6 hours (2s per seq scan on session table * 10000):

 Nested Loop  (cost=252.80..233025873.16 rows=1035480320 width=97)
Join Filter: (("outer".starttime <= "inner".ts) AND ("outer".finishtime
>= "inner".ts))
   ->  Seq Scan on sessions us  (cost=0.00..42548.36 rows=924536
width=105)    ->  Materialize  (cost=252.80..353.60 rows=10080 width=8)
         ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080 width=8)

However, during the initial loading of the data (we first load into text
tables, then convert to tables using timestamps etc, then run this
query) the same query took only 12 minutes. While debugging, I increased
cpu_tuple_cost to 0.1 (from 0.01). Now the explain shows an index scan,
and the run time comes down to 11 minutes:

 Nested Loop  (cost=0.00..667700310.42 rows=1035480320 width=97)
   ->  Seq Scan on sessions us (cost=0.00..125756.60 rows=924536 width=105)
   ->  Index Scan using ix_du_ts on duration du  (cost=0.00..604.46
rows=1120 width=8)
         Index Cond: (("outer".starttime <= du.ts) AND
("outer".finishtime >= du.ts))

I am glad that I found a way to force the use of the index, but still
can't explain why in the initial run the planner made the right choice,
but now I need to give it a hand. Could this have to do with the
statistics of the tables? I make very sure (during the initial load and
while testing) that I vacuum analyze all tables after I fill them.

I'm runing postgres 7.4.7.

Any help is appreciated.

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------

Re: When are index scans used over seq scans?

От
John A Meinel
Дата:
Richard van den Berg wrote:

>We have a table with 1M rows that contain sessions with a start and
>finish timestamps. When joining this table with a 10k table with rounded
>timestamps, explain shows me sequential scans are used, and the join
>takes about 6 hours (2s per seq scan on session table * 10000):
>
> Nested Loop  (cost=252.80..233025873.16 rows=1035480320 width=97)
>Join Filter: (("outer".starttime <= "inner".ts) AND ("outer".finishtime
>
>
>>= "inner".ts))
>>
>>
>   ->  Seq Scan on sessions us  (cost=0.00..42548.36 rows=924536
>width=105)    ->  Materialize  (cost=252.80..353.60 rows=10080 width=8)
>         ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080 width=8)
>
>However, during the initial loading of the data (we first load into text
>tables, then convert to tables using timestamps etc, then run this
>query) the same query took only 12 minutes. While debugging, I increased
>cpu_tuple_cost to 0.1 (from 0.01). Now the explain shows an index scan,
>and the run time comes down to 11 minutes:
>
> Nested Loop  (cost=0.00..667700310.42 rows=1035480320 width=97)
>   ->  Seq Scan on sessions us (cost=0.00..125756.60 rows=924536 width=105)
>   ->  Index Scan using ix_du_ts on duration du  (cost=0.00..604.46
>rows=1120 width=8)
>         Index Cond: (("outer".starttime <= du.ts) AND
>("outer".finishtime >= du.ts))
>
>I am glad that I found a way to force the use of the index, but still
>can't explain why in the initial run the planner made the right choice,
>but now I need to give it a hand. Could this have to do with the
>statistics of the tables? I make very sure (during the initial load and
>while testing) that I vacuum analyze all tables after I fill them.
>
>I'm runing postgres 7.4.7.
>
>Any help is appreciated.
>
>
>
I believe the problem is that postgres doesn't recognize how restrictive
a date-range is unless it uses constants.
So saying:

select blah from du WHERE time between '2004-10-10' and '2004-10-15';
Will properly use the index, because it realizes it only returns a few rows.
However
select blah from du, us where du.ts between us.starttime and us.finishtime;
Doesn't know how selective that BETWEEN is.

This has been discussed as a future improvement to the planner (in
8.*).  I don't know the current status.

Also, in the future, you really should post your table schema, and
explain analyze instead of just explain. (I realize that with a 6hr
query it is a little painful.)

Notice that in the above plans, the expected number of rows drops from
10k down to 1k (which is probably where the planner decides to switch).
And if you actually did the analyze probably the number of rows is much
lower still.

Probably you should try to find out the status of multi-table
selectivity. It was discussed in the last couple of months.

John
=:->


Вложения

Re: When are index scans used over seq scans?

От
Tom Lane
Дата:
Richard van den Berg <richard.vandenberg@trust-factory.com> writes:
> We have a table with 1M rows that contain sessions with a start and
> finish timestamps. When joining this table with a 10k table with rounded
> timestamps, explain shows me sequential scans are used, and the join
> takes about 6 hours (2s per seq scan on session table * 10000):

>  Nested Loop  (cost=252.80..233025873.16 rows=1035480320 width=97)
> Join Filter: (("outer".starttime <= "inner".ts) AND ("outer".finishtime
>> = "inner".ts))
>    ->  Seq Scan on sessions us  (cost=0.00..42548.36 rows=924536
> width=105)    ->  Materialize  (cost=252.80..353.60 rows=10080 width=8)
>          ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080 width=8)

The explain shows no such thing.  What is the *actual* runtime of
each plan per EXPLAIN ANALYZE, please?

(In general, any time you are complaining about planner misbehavior,
it is utterly pointless to give only planner estimates and not reality.
By definition, you don't think the estimates are right.)

            regards, tom lane

Re: When are index scans used over seq scans?

От
Richard van den Berg
Дата:
John A Meinel wrote:
> I believe the problem is that postgres doesn't recognize how restrictive
> a date-range is unless it uses constants.

And it does when using BETWEEN with int for example? Impressive. :-)

> select blah from du WHERE time between '2004-10-10' and '2004-10-15';
> Will properly use the index, because it realizes it only returns a few
> rows.

Correct, it does.

> Probably you should try to find out the status of multi-table
> selectivity. It was discussed in the last couple of months.

I can't find the posts you are refering to. What is the priciple of
multi-table selectivity?

Your explanation sounds very plausible.. I don't mind changing the
cpu_tuple_cost before running BETWEEN with timestamps, they are easy
enough to spot.

Thanks,

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------

Re: When are index scans used over seq scans?

От
John A Meinel
Дата:
Richard van den Berg wrote:

>John A Meinel wrote:
>
>
>>I believe the problem is that postgres doesn't recognize how restrictive
>>a date-range is unless it uses constants.
>>
>>
>
>And it does when using BETWEEN with int for example? Impressive. :-)
>
>
>
>>select blah from du WHERE time between '2004-10-10' and '2004-10-15';
>>Will properly use the index, because it realizes it only returns a few
>>rows.
>>
>>
>
>Correct, it does.
>
>
>
>>Probably you should try to find out the status of multi-table
>>selectivity. It was discussed in the last couple of months.
>>
>>
>
>I can't find the posts you are refering to. What is the priciple of
>multi-table selectivity?
>
>Your explanation sounds very plausible.. I don't mind changing the
>cpu_tuple_cost before running BETWEEN with timestamps, they are easy
>enough to spot.
>
>Thanks,
>
>
>
Well, there was a thread titled "date - range"
There is also "recognizing range constraints" which started with "plan
for relatively simple query seems to be very inefficient".

Sorry that I gave you poor search terms.

Anyway, "date - range" gives an interesting workaround. Basically you
store date ranges with a different structure, which allows fast index
lookups.

The other threads are just discussing the possibility of improving the
planner so that it recognizes WHERE a > b AND a < c, is generally more
restrictive.

There was a discussion about how to estimate selectivity, but I think it
mostly boils down that except for pathological cases, a > b AND a < c is
always more restrictive than just a > b, or  a < c.

Some of it may be also be found in pgsql-hackers, rather than
pgsql-performance, but I'm not subscribed to -hackers, so most of it
should be in -performance.

John
=:->

caveat, I'm not a developer, I just read a lot of the list.

Вложения

Re: When are index scans used over seq scans?

От
Richard van den Berg
Дата:
Tom Lane wrote:
> The explain shows no such thing.  What is the *actual* runtime of
> each plan per EXPLAIN ANALYZE, please?

I took a simplified version of the problem (the actual query that took 6
hours joins 3 tables). With cpu_tuple_cost = 0.1:

 Nested Loop  (cost=0.00..667700310.42 rows=1035480320 width=97) (actual
time=31.468..42629.629 rows=6171334 loops=1)
   ->  Seq Scan on sessions us  (cost=0.00..125756.60 rows=924536
width=105) (actual time=31.366..3293.523 rows=924536 loops=1)
   ->  Index Scan using ix_du_ts on duration du  (cost=0.00..604.46
rows=1120 width=8) (actual time=0.004..0.011 rows=7 loops=924536)
         Index Cond: (("outer".starttimetrunc <= du.ts) AND
("outer".finishtimetrunc >= du.ts))
 Total runtime: 44337.937 ms

The explain analyze for cpu_tuple_cost = 0.01 is running now. If it
takes hours, I'll send it to the list tomorrow.

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------

Re: When are index scans used over seq scans?

От
Richard van den Berg
Дата:
Tom Lane wrote:
> The explain shows no such thing.  What is the *actual* runtime of
> each plan per EXPLAIN ANALYZE, please?

Ok, it took 3.5 hours to complete. :-/

This is with the default cpu_tuple_cost = 0.01:

 Nested Loop  (cost=252.80..233010147.16 rows=1035480320 width=98)
(actual time=0.369..12672213.137 rows=6171334 loops=1)
   Join Filter: (("outer".starttimetrunc <= "inner".ts) AND
("outer".finishtimetrunc >= "inner".ts))
   ->  Seq Scan on sessions us  (cost=0.00..26822.36 rows=924536
width=106) (actual time=0.039..5447.349 rows=924536 loops=1)
   ->  Materialize  (cost=252.80..353.60 rows=10080 width=8) (actual
time=0.000..2.770 rows=10080 loops=924536)
         ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080
width=8) (actual time=0.019..13.397 rows=10080 loops=1)
 Total runtime: 12674486.670 ms

Once again with cpu_tuple_cost = 0.1:

 Nested Loop  (cost=0.00..667684584.42 rows=1035480320 width=98) (actual
time=42.892..39877.928 rows=6171334 loops=1)
   ->  Seq Scan on sessions us  (cost=0.00..110030.60 rows=924536
width=106) (actual time=0.020..917.803 rows=924536 loops=1)
   ->  Index Scan using ix_du_ts on duration du  (cost=0.00..604.46
rows=1120 width=8) (actual time=0.004..0.011 rows=7 loops=924536)
         Index Cond: (("outer".starttimetrunc <= du.ts) AND
("outer".finishtimetrunc >= du.ts))
 Total runtime: 41635.468 ms
(5 rows)

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------

Re: When are index scans used over seq scans?

От
Richard van den Berg
Дата:
Thanks a lot John for the correct search terms. :-)

The suggestion in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00029.php to
add a constraint that checks (finishtime >= starttime) does not make a
difference for me. Still seq scans are used.

The width solution explained in
http://archives.postgresql.org/pgsql-performance/2005-04/msg00027.php
and
http://archives.postgresql.org/pgsql-performance/2005-04/msg00116.php
does make a huge difference when selecting 1 timestamp using a BETWEEN
(2ms vs 2sec), but as soon as I put 2 timestamps in a table and try a
join, everything goes south (7.7sec). I have 10k timestamps in the
duration table. :-(

I'm getting more confused on how the planner decides to use indexes. For
example, if I try:

explain analyze select us.oid from sessions us where '2005-04-10
23:11:00' between us.starttimetrunc and us.finishtimetrunc;

     QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using sessions_st_ft_idx2 on sessions us
(cost=0.00..18320.73 rows=4765 width=4) (actual time=0.063..2.455
rows=279 loops=1)
   Index Cond: (('2005-04-10 23:11:00'::timestamp without time zone <=
finishtimetrunc) AND ('2005-04-10 23:11:00'::timestamp without time zone
>= starttimetrunc))
 Total runtime: 2.616 ms

is uses the index! However, if I change the date it does not:

explain analyze select us.oid from sessions us where '2005-04-09
23:11:00' between us.starttimetrunc and us.finishtimetrunc;

   QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on sessions us  (cost=0.00..68173.04 rows=41575 width=4)
(actual time=553.424..1981.695 rows=64 loops=1)
   Filter: (('2005-04-09 23:11:00'::timestamp without time zone >=
starttimetrunc) AND ('2005-04-09 23:11:00'::timestamp without time zone
<= finishtimetrunc))
 Total runtime: 1981.802 ms

The times in sessions go from '2005-04-04 00:00:00' to '2005-04-10
23:59:00' so both are valid times to query for, but April 10th is more
towards the end. A little experimenting shows that if I go earlier than
'2005-04-10 13:26:15' seq scans are being used. I was thinking this
timestamp would have something to do with the histogram_bounds in
pg_stats, but I cannot find a match:

 starttimetrunc         | {"2005-04-04 00:05:00","2005-04-04
11:49:00","2005-04-04 22:03:00","2005-04-05 10:54:00","2005-04-05
21:08:00","2005-04-06 10:28:00","2005-04-07 01:57:00","2005-04-07
15:55:00","2005-04-08 10:18:00","2005-04-08 17:12:00","2005-04-10 23:57:00"}
 finishtimetrunc        | {"2005-04-04 00:05:00.93","2005-04-04
11:53:00.989999","2005-04-04 22:35:00.38","2005-04-05
11:13:00.029999","2005-04-05 21:31:00.989999","2005-04-06
10:45:01","2005-04-07 02:08:08.25","2005-04-07 16:20:00.93","2005-04-08
10:25:00.409999","2005-04-08 17:15:00.949999","2005-04-11 02:08:19"}

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------
   Have you visited our new DNA Portal?
-------------------------------------------

Re: When are index scans used over seq scans?

От
Tom Lane
Дата:
Richard van den Berg <richard.vandenberg@trust-factory.com> writes:
> This is with the default cpu_tuple_cost = 0.01:

>  Nested Loop  (cost=252.80..233010147.16 rows=1035480320 width=98)
> (actual time=0.369..12672213.137 rows=6171334 loops=1)
>    Join Filter: (("outer".starttimetrunc <= "inner".ts) AND
> ("outer".finishtimetrunc >= "inner".ts))
>    ->  Seq Scan on sessions us  (cost=0.00..26822.36 rows=924536
> width=106) (actual time=0.039..5447.349 rows=924536 loops=1)
>    ->  Materialize  (cost=252.80..353.60 rows=10080 width=8) (actual
> time=0.000..2.770 rows=10080 loops=924536)
>          ->  Seq Scan on duration du  (cost=0.00..252.80 rows=10080
> width=8) (actual time=0.019..13.397 rows=10080 loops=1)
>  Total runtime: 12674486.670 ms

Hmm, that *is* showing rather a spectacularly large amount of time in
the join itself: if I did the arithmetic right,

regression=# select 12672213.137 - (5447.349 + 2.770*924536 + 13.397);
   ?column?
--------------
 10105787.671
(1 row)

which is almost 80% of the entire runtime.  Which is enormous.
What are those column datatypes exactly?  Perhaps you are incurring a
datatype conversion cost?  Straight timestamp-vs-timestamp comparison
is fairly cheap, but any sort of conversion will cost dearly.

The planner's model for the time spent in the join itself is
    (cpu_tuple_cost + 2 * cpu_operator_cost) * n_tuples
(the 2 because you have 2 operators in the join condition)
so you'd have to raise one or the other of these parameters
to model this situation accurately.  But I have a hard time
believing that cpu_tuple_cost is really as high as 0.1.
It seems more likely that the cpu_operator_cost is underestimated,
which leads me to question what exactly is happening in those
comparisons.

            regards, tom lane

Re: When are index scans used over seq scans?

От
Richard van den Berg
Дата:
John A Meinel wrote:
> You might try doing:
> ALTER TABLE us ALTER COLUMN starttimetrunc SET STATISTICS 200;
> ALTER TABLE us ALTER COLUMN finishtimetrunc SET STATISTICS 200;
> VACUUM ANALYZE us;

I've been looking into that. While increasing the statistics makes the
planner use the index for simple selects, it still does not for joins.

Another thing that threw me off is that after a "vacuum analyze" a
"select * from us where 'x' between start and finish" uses seq scans,
while after just an "analyze" is uses the index! I thought both
statements were supposed to update the statistics in the same way? (This
is with 7.4.7.)

> You have 2 tables, a duration, and a from->to table, right? How many
> rows in each?

Duration: 10k
Sessions: 1M

> Anyway, you can play around with it by using stuff like:
> SET enable_seqscan TO off;

This doesn't help much. Instead of turning seqscans off this setting
increases its cost with 100M. Since my query already has a cost of about
400M-800M this doesn't matter much.

For now, the only reliable way of forcing the use of the index is to set
cpu_tuple_cost = 1.

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------


Re: When are index scans used over seq scans?

От
Richard van den Berg
Дата:
Tom Lane wrote:
> which is almost 80% of the entire runtime.  Which is enormous.
> What are those column datatypes exactly?

                      Table "richard.sessions"
         Column         |            Type             | Modifiers
------------------------+-----------------------------+-----------
[unrelated columns removed]
 starttimetrunc         | timestamp without time zone |
 finishtimetrunc        | timestamp without time zone |
Indexes:
    "rb_us_st_ft_idx" btree (starttimetrunc, finishtimetrunc)
    "rb_us_st_ft_idx2" btree (finishtimetrunc, starttimetrunc)
Check constraints:
    "date_check" CHECK (finishtimetrunc >= starttimetrunc)

             Table "richard.duration"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 ts     | timestamp without time zone |

> Perhaps you are incurring a datatype conversion cost?

Not that I can tell.

> It seems more likely that the cpu_operator_cost is underestimated,

As you perdicted, increasing cpu_operator_cost from 0.0025 to 0.025 also
causes the planner to use the index on duration.

> which leads me to question what exactly is happening in those
> comparisons.

Your guess is as good as mine (actually, yours is much better). I can
put together a reproducable test case if you like..

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------
   Have you visited our new DNA Portal?
-------------------------------------------

Re: When are index scans used over seq scans?

От
Tom Lane
Дата:
Richard van den Berg <richard.vandenberg@trust-factory.com> writes:
> Tom Lane wrote:
>> Perhaps you are incurring a datatype conversion cost?

> Not that I can tell.

No, apparently not.  Hmm ... timestamp_cmp_internal is just a couple of
isnan() checks and one or two floating-point compares.  Should be pretty
dang cheap.  Unless isnan() is ridiculously expensive on your hardware?
More likely there is some bottleneck that we are not thinking of.

Are the tables in question particularly wide (many columns)?

>> which leads me to question what exactly is happening in those
>> comparisons.

> Your guess is as good as mine (actually, yours is much better). I can
> put together a reproducable test case if you like..

I'm thinking it would be interesting to look at a gprof profile of the
nestloop case.  If you can rebuild with profiling and get one, that
would be fine, or you can make up a test case that shows the same slow
joining behavior.

            regards, tom lane

Re: When are index scans used over seq scans?

От
Richard van den Berg
Дата:
Tom Lane wrote:
> Are the tables in question particularly wide (many columns)?

Yes they are. They both have 20 columns. If I cut down the duration
table to just 1 column of timestamps, the planner uses the index.

Interesting, so I could store just the timestamps in another table (view
doesn't help) to speed up this query.

I am using the debian package. How can I tell if profiling is enabled?

Thanks a lot,

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------