Discussion: Optimizer confusion?

Optimizer confusion?

From: Philip Warner
Date:
I have a table for which the SEQSCAN and INDEXSCAN estimates are the same
up to a point, after which the SEQSCAN estimates remain fixed while the
indexscan estimates continue to grow. However, the actual speed of the
index scan remains superior over a much wider range of time intervals than
the optimizer predicts.

The database has a table 'ping' with various fields, including a 'pingtime
timestamp' column; it also has a btree index on that column and has been
'vacuum analyze'-ed. There are about 200000 rows and the data is evenly
distributed at 5-minute intervals.
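
For reference, the relevant part of the schema looks roughly like this (the
other columns are omitted; only 'ping', 'pingtime' and 'ping_ix1' are the
real names):

-----
-- Sketch of the relevant schema; the table has various other fields.
CREATE TABLE ping (
    pingtime timestamp
    -- ... plus various other fields ...
);
CREATE INDEX ping_ix1 ON ping (pingtime);
VACUUM ANALYZE ping;
-----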

These are the results from 'explain':

------ 1 ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'1-aug-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..4.28 rows=1 width=52)
------

This seems fine, even if the query is bogus.


------ 2 ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'2-aug-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..1679.29 rows=561 width=52)
------

Also looks OK.


------ 3 ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'3-aug-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..3091.18 rows=1123 width=52)
------

This seems OK; the estimate is roughly double the previous, which is to be
expected, I think.


------- 5 ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'5-aug-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..5386.70 rows=2245 width=52)
------

Again, this is OK, although I am a little surprised at the continuing
non-linearity of the estimates.

Now it starts getting very strange:

------- 5+a bit ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'5-aug-1999 20:25';
NOTICE:  QUERY PLAN:

Seq Scan on ping  (cost=0.00..6208.68 rows=2723 width=52)
-------

OK so far, but look at the following (the costs are the same):

------- 3 Months ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'1-nov-1999';
NOTICE:  QUERY PLAN:

Seq Scan on ping  (cost=0.00..6208.68 rows=51623 width=52)
-------

and

------- 5 + a YEAR ----
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'5-aug-2000 20:25';
NOTICE:  QUERY PLAN:

Seq Scan on ping  (cost=0.00..6208.68 rows=208184 width=52)
------


Now what is also strange is that if I set ENABLE_SEQSCAN=OFF, the
estimates up to '5+a bit' are the *same*, but the running time is
substantially better for the index scan. In fact, the running time is better
for index scans up to an interval of about three months. I presume there is
something wrong with the selectivity estimates for the index.

I really don't want to have the code call 'SET ENABLE_SEQSCAN=OFF/ON'
around this statement, since for a longer period I do want a sequential
scan. And building my own 'query optimizer' which says 'if time diff > 3
months, then enable seqscan' seems like a very bad idea.
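
Just to be explicit, the workaround I'm trying to avoid would look something
like this around each affected query:

-----
SET ENABLE_SEQSCAN=OFF;
SELECT * FROM ping WHERE pingtime>'1-aug-1999' AND pingtime<'1-nov-1999';
SET ENABLE_SEQSCAN=ON;
-----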

I would be interested to know (a) if there is any way I can influence the
optimizer choice when it considers using the index in question, and (b) if
the fixed seqscan cost estimate is a bug.


FWIW, the output of a 3 month period with ENABLE_SEQSCAN=OFF is:

-----
uptime=# set enable_seqscan=off;
uptime=# explain select * from ping where pingtime>'1-aug-1999' and
pingtime<'1-nov-1999';
NOTICE:  QUERY PLAN:

Index Scan using ping_ix1 on ping  (cost=0.00..27661.01 rows=51623 width=52)
-----

Any help, explanation, etc would be appreciated.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: [HACKERS] Optimizer confusion?

From: Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Again, this is OK, although I am a little surprised at the continuing
> non-linearity of the estimates.

Indexscan estimates are supposed to be nonlinear, actually, to account
for the effects of caching.  I doubt the shapes of the curves are right
in detail, but I haven't had time to do any research about it.

> I would be interested to know (a) if there is any way I can influence the
> optimizer choice when it considers using the index in question,

You could push random_page_cost and effective_cache_size around to try
to match your platform better.  Let me know if that helps...
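
For example, something along these lines (the numbers are purely
illustrative, not recommendations):

-----
SET RANDOM_PAGE_COST = 2;          -- default is 4; lower values favor indexscans
SET EFFECTIVE_CACHE_SIZE = 10000;  -- in disk pages; raise it if you have plenty of RAM
-----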

> (b) if the fixed seqscan cost estimate is a bug.

I don't think so.  A seqscan will touch every page and every tuple once,
therefore the costs should be pretty much independent of the number of
tuples that actually get selected, no?  (Note that the time spent
returning tuples to the frontend is deliberately ignored by the
optimizer, on the grounds that every correct plan for a given query
will have the exact same output costs.  So if you want to try to compare
the planner's cost estimates to real elapsed time, you might want to
measure the results for "select count(*) from ..." instead of "select *
from ..."  so that output costs are held fixed.)

            regards, tom lane

Re: [HACKERS] Optimizer confusion?

From: Philip Warner
Date:
At 02:15 12/08/00 -0400, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> Again, this is OK, although I am a little surprised at the continuing
>> non-linearity of the estimates.
>
>Indexscan estimates are supposed to be nonlinear, actually, to account
>for the effects of caching.  I doubt the shapes of the curves are right
>in detail, but I haven't had time to do any research about it.

Of course - I was thinking it was a seqscan, which is even sillier.

As to the non-linearity, I would have thought the majority of I/Os by far
would be for reading rows, and with retrieval by index you may not get much
buffering benefit on table pages. For a large table, ISTM a linear estimate
would be a good approximation.


>> I would be interested to know (a) if there is any way I can influence the
>> optimizer choice when it considers using the index in question,
>
>You could push random_page_cost and effective_cache_size around to try
>to match your platform better.  Let me know if that helps...

Setting it to 0.1 works (it was 4). But this (I think) just highlights the
fact that the index is sorted by date, and the rows were added in date
order. As a result (for this table, in this query), the index scan gets a
much better cache-hit rate, so the actual IO cost is low.

Does that sound reasonable? Does the optimizer know if I have used
clustering? If so, maybe I should just use the clustering command. If not,
then it's probably best to always go with an index scan for this query. Is
there any way I can code the query with "{enable_seqscan=off}" so that it
applies only to the current query? Or, perhaps more usefully, ask it
(politely) to use a given index for part of the predicate?

ISTM setting random_page_cost to 0.1 would be a bad idea in general...


>> (b) if the fixed seqscan cost estimate is a bug.
>
>I don't think so.

I think you're right; equating cost to row-IO means seqscan cost is fixed.


>  So if you want to try to compare
>the planner's cost estimates to real elapsed time, you might want to
>measure the results for "select count(*) from ..." instead of "select *
>from ..."  so that output costs are held fixed.)

This actually makes the indexscan even more desirable - the time interval
has to be more than 7 months before the indexscan is slower, but at that
point it gets hard to tell how much benefit comes from buffering etc, so the
'elapsed time' comparison is probably pretty dodgy.

I don't suppose I can get the backend to tell me how many logical IOs and
how much CPU it used?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

Re: [HACKERS] Optimizer confusion?

From: Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>> Indexscan estimates are supposed to be nonlinear, actually, to account
>> for the effects of caching.  I doubt the shapes of the curves are right
>> in detail, but I haven't had time to do any research about it.

> As to the non-linearity, I would have thought the majority of I/Os by far
> would be reading rows, and with retrieval by index, you may not get much
> buffering benefit on table pages. For a large table, ISTM linear estimates
> would be a good estimate.

The estimate is linear, for number-of-tuples-to-fetch << number-of-pages-
in-table. See src/backend/optimizer/path/costsize.c for the gory details.

>> You could push random_page_cost and effective_cache_size around to try
>> to match your platform better.  Let me know if that helps...

> Setting it to 0.1 works (it was 4).

Unfortunately, random_page_cost < 1 is ridiculous on its face ... but
that's not where the problem is anyway, as your next comment makes clear.

> But this (I think) just highlights the
> fact that the index is sorted by date, and the rows were added in date
> order. As a result (for this table, in this query), the index scan gets a
> much better cache-hit rate, so the actual IO cost is low.

> Does that sound reasonable?

Quite.  The cost estimates are based on the assumption that the tuples
visited by an indexscan are scattered randomly throughout the table.
Obviously, if that's wrong then the estimates will be way too high.

> Does the optimizer know if I have used clustering?

Nope.  To quote from the code:

     * XXX if the relation has recently been "clustered" using this index,
     * then in fact the target tuples will be highly nonuniformly
     * distributed, and we will be seriously overestimating the scan cost!
     * Currently we have no way to know whether the relation has been
     * clustered, nor how much it's been modified since the last
     * clustering, so we ignore this effect.  Would be nice to do better
     * someday.

The killer implementation problem here is keeping track of how much the
table ordering has been altered since the last CLUSTER command.  We have
talked about using an assumption of "once clustered, always clustered",
ie, ignore the issue of sort order degrading over time.  That's pretty
ugly but it might still be more serviceable than the current state of
ignorance.  For a table like this one, where rows are added in date
order and (I imagine) seldom updated, the sort order isn't going to
degrade anyway.  For other tables, you could assume that you're going
to run CLUSTER on a periodic maintenance basis to keep the sort order
fairly good.
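
That periodic maintenance step would look roughly like this (keeping
CLUSTER's current limitations, per the TODO list, firmly in mind):

-----
CLUSTER ping_ix1 ON ping;   -- rewrite the table in pingtime (ping_ix1) order
VACUUM ANALYZE ping;        -- refresh the statistics afterwards
-----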

I have not yet done anything about this, mainly because I'm unwilling to
encourage people to use CLUSTER, since it's so far from being ready for
prime time (see TODO list).  Once we've done something about table
versioning, we can rewrite CLUSTER so that it's actually reasonable to
use on a regular basis, and at that point it'd make sense to make the
optimizer CLUSTER-aware.

> I don't suppose I can get the backend to tell me how many logical IOs and
> how much CPU it used?

Yes you can.  Run psql with
    PGOPTIONS="-s"
and look in the postmaster log.  There's also -tparse, -tplan,
-texec if you'd rather see the query time broken down by stages.
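
For example, from the shell (the statistics appear in the postmaster's
log/stderr, not in psql itself):

-----
PGOPTIONS="-s" psql uptime                     # per-query statistics
PGOPTIONS="-tparse -tplan -texec" psql uptime  # timings broken down by stage
-----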

            regards, tom lane