Re: Weirdly pesimistic estimates in optimizer

From: David Kubečka
Subject: Re: Weirdly pesimistic estimates in optimizer
Date:
Msg-id: CAGLMwk_aCL55y_9eVRA5cMHd-L--Z+C+NHjzbV9MX8oQiDpGYg@mail.gmail.com
In response to: Re: Weirdly pesimistic estimates in optimizer  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: Weirdly pesimistic estimates in optimizer  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi Tomas and others,

2015-03-02 21:29 GMT+01:00 Tomas Vondra <tomas.vondra@2ndquadrant.com>:
Hi David ;-)

On 2.3.2015 20:19, David Kubečka wrote:
>
> The question is why the optimizer, or rather the cost estimator,
> produced such different estimates for such a small change in the input.
> Moreover, it seems that the main culprit of the bad estimates isn't
> actually directly related to the outer table, but rather to the inner table.
> Just compare estimates for the two index scans:
>
> With 'random_fk_dupl':
>          ->  Index Scan using facts_fk_idx on facts  (cost=0.42..5.75
> rows=100 width=15) (actual time=0.009..0.117 rows=98 loops=100)
> With 'random_fk_uniq':
>          ->  Index Scan using facts_fk_idx on facts  (cost=0.42..214.26
> rows=100 width=15) (actual time=0.007..0.109 rows=98 loops=100)
>
> I have read the optimizer README file and also looked briefly at the
> code, but this seems to be something not related to the particular
> implementation of an algorithm (e.g. nested loop). Perhaps it's the way
> cost estimates are propagated down (or sideways? that would be
> weird...) the query tree. But I am really not sure, since this is my
> first time looking at the optimizer code base. I should also add that
> I have reproduced this behaviour on all versions of Pg from 9.2 up
> to current devel.

Interesting. I've only spent a few minutes looking at this, but it
certainly is a bit strange that using the smaller table with unique values
results in a slower plan than using a table with duplicates (but the
same number of unique values).

Yep. And I think I have found the cause of the strangeness. I have traced the planner code, and this is how cost_index from costsize.c is called when computing the cost of the index scan on the outer (facts) table, with the inner table being the one with duplicate entries (random_fk_dupl):

cost_index (path=0x27f8fb8, root=0x27670e8, loop_count=9920)

The important thing here is the value of loop_count, which is the (perfectly estimated) number of rows of random_fk_dupl (I have recreated the table, so its row count differs slightly from the previously posted version, which had 9893). Now the predominant part of the run cost is computed by this expression:

run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

Since csquared (indexCorrelation * indexCorrelation) is very small in this case, the biggest term here is max_IO_cost, which is computed as follows:

max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;

There is a division by loop_count because of the predicted effect of caching, and it is exactly this division which makes the run_cost for a single index lookup so low compared with the query version using random_fk_uniq.  So the main problem is that the planner calls cost_index with loop_count equal to the number of rows of the inner table, *but it ignores the semi-join semantics*, i.e. it doesn't account for the number of *unique* rows in the inner table, which will actually be the number of loops.
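
Purely for illustration, here is a small standalone toy of that amortization
(not the real cost_index()): the Mackert-Lohman page-fetch estimate is
simplified to 2*T*n / (2*T + n) capped at T, effective_cache_size is ignored,
and the heap size and per-probe tuple count are assumed numbers. Even so,
with these made-up inputs the amortized IO cost per loop comes out around
2.2 for loop_count=9920 versus around 209 for loop_count=99, which mirrors
the 5.75 vs. 214.26 index scan estimates quoted above:

/*
 * Toy model of the caching amortization in cost_index(), illustration only.
 */
#include <stdio.h>

static double
pages_fetched_ml(double tuples_fetched, double T)
{
    /* simplified Mackert-Lohman approximation, capped at the table size */
    double p = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);

    return (p > T) ? T : p;
}

static double
max_io_cost(double tuples_per_loop, double T,
            double spc_random_page_cost, double loop_count)
{
    /* total pages fetched across all loops, then amortized per loop */
    double pages = pages_fetched_ml(tuples_per_loop * loop_count, T);

    return (pages * spc_random_page_cost) / loop_count;
}

int
main(void)
{
    double T = 5400.0;          /* assumed heap pages of "facts" */
    double tuples = 100.0;      /* rows fetched per index probe */
    double rpc = 4.0;           /* spc_random_page_cost */

    /* loop_count = raw row count of the table with duplicates */
    printf("loop_count=9920: amortized IO cost per loop = %.2f\n",
           max_io_cost(tuples, T, rpc, 9920.0));
    /* loop_count = number of distinct join keys */
    printf("loop_count=99:   amortized IO cost per loop = %.2f\n",
           max_io_cost(tuples, T, rpc, 99.0));
    return 0;
}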

Since this is my first time looking into the optimizer (and in fact any Postgres) code, I am not yet able to locate the exact place where this should be fixed, but I hope that in a few days I will have a patch :-)
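
Just to sketch the idea (this is not the patch, and both the helper name and
the place where it would hook into the planner are guesses on my part):
wherever the planner derives loop_count for a parameterized index path, it
could be clamped by the estimated number of distinct values of the join key,
since with semi-join semantics the index is probed roughly once per distinct
value:

/*
 * Hypothetical sketch only: clamp the loop count used for the caching
 * amortization by the estimated number of distinct join keys that will
 * actually drive the loops.  Function name and integration point are
 * made up; the real planner code may look quite different.
 */
static double
clamp_loop_count_for_semijoin(double raw_loop_count,
                              double distinct_join_keys)
{
    if (distinct_join_keys > 0.0 && distinct_join_keys < raw_loop_count)
        return distinct_join_keys;
    return raw_loop_count;
}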

Another observation is that the planner does see other plans, because
tweaking the cost variables like this:

    SET random_page_cost = 2;

produces this plan (which on my machine runs just a tad slower than the
first query in your post):

                            QUERY PLAN
---------------------------------------------------------------------
 HashAggregate  (cost=10564.81..10688.47 rows=9893 width=15)
   Group Key: facts.fk
   ->  Nested Loop  (cost=5.42..10515.34 rows=9893 width=15)
       ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
             Group Key: random_fk_uniq.fk
             ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
       ->  Bitmap Heap Scan on facts  (cost=3.18..105.18 rows=100 ...)
           Recheck Cond: (fk = random_fk_uniq.fk)
           ->  Bitmap Index Scan on facts_fk_idx  (cost=0.00..3.15 ...)
                     Index Cond: (fk = random_fk_uniq.fk)

Yeah, this now makes perfect sense to me. From the run_cost and max_IO_cost formulas you can see that (in this case!) random_page_cost is almost exactly a linear factor. So the outcome is the complete opposite of what I wished for at first: instead of tweaking Postgres to produce lower costs for random_fk_uniq, it will start producing higher costs for random_fk_dupl. But I now believe that this is correct.
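
To spell that out with assumed numbers (again just a toy, reusing the
run_cost interpolation quoted above): with csquared close to zero the
interpolation collapses to roughly max_IO_cost, and both IO terms here are
modelled as carrying random_page_cost as a plain multiplicative factor, so
halving random_page_cost roughly halves the dominant part of the cost:

#include <stdio.h>

int
main(void)
{
    /* Illustrative, assumed values only -- not taken from the plans above. */
    double csquared = 0.0004;       /* indexCorrelation^2, nearly zero */
    double loop_count = 99.0;
    double pages_max = 5165.0;      /* uncorrelated case, assumed */
    double pages_min = 100.0;       /* perfectly correlated case, assumed */

    for (double rpc = 4.0; rpc >= 2.0; rpc -= 2.0)
    {
        double max_IO_cost = (pages_max * rpc) / loop_count;
        double min_IO_cost = (pages_min * rpc) / loop_count;
        double run_cost = max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

        printf("random_page_cost = %.1f -> IO part of run_cost = %.2f\n",
               rpc, run_cost);
    }
    return 0;
}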

Regards,

-David


I can get similar results by setting cpu_operator_cost=0.005. And
further lowering random_page_cost to 1.1 actually gets me this:

                            QUERY PLAN
---------------------------------------------------------------------
 HashAggregate  (cost=6185.41..6309.08 rows=9893 width=15)
   Group Key: facts.fk
   ->  Nested Loop  (cost=2.66..6135.95 rows=9893 width=15)
      ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
            Group Key: random_fk_uniq.fk
            ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
      ->  Index Scan using facts_fk_idx on facts  (cost=0.42..60.95 ...)
               Index Cond: (fk = random_fk_uniq.fk)

which is exactly the first plan from your post.

I still don't understand why using the smaller table produces a less
efficient plan, but the estimated costs are very close (20013 vs. 18147
is a very small difference in this context), and the estimator shoots for
minimal average error. So it may seem 'weirdly pessimistic', but it might
be equally optimistic for other queries.

Also, keep in mind that the cost model is just a simplification of
reality, so it can't be correct 100% of the time.

The fact that this small difference in costs results in a significant
difference in duration suggests that the default optimizer cost values do
not reflect your environment. For example, you assume that this is very
fast:

    NestLoop
        -> Seq Scan on A
        -> Index Scan using B_Y_IDX on B
            Index Condition: B.Y = A.X

but index scans often produce a lot of random I/O, and that may be quite
expensive if it really hits the storage. That's what the default value
(random_page_cost=4) is set for. But if you're on a system with lots of
RAM, or on SSDs, ... that simply is not the case, and the default may
penalize index scans (and prefer more sequential workloads).

The other thing you might do is create an index on (fk, f), which on
newer releases produces this:

                           QUERY PLAN
-------------------------------------------------------------------------
 HashAggregate  (cost=783.77..907.43 rows=9893 width=15)
   Group Key: facts.fk
   ->  Nested Loop  (cost=2.66..734.30 rows=9893 width=15)
       ->  HashAggregate  (cost=2.24..3.23 rows=99 width=4)
           Group Key: random_fk_uniq.fk
           ->  Seq Scan on random_fk_uniq  (cost=0.00..1.99 ...)
       ->  Index Only Scan using facts_2_idx on facts  (cost=...)
               Index Cond: (fk = random_fk_uniq.fk)

This limits the amount of random I/O by scanning only the index, and it
is even faster than the first query.

regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



