Обсуждение: Bogus nestloop rows estimate in 8.4.7

Поиск

Список

Период

Сортировка

Bogus nestloop rows estimate in 8.4.7

От

Marti Raudsepp

Дата:

28 мая 2012 г., 04:46:06

Hi list,

This bug isn't causing me any immediate problems -- the plan works out
well regardless -- but PostgreSQL 8.4.7 is somehow overestimating the
number of rows coming from a nestloop join, when joining 2 large
partitioned tables. Maybe it's been fixed in more recent versions,
sadly it's an EOL Linux distro and I have no immediate plans to
upgrade.

It's estimating to join 135957 x 281 rows, but the product is somehow
2946151270877

In reality, it's joining 132577 x ~1 rows to get 133116 results

QUERY PLAN
 GroupAggregate  (cost=852067259163.57..977278688175.85
rows=2946151270877 width=36)
   ->  Sort  (cost=852067259163.57..859432637340.77 rows=2946151270877 width=36)
         Sort Key: b.banner_id, b.client_body_id,
(COALESCE(b.partner_body_id, a.partner_body_id)), b.space_id,
b.campaign_id, a.evt_type_id
         ->  Nested Loop  (cost=0.00..213859871.55 rows=2946151270877 width=36)
               Join Filter: (a.request_id = b.request_id)
               ->  Append  (cost=0.00..5905.69 rows=135957 width=20)
                     ->  Index Scan using "XIF01request" on request a
(cost=0.00..8.27 rows=1 width=20)
                           Index Cond: ((request_time >= '2012-05-28
09:00:00'::timestamp without time zone) AND (request_time <
'2012-05-28 10:00:00'::timestamp without time zone))
                     ->  Index Scan using "XIF01request_1222" on
request_1222 a  (cost=0.00..5897.42 rows=135956 width=20)
                           Index Cond: ((request_time >= '2012-05-28
09:00:00'::timestamp without time zone) AND (request_time <
'2012-05-28 10:00:00'::timestamp without time zone))
               ->  Append  (cost=0.00..1569.44 rows=281 width=32)
                     ->  Seq Scan on request_data b  (cost=0.00..11.30
rows=130 width=32)
                     ->  Index Scan using
"IX_relationship64_request_d_c_1150" on request_d_c_1150 b
(cost=0.00..9.56 rows=2 width=32)
                           Index Cond: (b.request_id = a.request_id)
*snip lots of partition index scans*

Query:
SELECT '2012-05-28T09:00:00', count(*),
uniq(sort(array_agg(visitor_id))), banner_id, client_body_id,
partner_body_id, space_id, campaign_id, evt_type_id FROM stats_request
WHERE stats_request.request_time >= '2012-05-28T09:00:00' AND
stats_request.request_time < (timestamp '2012-05-28T09:00:00' +
interval E'1 hour')::timestamp
GROUP BY banner_id, client_body_id, partner_body_id, space_id,
campaign_id, evt_type_id ORDER BY banner_id, client_body_id,
partner_body_id, space_id, campaign_id, evt_type_id;

Full EXPLAIN ANALYZE is attached.

Regards,
Marti

Вложения

explain-analyze.txt

Re: Bogus nestloop rows estimate in 8.4.7

От

Marti Raudsepp

Дата:

28 мая 2012 г., 05:19:19

On Mon, May 28, 2012 at 10:45 AM, Marti Raudsepp <marti@juffo.org> wrote:
> Query:
> SELECT '2012-05-28T09:00:00', count(*),
> uniq(sort(array_agg(visitor_id))), banner_id, client_body_id,
> partner_body_id, space_id, campaign_id, evt_type_id FROM stats_request
> WHERE stats_request.request_time >= '2012-05-28T09:00:00' AND
> stats_request.request_time < (timestamp '2012-05-28T09:00:00' +
> interval E'1 hour')::timestamp
> GROUP BY banner_id, client_body_id, partner_body_id, space_id,
> campaign_id, evt_type_id ORDER BY banner_id, client_body_id,
> partner_body_id, space_id, campaign_id, evt_type_id;

Oh, I forgot to mention that stats_request is a view and that's where
the JOIN is coming from:

CREATE VIEW stats_request ASSELECT a.request_id, a.request_time, b.b2s_id, a.evt_type_id,
b.space_id, b.banner_id, COALESCE(b.visitor_id, a.visitor_id) AS
visitor_id, COALESCE(b.partner_body_id, a.partner_body_id) AS
partner_body_id, b.client_body_id, b.campaign_id  FROM request a  JOIN request_data b USING (request_id);

request and request_data are both large partitioned tables.

Regards,
Marti

Re: Bogus nestloop rows estimate in 8.4.7

От

Tom Lane

Дата:

28 мая 2012 г., 14:57:17

Marti Raudsepp <marti@juffo.org> writes:
> This bug isn't causing me any immediate problems -- the plan works out
> well regardless -- but PostgreSQL 8.4.7 is somehow overestimating the
> number of rows coming from a nestloop join, when joining 2 large
> partitioned tables.

This sounds familiar, but a quick trawl through the commit logs didn't
immediately turn up any related-looking patches.  Can you put together
a self-contained test case?

Also, what do you have constraint_exclusion set to?
        regards, tom lane

Re: Bogus nestloop rows estimate in 8.4.7

От

Marti Raudsepp

Дата:

28 мая 2012 г., 16:33:50

On Mon, May 28, 2012 at 8:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Also, what do you have constraint_exclusion set to?

The only sane value, "partition"

> This sounds familiar, but a quick trawl through the commit logs didn't
> immediately turn up any related-looking patches.  Can you put together
> a self-contained test case?

Sure, tested on 8.4.7, 8.4.11,  with all default configuration. Does
not occur in >=9.0

create table a_parent (i int);
create table a_child1 () inherits (a_parent);
create table a_child2 () inherits (a_parent);
insert into a_child1 select generate_series(1,100000);
insert into a_child2 select generate_series(100001,200000);
create index a1_i_idx on a_child1(i);
create index a2_i_idx on a_child2(i);

create table b_parent (i int);
create table b_child1 () inherits (b_parent);
create table b_child2 () inherits (b_parent);
insert into b_child1 select generate_series(1,100000);
insert into b_child1 select generate_series(1,100000);
insert into b_child2 select generate_series(100001,200000);
insert into b_child2 select generate_series(100001,200000);
create index b1_i_idx on b_child1(i);
create index b2_i_idx on b_child2(i);

analyze;
explain select * from a_parent join b_parent using (i) where i between 1 and 2;

Actually returns 4 rows, but estimate is 28168

QUERY PLAN
Nested Loop  (cost=0.00..1276.16 rows=28168 width=4) Join Filter: (public.a_parent.i = public.b_parent.i) ->  Append
(cost=0.00..62.56rows=14 width=4)       ->  Seq Scan on a_parent  (cost=0.00..46.00 rows=12 width=4)
Filter:((i >= 1) AND (i <= 2))       ->  Index Scan using a1_i_idx on a_child1 a_parent 
(cost=0.00..8.28 rows=1 width=4)             Index Cond: ((i >= 1) AND (i <= 2))       ->  Index Scan using a2_i_idx on
a_child2a_parent 
(cost=0.00..8.28 rows=1 width=4)             Index Cond: ((i >= 1) AND (i <= 2)) ->  Append  (cost=0.00..56.64
rows=2404width=4)       ->  Seq Scan on b_parent  (cost=0.00..34.00 rows=2400 width=4)       ->  Index Scan using
b2_i_idxon b_child2 b_parent 
(cost=0.00..11.31 rows=2 width=4)             Index Cond: (public.b_parent.i = public.a_parent.i)       ->  Index Scan
usingb1_i_idx on b_child1 b_parent 
(cost=0.00..11.32 rows=2 width=4)             Index Cond: (public.b_parent.i = public.a_parent.i)
(15 rows)


There was a similar case in 9.0.4 with WHERE i=1, but that has been
fixed in 9.0.7

Regards,
Marti

Re: Bogus nestloop rows estimate in 8.4.7

От

Marti Raudsepp

Дата:

28 мая 2012 г., 16:37:57

On Mon, May 28, 2012 at 10:32 PM, Marti Raudsepp <marti@juffo.org> wrote:
> There was a similar case in 9.0.4 with WHERE i=1, but that has been
> fixed in 9.0.7

Oh, it's been fixed in 9.0.7, but apparently not in 8.4.11; the empty
parent tables are confusing the estimate:

explain select * from a_parent join b_parent using (i) where i=1;

QUERY PLAN
Nested Loop  (cost=56.57..123.65 rows=224 width=4) ->  Append  (cost=0.00..62.60 rows=16 width=4)       ->  Seq Scan on
b_parent (cost=0.00..40.00 rows=12 width=4)             Filter: (i = 1)       ->  Index Scan using b2_i_idx on b_child2
b_parent
(cost=0.00..11.30 rows=2 width=4)             Index Cond: (i = 1)       ->  Index Scan using b1_i_idx on b_child1
b_parent
(cost=0.00..11.30 rows=2 width=4)             Index Cond: (i = 1) ->  Materialize  (cost=56.57..56.71 rows=14 width=4)
    ->  Append  (cost=0.00..56.56 rows=14 width=4)             ->  Seq Scan on a_parent  (cost=0.00..40.00 rows=12
width=4)                  Filter: (i = 1)             ->  Index Scan using a1_i_idx on a_child1 a_parent
 
(cost=0.00..8.28 rows=1 width=4)                   Index Cond: (i = 1)             ->  Index Scan using a2_i_idx on
a_child2a_parent
 
(cost=0.00..8.28 rows=1 width=4)                   Index Cond: (i = 1)

Regards,
Marti

Re: Bogus nestloop rows estimate in 8.4.7

От

Tom Lane

Дата:

28 мая 2012 г., 17:23:24

Marti Raudsepp <marti@juffo.org> writes:
> On Mon, May 28, 2012 at 10:32 PM, Marti Raudsepp <marti@juffo.org> wrote:
>> There was a similar case in 9.0.4 with WHERE i=1, but that has been
>> fixed in 9.0.7

> Oh, it's been fixed in 9.0.7, but apparently not in 8.4.11; the empty
> parent tables are confusing the estimate:

Hmm ... what your test case seems to be exhibiting is related to this:

Author: Tom Lane <tgl@sss.pgh.pa.us>
Branch: master [f3ff0433a] 2011-07-14 17:30:57 -0400
Branch: REL9_1_STABLE Release: REL9_1_0 [cf8245285] 2011-07-14 17:31:12 -0400
Branch: REL9_0_STABLE Release: REL9_0_5 [0dd46a776] 2011-07-14 17:31:25 -0400
   In planner, don't assume that empty parent tables aren't really empty.      There's a heuristic in
estimate_rel_size()to clamp the minimum size   estimate for a table to 10 pages, unless we can see that vacuum or
analyze  has been run (and set relpages to something nonzero, so this will always   happen for a table that's actually
empty). However, it would be better   not to do this for inheritance parent tables, which very commonly are   really
emptyand can be expected to stay that way.  Per discussion of a   recent pgsql-performance report from Anish Kejariwal.
Also prevent it   from happening for indexes (although this is more in the nature of   documentation, since CREATE
INDEXnormally initializes relpages to   something nonzero anyway).      Back-patch to 9.0, because the ability to
collectstatistics across a   whole inheritance tree has improved the planner's estimates to the point   where this
relativelysmall error makes a significant difference.  In the   referenced report, merge or hash joins were incorrectly
estimatedas   cheaper than a nestloop with inner indexscan on the inherited table.   That was less likely before 9.0
becausethe lack of inherited stats would   have resulted in a default (and rather pessimistic) estimate of the cost
ofa merge or hash join.

However, the error in your original example is far too large to be
explained by that, so I think it was tripping over something different.
When I run your test case in 8.4, I get
Nested Loop  (cost=0.00..1276.12 rows=28168 width=4)  Join Filter: (public.a_parent.i = public.b_parent.i)  ->  Append
(cost=0.00..62.57rows=14 width=4)        ->  Seq Scan on a_parent  (cost=0.00..46.00 rows=12 width=4)
Filter:((i >= 1) AND (i <= 2))        ->  Index Scan using a1_i_idx on a_child1 a_parent  (cost=0.00..8.29 rows=1
width=4)... ->  Append  (cost=0.00..56.63 rows=2404 width=4)        ->  Seq Scan on b_parent  (cost=0.00..34.00
rows=2400width=4)        ->  Index Scan using b1_i_idx on b_child1 b_parent  (cost=0.00..11.34 rows=2 width=4)...

and that join size estimate is not too out of line if you accept the
admittedly-bogus numbers for the appendrel sizes.  There seems to be
something else going on in your original example.
        regards, tom lane

Re: Bogus nestloop rows estimate in 8.4.7

От

Marti Raudsepp

Дата:

28 мая 2012 г., 19:13:43

On Mon, May 28, 2012 at 11:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, the error in your original example is far too large to be
> explained by that, so I think it was tripping over something different.

Good point. But I generated a bigger data set with the above test case
and it gets progressively worse with more rows and partitions. (The
original database has 2x4 billion rows in over 100 partitions)

Here's a bigger test case, 2GB total (will take a few minutes to
generate). It gives a total estimate of 3900158, even though the
Append nodes suggest 13x2406 rows.

create table a_parent (i int);
create table a_child1 () inherits (a_parent);
create table a_child2 () inherits (a_parent);
create table a_child3 () inherits (a_parent);
insert into a_child1 select generate_series(00000001,10000000);
insert into a_child2 select generate_series(10000001,20000000);
insert into a_child3 select generate_series(20000001,30000000);
create index a1_i_idx on a_child1(i);
create index a2_i_idx on a_child2(i);
create index a3_i_idx on a_child3(i);
alter table a_child1 add check (i between 00000001 and 10000000);
alter table a_child2 add check (i between 10000001 and 20000000);
alter table a_child3 add check (i between 20000001 and 30000000);

create table b_parent (i int);
create table b_child1 () inherits (b_parent);
create table b_child2 () inherits (b_parent);
create table b_child3 () inherits (b_parent);
insert into b_child1 select generate_series(00000001,10000000);
insert into b_child1 select generate_series(00000001,10000000);
insert into b_child2 select generate_series(10000001,20000000);
insert into b_child2 select generate_series(10000001,20000000);
insert into b_child3 select generate_series(20000001,30000000);
insert into b_child3 select generate_series(20000001,30000000);
create index b1_i_idx on b_child1(i);
create index b2_i_idx on b_child2(i);
create index b3_i_idx on b_child3(i);
alter table b_child1 add check (i between 00000001 and 10000000);
alter table b_child2 add check (i between 10000001 and 20000000);
alter table b_child3 add check (i between 20000001 and 30000000);

analyze;
explain select * from a_parent join b_parent using (i) where i between 1 and 2;

Nested Loop  (cost=0.00..1413.71 rows=3900158 width=4) Join Filter: (public.a_parent.i = public.b_parent.i) ->  Append
(cost=0.00..55.37rows=13 width=4)       ->  Seq Scan on a_parent  (cost=0.00..46.00 rows=12 width=4)
Filter:((i >= 1) AND (i <= 2))       ->  Index Scan using a1_i_idx on a_child1 a_parent
 
(cost=0.00..9.37 rows=1 width=4)             Index Cond: ((i >= 1) AND (i <= 2)) ->  Append  (cost=0.00..74.41
rows=2406width=4)       ->  Seq Scan on b_parent  (cost=0.00..34.00 rows=2400 width=4)       ->  Index Scan using
b1_i_idxon b_child1 b_parent
 
(cost=0.00..13.43 rows=2 width=4)             Index Cond: (public.b_parent.i = public.a_parent.i)       ->  Index Scan
usingb2_i_idx on b_child2 b_parent
 
(cost=0.00..13.50 rows=2 width=4)             Index Cond: (public.b_parent.i = public.a_parent.i)       ->  Index Scan
usingb3_i_idx on b_child3 b_parent
 
(cost=0.00..13.48 rows=2 width=4)             Index Cond: (public.b_parent.i = public.a_parent.i)

Regards,
Marti

Re: Bogus nestloop rows estimate in 8.4.7

От

Tom Lane

Дата:

28 мая 2012 г., 19:37:18

Marti Raudsepp <marti@juffo.org> writes:
> On Mon, May 28, 2012 at 11:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> However, the error in your original example is far too large to be
>> explained by that, so I think it was tripping over something different.

> Good point. But I generated a bigger data set with the above test case
> and it gets progressively worse with more rows and partitions. (The
> original database has 2x4 billion rows in over 100 partitions)

> Here's a bigger test case, 2GB total (will take a few minutes to
> generate). It gives a total estimate of 3900158, even though the
> Append nodes suggest 13x2406 rows.

On reflection I think this is an artifact of the lack of
inheritance-tree stats in 8.4.  The estimated size of the join does
*not* come from the product of the two appendrel sizes shown in EXPLAIN,
because the inner one is a inner indexscan using a parameter from the
outer side (what we would now call a parameterized path).  Rather, the
estimated size is join selectivity times outer relation size times inner
relation size.  The outer relation size, after applying its restriction
clause, is indeed only 13 rows, but the inner relation size is 60e6 rows
because it has no restriction clause.  If we had an accurate join
selectivity estimate that'd be fine, but for lack of any stats about the
inheritance tree eqjoinsel just punts and returns DEFAULT_EQ_SEL, ie
0.005.  And that works out to your result.

So, nothing to see here ... 8.4 is just not very good with this type
of problem.
        regards, tom lane

Re: Bogus nestloop rows estimate in 8.4.7

От

Marti Raudsepp

Дата:

29 мая 2012 г., 05:07:21

On Tue, May 29, 2012 at 1:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So, nothing to see here ... 8.4 is just not very good with this type
> of problem.

Fair enough, thanks for your time. :)

Regards,
Marti

Re: Bogus nestloop rows estimate in 8.4.7

От

Robert Haas

Дата:

29 мая 2012 г., 13:22:33

On Mon, May 28, 2012 at 6:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marti Raudsepp <marti@juffo.org> writes:
>> On Mon, May 28, 2012 at 11:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> However, the error in your original example is far too large to be
>>> explained by that, so I think it was tripping over something different.
>
>> Good point. But I generated a bigger data set with the above test case
>> and it gets progressively worse with more rows and partitions. (The
>> original database has 2x4 billion rows in over 100 partitions)
>
>> Here's a bigger test case, 2GB total (will take a few minutes to
>> generate). It gives a total estimate of 3900158, even though the
>> Append nodes suggest 13x2406 rows.
>
> On reflection I think this is an artifact of the lack of
> inheritance-tree stats in 8.4.  The estimated size of the join does
> *not* come from the product of the two appendrel sizes shown in EXPLAIN,
> because the inner one is a inner indexscan using a parameter from the
> outer side (what we would now call a parameterized path).  Rather, the
> estimated size is join selectivity times outer relation size times inner
> relation size.  The outer relation size, after applying its restriction
> clause, is indeed only 13 rows, but the inner relation size is 60e6 rows
> because it has no restriction clause.  If we had an accurate join
> selectivity estimate that'd be fine, but for lack of any stats about the
> inheritance tree eqjoinsel just punts and returns DEFAULT_EQ_SEL, ie
> 0.005.  And that works out to your result.

Hmm, but isn't this a case of the left hand not knowing what the right
hand is doing?  I mean, somehow we have enough information to estimate
that the index scans on b{1,2,3} are going to produce 2 rows per
execution, but having figured that out (correctly) we then proceed to
ignore it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Bogus nestloop rows estimate in 8.4.7

От

Tom Lane

Дата:

29 мая 2012 г., 13:36:23

Robert Haas <robertmhaas@gmail.com> writes:
> Hmm, but isn't this a case of the left hand not knowing what the right
> hand is doing?  I mean, somehow we have enough information to estimate
> that the index scans on b{1,2,3} are going to produce 2 rows per
> execution, but having figured that out (correctly) we then proceed to
> ignore it.

Well, if you wanted to do that, what it would amount to is saying that
the estimated size of the join relation is going to change depending on
which implementation path you consider for it, which would be a mess,
not to mention mathematically silly.  You could possibly choose to use
the size from one path that you believe more than the other paths, but
on what grounds will you make that choice; and then will you go back and
change your estimates for the paths you already did costing for?

We have bitten off some of this for cases involving join relations that
are still parameterized, but I don't think it makes much sense for
non-parameterized join paths.  And even in the parameterized case, we
are careful to use the same size estimate for all implementation paths
that yield the same parameterization.
        regards, tom lane

Re: Bogus nestloop rows estimate in 8.4.7

От

Robert Haas

Дата:

29 мая 2012 г., 13:55:24

On Tue, May 29, 2012 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Hmm, but isn't this a case of the left hand not knowing what the right
>> hand is doing?  I mean, somehow we have enough information to estimate
>> that the index scans on b{1,2,3} are going to produce 2 rows per
>> execution, but having figured that out (correctly) we then proceed to
>> ignore it.
>
> Well, if you wanted to do that, what it would amount to is saying that
> the estimated size of the join relation is going to change depending on
> which implementation path you consider for it, which would be a mess,
> not to mention mathematically silly.

Well, I do not understand this code in depth (can you tell?) but it
seems to me that we are effectively computing the size of the inner
relation in two different ways in two different parts of the code.
Whatever logic the index-scan-path is using to estimate how many rows
are going to pop out is obviously much more accurate - in this
instance - than the join selectivity estimator.  I can't help wishing
the more accurate logic were somehow being used here, without having a
very good idea what it would take to make that happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Bogus nestloop rows estimate in 8.4.7

От

Tom Lane

Дата:

29 мая 2012 г., 14:27:57

Robert Haas <robertmhaas@gmail.com> writes:
> Well, I do not understand this code in depth (can you tell?) but it
> seems to me that we are effectively computing the size of the inner
> relation in two different ways in two different parts of the code.

That's because we are considering two different definitions of the inner
relation: one that is stand-alone (the actual table size times the
selectivity of any non-join restriction clauses for that table), versus
one that is parameterized by means of applying an additional join clause
during the scan.  It's entirely expected that the latter estimate will
be smaller.  But that only applies to the number of rows produced by the
table scan; it should not affect our conclusions about how big the join
result is, because the same join clause is being used whether we do it
like this or as a regular join over an unparameterized table scan.

> Whatever logic the index-scan-path is using to estimate how many rows
> are going to pop out is obviously much more accurate - in this
> instance - than the join selectivity estimator.

The reason for that is that in 8.4 we don't have any good method for
making join selectivity estimates involving appendrels.  That's not a
structural problem, just an omission in the selectivity support; which
has been rectified in 9.0 and up.
        regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Bogus nestloop rows estimate in 8.4.7

Вложения