[PERFORM] Rowcount estimation changes based on from clause order

Поиск

Список

Период

Сортировка

От	Ants Aasma
Тема	[PERFORM] Rowcount estimation changes based on from clause order
Дата	11 октября 2017 г. 12:33:31
Msg-id	CA+CSw_vitKcNDw+6zgSttoHRu+ZrKT3a3C5YzT+JyPuTRyLC_w@mail.gmail.com обсуждение исходный текст
Ответы	Re: [PERFORM] Rowcount estimation changes based on from clause order
Список	pgsql-performance

Дерево обсуждения

I stumbled upon a severe row count underestimation that confusingly
went away when two inner joins in the from clause were reordered. I
whittled it down to a reproducible test case.

Schema:

CREATE TABLE small (id serial primary key, ref_id int not null, subset
int not null);
CREATE TABLE big (id serial primary key, small_id int not null);

INSERT INTO small (ref_id, subset) SELECT i/2+1, i/2+1 FROM
generate_series(1,1000) i;
INSERT INTO big (small_id) SELECT (i % 1000) + 1 FROM
generate_series(1,1000000) i;

CREATE INDEX ON small (ref_id);
CREATE INDEX ON big (small_id);

ANALYZE;

And the queries, differing in only the order of joins:

SELECT * FROM     small     INNER JOIN big ON small.id = big.small_id     INNER JOIN (SELECT 1 UNION ALL SELECT 2)
lookup(ref)ON
 
lookup.ref = small.ref_id
WHERE small.subset = 42;

SELECT * FROM     small     INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON
lookup.ref = small.ref_id     INNER JOIN big ON small.id = big.small_id
WHERE small.subset = 42;

Resulting plan for the first case:Nested Loop  (cost=20.45..2272.13 rows=8 width=24)  ->  Nested Loop
(cost=0.28..16.69rows=1 width=16)        ->  Append  (cost=0.00..0.04 rows=2 width=4)              ->  Result
(cost=0.00..0.01rows=1 width=4)              ->  Result  (cost=0.00..0.01 rows=1 width=4)        ->  Index Scan using
small_ref_id_idxon small
 
(cost=0.28..8.32 rows=1 width=12)              Index Cond: (ref_id = (1))              Filter: (subset = 42)  ->
BitmapHeap Scan on big  (cost=20.18..2245.44 rows=1000 width=8)        Recheck Cond: (small_id = small.id)        ->
BitmapIndex Scan on big_small_id_idx  (cost=0.00..19.93
 
rows=1000 width=0)              Index Cond: (small_id = small.id)

Second case plan is identical except row count of the topmost nest loop:Nested Loop  (cost=20.45..2272.13 rows=1000
width=24)

The union subselect was in reality somewhat more complicated, but for
the row count issue the simplification does not seem to matter. The
behavior is seen both on 9.4 and on master.

Does anybody have any idea what is going on here? In the real world
case this is based on the estimation was 5 rows instead of 200k, which
resulted in quite bad plan choices downstream.

Regards,
Ants Aasma


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

[PERFORM] Rowcount estimation changes based on from clause order