Incorrect choice of Nested Loop for a skewed distribution
От | Oleg Kharin |
---|---|
Тема | Incorrect choice of Nested Loop for a skewed distribution |
Дата | |
Msg-id | 9df40087-f396-143d-5e4a-da3c97b5e536@uvadrev.ru обсуждение исходный текст |
Ответы |
Re: Incorrect choice of Nested Loop for a skewed distribution
(Justin Pryzby <pryzby@telsasoft.com>)
|
Список | pgsql-performance |
Hi All, With PostgreSQL 10 and 11, the planner doesn't use the lists of most common values to determine the selectivity of "=" for Nested Loop as it does for a normal inner join in eqjoinsel_inner(). Incorrect choice of a nested loops join strategy causes poor query performance. To demonstrate it one can make the following test case: create table t(f integer not null,g integer not null); create table u(f integer not null,g integer not null); create sequence s cache 1000; insert into t select 0,s from (select nextval('s') as s) as d; insert into t select 0,s from (select nextval('s') as s) as d; insert into t select 0,s from (select nextval('s') as s from t,t t1,t t2) as d; insert into t select 0,s from (select nextval('s') as s from t,t t1,t t2,t t3) as d; insert into t(f,g) select g,f from t; insert into u select * from t; create index t_f on t(f); vacuum analyze; The columns f and g of both tables t and u have a skewed distribution: 10010 values of 0 and 10010 unique values starting from 1. Let's see query plan for the join of t and u: explain analyze select * from t,u where t.f=u.f and t.g=u.g; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.29..7629.95 rows=25055030 width=16) (actual time=0.042..22540.629 rows=20020 loops=1) -> Seq Scan on u (cost=0.00..289.20 rows=20020 width=8) (actual time=0.011..3.025 rows=20020 loops=1) -> Index Scan using t_f on t (cost=0.29..0.36 rows=1 width=8) (actual time=0.565..1.125 rows=1 loops=20020) Index Cond: (f = u.f) Filter: (u.g = g) Rows Removed by Filter: 5004 Planning Time: 0.394 ms Execution Time: 22542.639 ms After dropping the index drop index t_f; we obtain much better query plan (without Nested Loop): QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Merge Join (cost=3439.09..442052.26 rows=25055030 width=16) (actual time=15.708..32.735 rows=20020 loops=1) Merge Cond: ((t.f = u.f) AND (t.g = u.g)) -> Sort (cost=1719.54..1769.59 rows=20020 width=8) (actual time=8.189..10.189 rows=20020 loops=1) Sort Key: t.f, t.g Sort Method: quicksort Memory: 1707kB -> Seq Scan on t (cost=0.00..289.20 rows=20020 width=8) (actual time=0.012..2.958 rows=20020 loops=1) -> Sort (cost=1719.54..1769.59 rows=20020 width=8) (actual time=7.510..9.459 rows=20020 loops=1) Sort Key: u.f, u.g Sort Method: quicksort Memory: 1707kB -> Seq Scan on u (cost=0.00..289.20 rows=20020 width=8) (actual time=0.008..2.748 rows=20020 loops=1) Planning Time: 0.239 ms Execution Time: 34.585 ms Using MCV lists in var_eq_non_const() would prevent from choosing Nested Loop in such cases. Regards, Oleg Kharin
В списке pgsql-performance по дате отправления:
Предыдущее
От: Merlin MoncureДата:
Сообщение: Re: UPGRADE TO PG11 CAUSED DEGREDATION IN PERFORMANCE