Re: increasing effective_cache_size slows down join queries by a factor of 4000x

From: A Shaposhnikov
Subject: Re: increasing effective_cache_size slows down join queries by a factor of 4000x
Msg-id: CA+1Wm9UY+8q8abkcMOBmTcSRdHiufKJ0vwHK=QAPmJFBiTxijQ@mail.gmail.com
In reply to: Re: increasing effective_cache_size slows down join queries by a factor of 4000x  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: increasing effective_cache_size slows down join queries by a factor of 4000x  (Michael Lewis <mlewis@entrata.com>)
           Re: increasing effective_cache_size slows down join queries by a factor of 4000x  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-general

I started using the latest PostgreSQL 14.2, and the query plan there for a simplified query joining just two tables looks like this:


explain analyze select t.*, ta.* from team as t, team_aliases as ta where ta.team_id = t.id and t.id > 200000000 order by t.id limit 1000;

                                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=81.33..331.82 rows=1000 width=183) (actual time=31328.561..31329.565 rows=1000 loops=1)
   ->  Merge Join  (cost=81.33..17727020.90 rows=70768783 width=183) (actual time=31328.560..31329.498 rows=1000 loops=1)
         Merge Cond: (t.id = ta.team_id)
         ->  Index Scan using team_pkey on team t  (cost=0.57..11382381.88 rows=78693167 width=175) (actual time=0.016..0.466 rows=854 loops=1)
               Index Cond: (id > 205284974)
         ->  Index Scan using fki_team_fk on team_aliases ta  (cost=0.57..4893726.20 rows=218604096 width=8) (actual time=0.010..22172.405 rows=151321830 loops=1)
 Planning Time: 0.472 ms
 Execution Time: 31329.654 ms
(8 rows)


Now, if I add the second condition as Tomas suggested, the inner index scan gets an Index Cond and the query speeds up by a factor of roughly 15,000:

explain analyze select t.*, ta.* from team as t, team_aliases as ta where ta.team_id = t.id and t.id > 200000000 and ta.team_id > 200000000 order by t.id limit 1000;

                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=81.33..720.48 rows=1000 width=183) (actual time=0.073..1.909 rows=1000 loops=1)
   ->  Merge Join  (cost=81.33..13859983.71 rows=21685030 width=183) (actual time=0.072..1.756 rows=1000 loops=1)
         Merge Cond: (t.id = ta.team_id)
         ->  Index Scan using team_pkey on team t  (cost=0.57..11382381.88 rows=78693167 width=175) (actual time=0.016..0.695 rows=854 loops=1)
               Index Cond: (id > 205284974)
         ->  Index Scan using fki_team_fk on team_aliases ta  (cost=0.57..1896563.38 rows=66984851 width=8) (actual time=0.052..0.356 rows=1000 loops=1)
               Index Cond: (team_id > 205284974)
 Planning Time: 0.503 ms
 Execution Time: 2.056 ms
(9 rows)
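
The redundant qual apparently matters because it gives the inner index scan an Index Cond, so the merge join can seek straight to the interesting part of fki_team_fk instead of reading roughly 151 million rows of team_aliases before the first match (compare the actual row counts in the two plans above). The general pattern, as a sketch assuming a btree index on the join key of each table, is to duplicate any range predicate on the join key onto both sides:

    -- The extra qual on ta.team_id is logically redundant (it is implied
    -- by ta.team_id = t.id and t.id > 200000000), but the planner does
    -- not derive it for inequalities, so we spell it out:
    select t.*, ta.*
      from team as t
      join team_aliases as ta on ta.team_id = t.id
     where t.id > 200000000
       and ta.team_id > 200000000
     order by t.id
     limit 1000;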



Interestingly, I have a second PG 14.2 database with identical table definitions but about 10% smaller row counts, and the exact same query runs fast there without the second condition:


explain analyze select t.*, ta.* from team as t, team_aliases as ta where ta.team_id = t.id and t.id > 200000000 order by t.id limit 1000;

                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.14..1700.75 rows=1000 width=168) (actual time=0.062..4.841 rows=1000 loops=1)
   ->  Nested Loop  (cost=1.14..5685654.91 rows=3345265 width=168) (actual time=0.060..4.700 rows=1000 loops=1)
         ->  Index Scan using team_pkey on team t  (cost=0.57..2366113.83 rows=2807531 width=160) (actual time=0.031..0.801 rows=888 loops=1)
               Index Cond: (id > 205284974)
         ->  Index Scan using fki_team_fk on team_aliases ta  (cost=0.57..1.14 rows=4 width=8) (actual time=0.003..0.004 rows=1 loops=888)
               Index Cond: (team_id = t.id)
 Planning Time: 0.559 ms
 Execution Time: 4.953 ms

In both databases I ran "vacuum full analyze" on all tables. I am not sure why the planner decides to apply the join condition so much later in the plan when the query runs slow. Basically, it seems to decide, seemingly at random, to produce two huge multimillion-row inputs and merge them on the join condition, instead of looping 1000 times over the indexes. Obviously the optimizer grossly miscalculates the costs; how can we help it?
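
For what it's worth, one way to confirm that this is purely a costing issue is to disable the merge join for a single session and see what plan and estimates come out. A sketch, assuming the table and column names from the plans above; the statistics-target change at the end is only a guess at improving the row estimates:

    -- Temporarily forbid merge joins so the planner has to cost the
    -- nested-loop alternative; SET LOCAL reverts at transaction end.
    begin;
    set local enable_mergejoin = off;
    explain analyze
    select t.*, ta.*
      from team as t, team_aliases as ta
     where ta.team_id = t.id and t.id > 200000000
     order by t.id
     limit 1000;
    rollback;

    -- If the row estimates are badly off, more detailed statistics on
    -- the join key might help (the default statistics target is 100):
    alter table team_aliases alter column team_id set statistics 1000;
    analyze team_aliases;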




On Fri, Feb 4, 2022 at 9:01 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
On 2/4/22 05:21, A Shaposhnikov wrote:
> Tomas,
>
> thank you! The query:
>
> select 1 from data as d, data_class as dc
>      where dc.data_id = d.id and d.id > 205284974
>        and dc.data_id > 205284974     -- new condition
>     order by d.id
>     limit 1000;
>
> totally solved it - it is now fast under all conditions! I thought
> that the optimizer would be able to infer it itself.
>

Unfortunately, the optimizer is not that smart - we can do that for
equality conditions, but not for other operators. There was actually a
thread [1] exploring the possibility of extending this to inequalities,
but it has gone nowhere so far. That thread also explains why it's done
only for equality operators. In short, it's fairly expensive, makes
costing of joins more difficult, and most queries can't benefit from it
(because conditions on join keys are not that common).

BTW, what does the final query plan look like? Is it using the merge
join or the nested loop? I wonder if this might be formulated as a
costing issue, pushing the planner to use the nested loop.


[1]
https://www.postgresql.org/message-id/flat/CAFQUnFhqkWuPCwQ1NmHYrisHJhYx4DoJak-dV%2BFcjyY6scooYA%40mail.gmail.com


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
