BUG #17207: Bad cost estimate of Merge Join despite correct row estimate

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #17207: Bad cost estimate of Merge Join despite correct row estimate
Дата
Msg-id 17207-5265aefa79e333b4@postgresql.org
обсуждение исходный текст
Ответы Re: BUG #17207: Bad cost estimate of Merge Join despite correct row estimate  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17207
Logged by:          Simon Perepelitsa
Email address:      sema@sema.in
PostgreSQL version: 13.4
Operating system:   macOS 10.15.7
Description:

Setup

    create table test_users (id serial primary key);
    create table test_user_sessions (id serial primary key, user_id int not
null references test_users(id));
    insert into test_users (id) select generate_series(1, 500000) as id;
    insert into test_user_sessions (user_id) values
      (1),
      (1),
      (500000);
    vacuum verbose analyze test_users, test_user_sessions;

Query

    explain analyze
    select test_user_sessions.id from test_user_sessions join test_users on
user_id = test_users.id;

    Merge Join  (cost=1.49..1.55 rows=3 width=4) (actual time=0.015..71.034
rows=3 loops=1)
      Merge Cond: (test_users.id = test_user_sessions.user_id)
      ->  Index Only Scan using test_users_pkey on test_users
(cost=0.42..12996.42 rows=500000 width=4) (actual time=0.005..40.040
rows=500000 loops=1)
            Heap Fetches: 0
      ->  Sort  (cost=1.05..1.06 rows=3 width=8) (actual time=0.009..0.010
rows=3 loops=1)
            Sort Key: test_user_sessions.user_id
            Sort Method: quicksort  Memory: 25kB
            ->  Seq Scan on test_user_sessions  (cost=0.00..1.03 rows=3
width=8) (actual time=0.004..0.005 rows=3 loops=1)
    Planning Time: 0.106 ms
    Execution Time: 71.061 ms

After set enable_mergejoin = false

    Nested Loop  (cost=0.42..14.35 rows=3 width=4) (actual time=0.019..0.025
rows=3 loops=1)
      ->  Seq Scan on test_user_sessions  (cost=0.00..1.03 rows=3 width=8)
(actual time=0.006..0.006 rows=3 loops=1)
      ->  Index Only Scan using test_users_pkey on test_users
(cost=0.42..4.44 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=3)
            Index Cond: (id = test_user_sessions.user_id)
            Heap Fetches: 0
    Planning Time: 0.078 ms
    Execution Time: 0.040 ms

As you can see, Merge Join adds just 0.5 cost on top of Seq Scan ignoring
the high cost of full index scan (0.42..12996.42). If explicitly disabled,
Nested Loop is obviously a much better join plan for such a small table (3
rows).

While it is possible for Merge Join to also finish quickly - if user_id are
all low numbers - I'm not sure if that's a realistic expectation for the
default plan.

I also tried rewriting it as a semi-join with exists/in, but the query plans
were exactly the same. Not sure why, because in some of my other queries
this makes the planner use more optimized "Semi-Join" instructions (e.g.
Nested Loop Semi-Join).

    explain analyze
    select test_user_sessions.id from test_user_sessions where exists
(select 1 from test_users where user_id = test_users.id);


В списке pgsql-bugs по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: BUG #17206: the function array_cat(anyarray, anyarray) does not exist
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #17206: the function array_cat(anyarray, anyarray) does not exist