Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)

Поиск
Список
Период
Сортировка
От Dmitry Koterov
Тема Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)
Дата
Msg-id CA+CZih6sM_4391XbXB7eBoXKHz4eFoX6YooOC0mSb-sxF4pX6A@mail.gmail.com
обсуждение исходный текст
Ответы Re: Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)  (Michael Lewis <mlewis@entrata.com>)
Список pgsql-general
Hi.

I'm trying to understand the logic which the planner uses in "WHERE x IN (IDS) ORDER BY y LIMIT N" queries when the correct index exists in the database.

I expected that, if IDS list is small and N is small too, the planner should've done the following: for each element in IDS, query first N items from the index, then union the results (up to IDS*N elements, thus small) and limit it by N items.

Instead, the planner decides to run a bitmap index scan, fetch thousands of rows, and then post-filter most of them. Why? Is it possible to somehow tell the planner to use individual first-N fetches?

(SET STATISTICS to 10000 for both columns doesn't change anything; also I don't see how cardinality of any of these fields can theoretically affect the plan: we still need first N items from each of the index sub-parts.)

CREATE TABLE roles(
  id bigint NOT NULL,
  id1 bigint,
  created_at timestamptz NOT NULL
);
CREATE INDEX ON roles(id1, created_at DESC);

EXPLAIN ANALYZE SELECT * FROM roles
WHERE id1 IN(
  '1001361878439251615', '1001349402553202617', '1001329448424677858',
  '1001348457743394950', '1001361706624116300', '1001338330225145648',
  '1001363186688934748', '1001366841628692013'
)
ORDER BY created_at DESC LIMIT 50

Limit  (cost=50171.99..50177.83 rows=50 width=42) (actual time=206.056..208.865 rows=50 loops=1)   ->  Gather Merge  (cost=50171.99..57802.99 rows=65404 width=42) (actual time=206.055..208.857 rows=50 loops=1)         Workers Planned: 2         Workers Launched: 2         ->  Sort  (cost=49171.97..49253.73 rows=32702 width=42) (actual time=198.944..198.948 rows=40 loops=3)               Sort Key: created_at DESC               Sort Method: top-N heapsort  Memory: 31kB               Worker 0:  Sort Method: top-N heapsort  Memory: 30kB               Worker 1:  Sort Method: top-N heapsort  Memory: 32kB               ->  Parallel Bitmap Heap Scan on roles  (cost=4209.08..48085.63 rows=32702 width=42) (actual time=78.119..180.352 rows=60563 loops=3)                     Recheck Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))                     Filter: (cred_id = '1001344096118566254'::bigint)                     Rows Removed by Filter: 526                     Heap Blocks: exact=5890                     ->  Bitmap Index Scan on roles_asset_created_desc  (cost=0.00..4189.46 rows=182139 width=0) (actual time=73.761..73.761 rows=183934 loops=1)                           Index Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[])) Planning Time: 0.590 ms Execution Time: 208.935 ms

В списке pgsql-general по дате отправления:

Предыдущее
От: "LE MENTEC, SANDRINE"
Дата:
Сообщение: RE: looking for a installation package to Using GSSAPI with Postgres12 for windows
Следующее
От: Kevin Brannen
Дата:
Сообщение: RE: Ways to "serialize" result set for later use?