Обсуждение: Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)

Поиск
Список
Период
Сортировка

Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)

От
Dmitry Koterov
Дата:
Hi.

I'm trying to understand the logic which the planner uses in "WHERE x IN (IDS) ORDER BY y LIMIT N" queries when the correct index exists in the database.

I expected that, if IDS list is small and N is small too, the planner should've done the following: for each element in IDS, query first N items from the index, then union the results (up to IDS*N elements, thus small) and limit it by N items.

Instead, the planner decides to run a bitmap index scan, fetch thousands of rows, and then post-filter most of them. Why? Is it possible to somehow tell the planner to use individual first-N fetches?

(SET STATISTICS to 10000 for both columns doesn't change anything; also I don't see how cardinality of any of these fields can theoretically affect the plan: we still need first N items from each of the index sub-parts.)

CREATE TABLE roles(
  id bigint NOT NULL,
  id1 bigint,
  created_at timestamptz NOT NULL
);
CREATE INDEX ON roles(id1, created_at DESC);

EXPLAIN ANALYZE SELECT * FROM roles
WHERE id1 IN(
  '1001361878439251615', '1001349402553202617', '1001329448424677858',
  '1001348457743394950', '1001361706624116300', '1001338330225145648',
  '1001363186688934748', '1001366841628692013'
)
ORDER BY created_at DESC LIMIT 50

Limit  (cost=50171.99..50177.83 rows=50 width=42) (actual time=206.056..208.865 rows=50 loops=1)   ->  Gather Merge  (cost=50171.99..57802.99 rows=65404 width=42) (actual time=206.055..208.857 rows=50 loops=1)         Workers Planned: 2         Workers Launched: 2         ->  Sort  (cost=49171.97..49253.73 rows=32702 width=42) (actual time=198.944..198.948 rows=40 loops=3)               Sort Key: created_at DESC               Sort Method: top-N heapsort  Memory: 31kB               Worker 0:  Sort Method: top-N heapsort  Memory: 30kB               Worker 1:  Sort Method: top-N heapsort  Memory: 32kB               ->  Parallel Bitmap Heap Scan on roles  (cost=4209.08..48085.63 rows=32702 width=42) (actual time=78.119..180.352 rows=60563 loops=3)                     Recheck Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))                     Filter: (cred_id = '1001344096118566254'::bigint)                     Rows Removed by Filter: 526                     Heap Blocks: exact=5890                     ->  Bitmap Index Scan on roles_asset_created_desc  (cost=0.00..4189.46 rows=182139 width=0) (actual time=73.761..73.761 rows=183934 loops=1)                           Index Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[])) Planning Time: 0.590 ms Execution Time: 208.935 ms

Re: Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)

От
Michael Lewis
Дата:
Your query and explain analyze output do not seem to match.

Filter: (cred_id = '1001344096118566254'::bigint)

I don't see anything like that in your query, nor an index that would support accomplishing that without filtering after fetching the 184k rows initially like the planner does.

Re: Suboptimal plan when IN(...), ORDER BY and LIMIT are used (no JOINs)

От
Dmitry Koterov
Дата:
Yeah, that was a plan for a query before its simplification. But effect is still the same, and also the question is still the same - why a bitmap scan is preferred over a number of individual index scans with fetching first 50 elements from each. (Also, replacing LIMIT 50 to LIMIT 2 doesn't seem to change anything, although having 2 here should logically make it prefer 8 index scans with selecting 2 first elements from each over selecting 186051 rows in one bitmap index scan.)

Here is the plan for the exact query with LIMIT=2:

CREATE TABLE roles(
  id bigint NOT NULL,
  id1 bigint,
  created_at timestamptz NOT NULL
);
CREATE INDEX roles_asset_created_desc ON roles(id1, created_at DESC);

EXPLAIN ANALYZE SELECT * FROM roles
WHERE id1 IN(
  '1001361878439251615', '1001349402553202617', '1001329448424677858',
  '1001348457743394950', '1001361706624116300', '1001338330225145648',
  '1001363186688934748', '1001366841628692013'
)
ORDER BY created_at DESC LIMIT 2;

 Limit  (cost=49712.75..49712.99 rows=2 width=42) (actual time=82.611..83.462 rows=2 loops=1)
   ->  Gather Merge  (cost=49712.75..67421.89 rows=151782 width=42) (actual time=82.611..83.459 rows=2 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=48712.73..48902.46 rows=75891 width=42) (actual time=70.404..70.404 rows=2 loops=3)
               Sort Key: created_at DESC
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 25kB
               ->  Parallel Bitmap Heap Scan on roles  (cost=4266.99..47953.82 rows=75891 width=42) (actual time=7.854..57.664 rows=61255 loops=3)
                     Recheck Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))
                     Heap Blocks: exact=6886
                     ->  Bitmap Index Scan on roles_asset_created_desc  (cost=0.00..4221.46 rows=182139 width=0) (actual time=14.031..14.031 rows=186051 loops=1)
                           Index Cond: (id1 = ANY ('{1001361878439251615,1001349402553202617,1001329448424677858,1001348457743394950,1001361706624116300,1001338330225145648,1001363186688934748,1001366841628692013}'::bigint[]))
 Planning Time: 0.074 ms
 Execution Time: 83.491 ms


On Wed, Apr 14, 2021 at 5:41 PM Michael Lewis <mlewis@entrata.com> wrote:
Your query and explain analyze output do not seem to match.

Filter: (cred_id = '1001344096118566254'::bigint)

I don't see anything like that in your query, nor an index that would support accomplishing that without filtering after fetching the 184k rows initially like the planner does.