Use of additional index columns in rows filtering

Поиск
Список
Период
Сортировка
От Maxim Ivanov
Тема Use of additional index columns in rows filtering
Дата
Msg-id N1xaIrU29uk5YxLyW55MGk5fz9s6V2FNtj54JRaVlFbPixD5z8sJ07Ite5CvbWwik8ZvDG07oSTN-usENLVMq2UAcizVTEd5b-o16ZGDIIU=@yamlcoder.me
обсуждение исходный текст
Ответы Re: Use of additional index columns in rows filtering
Список pgsql-hackers
Hi All,

I'd like to report what seems to be a missing optimization opportunity or understand why it is not possible to achieve.

TLDR; additional index column B specified in CREATE INDEX .. (A) INCLUDE(B) is not used to filter rows in queries like
WHEREB = $1 ORDER BY A during IndexScan. https://dbfiddle.uk/iehtq44L 


Take following database:


  CREATE TABLE t(
    a integer NOT NULL,
    b integer NOT NULL,
    d integer NOT NULL
  );

  CREATE UNIQUE INDEX t_a_include_b ON t (a) INCLUDE (b);
  -- I'd expect index above to behave as index below for the purpose
  -- of this query
  -- CREATE UNIQUE INDEX ON t(a,b);

  INSERT INTO t(
    SELECT random() * 100000000 as a,
           random() * 3 as b,
           generate_series as d FROM generate_series(1,200000)
  ) ON CONFLICT DO NOTHING;


If we filter on `a` and `b` columns while scanning index created as `(a) INCLUDE (b)` it seems to be fetching tuples
fromheap to check for condition `b = 4` despite both columns available in the index: 

    SELECT * FROM  t WHERE a > 1000000 and b = 4 ORDER BY a ASC LIMIT 10;


Here is the plan (notice high "shared hit"):

  Limit  (cost=0.42..10955.01 rows=1 width=12) (actual time=84.283..84.284 rows=0 loops=1)
    Output: a, b, d
    Buffers: shared hit=198307
    ->  Index Scan using t_a_include_b on public.t (cost=0.42..10955.01 rows=1 width=12) (actual time=84.280..84.281
rows=0loops=1) 
          Output: a, b, d
          Index Cond: (t.a > 1000000)
          Filter: (t.b = 4)
          Rows Removed by Filter: 197805
          Buffers: shared hit=198307
  Planning:
    Buffers: shared hit=30
  Planning Time: 0.201 ms
  Execution Time: 84.303 ms


And here is the plan with index on (a,b).

  Limit  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.883..6.884 rows=0 loops=1)
    Output: a, b, d
    Buffers: shared hit=613
    ->  Index Scan using t_a_b_idx on public.t  (cost=0.42..4447.90 rows=1 width=12) (actual time=6.880..6.881 rows=0
loops=1)
          Output: a, b, d
          Index Cond: ((t.a > 1000000) AND (t.b = 4))
          Buffers: shared hit=613
  Planning:
    Buffers: shared hit=41
  Planning Time: 0.314 ms
  Execution Time: 6.910 ms


Because query doesn't sort on `b`, only filters on it while sorting on `a`, I'd expect indexes `(a) INCLUDE (b)` and
`(a,b)`behave exactly the same with this particular query. 

Interestingly, IndexOnlyScan is capable of using additional columns to filter rows without fetching them from the heap,
butonly for visibible tuples: 

  VACUUM FREEZE t;
  SELECT a,b FROM  t WHERE a > 1000000 and b = 4 ORDER BY a ASC LIMIT 10;

  Limit  (cost=0.42..6619.76 rows=1 width=8) (actual time=18.479..18.480 rows=0 loops=1)
    Output: a, b
    Buffers: shared hit=662
    ->  Index Only Scan using t_a_include_b on public.t  (cost=0.42..6619.76 rows=1 width=8) (actual
time=18.477..18.477rows=0 loops=1) 
          Output: a, b
          Index Cond: (t.a > 1000000)
          Filter: (t.b = 4)
          Rows Removed by Filter: 197771
          Heap Fetches: 0
          Buffers: shared hit=662

Removing VACUUM makes it behave like IndexScan and fetch candidate tuples from heap all while returning zero rows in
theresult. 


To make query plan comparable I had to force index scan on both with:

  SET enable_bitmapscan to off;
  SET enable_seqscan to off;
  SET max_parallel_workers_per_gather = 0;

Self contained fully reproducible example is in https://dbfiddle.uk/iehtq44L

Regards,
Maxim



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Support logical replication of DDLs
Следующее
От: Peter Smith
Дата:
Сообщение: Re: [PATCH] Add pretty-printed XML output option