kill_prior_tuple and index scan costing

From: Peter Geoghegan
Subject: kill_prior_tuple and index scan costing
Date:
Msg-id: CAH2-Wz=hz8HOcAO23SBLTHk56jwspGetLNSD56bnuymn+3h1zg@mail.gmail.com
Responses: Re: kill_prior_tuple and index scan costing (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
If I run the regression tests so that the "tenk1" table is available,
and then create an index on tenk1.twothousand, I notice that simple
"where twothousand = ?" queries have query plans that look like the
following sample plan:
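
(Concretely, the setup amounts to something like the sketch below; the index
name matches the plans, but the exact statements are a reconstruction rather
than a quote from any script:)

create index tenk1_twothousand_idx1 on tenk1 (twothousand);
vacuum analyze tenk1;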

pg@regression:5432 [17755]=# explain (analyze, buffers, costs off)
select * from tenk1 where twothousand = 42;
┌─────────────────────────────────────────────────────────────────────────────────────────────┐
│                                         QUERY PLAN                                           │
├─────────────────────────────────────────────────────────────────────────────────────────────┤
│ Bitmap Heap Scan on tenk1 (actual time=0.023..0.032 rows=5 loops=1)                          │
│   Recheck Cond: (twothousand = 42)                                                           │
│   Heap Blocks: exact=5                                                                       │
│   Buffers: shared hit=7                                                                      │
│   ->  Bitmap Index Scan on tenk1_twothousand_idx1 (actual time=0.015..0.015 rows=5 loops=1)  │
│         Index Cond: (twothousand = 42)                                                       │
│         Buffers: shared hit=2                                                                │
│ Planning Time: 0.146 ms                                                                      │
│ Execution Time: 0.065 ms                                                                     │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Seems reasonable. There is a bitmap index scan, more or less due to
the uncorrelated table access that would be required by an index scan
on tenk1_twothousand_idx1. We return 5 rows, and must access one heap
block for each of those 5 rows. I wonder why we don't get the
following alternative plan instead, which is slightly faster even with
the weak correlation:

pg@regression:5432 [17755]=# explain (analyze, buffers, costs off)
select * from tenk1 where twothousand = 42;
┌────────────────────────────────────────────────────────────────────────────────────────────┐
│                                         QUERY PLAN                                          │
├────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using tenk1_twothousand_idx1 on tenk1 (actual time=0.020..0.030 rows=5 loops=1)  │
│   Index Cond: (twothousand = 42)                                                            │
│   Buffers: shared hit=7                                                                     │
│ Planning Time: 0.134 ms                                                                     │
│ Execution Time: 0.058 ms                                                                    │
└────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

Both plans are very similar, really. The number of heap accesses and
B-Tree index page accesses is exactly the same in each case. But the
index scan plan has one non-obvious advantage that might matter a lot
in the real world: it can apply the kill_prior_tuple optimization. (It
is never possible to use the kill_prior_tuple optimization during a
bitmap index scan.)
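
To make the difference observable, here is a rough sketch of an experiment;
none of it is from the original test case. It assumes the plain index scan
plan gets chosen (enable_bitmapscan = off forces the issue), the pageinspect
call assumes a server version whose bt_page_items() output includes a "dead"
column, and the block number is only a guess that would need checking:

set enable_bitmapscan = off;
delete from tenk1 where twothousand = 42;
-- Once the DELETE has committed, the next plain index scan visits the heap,
-- finds that every matching tuple is dead to all transactions, and sets
-- kill_prior_tuple, letting nbtree mark the matching index entries LP_DEAD:
select * from tenk1 where twothousand = 42;
-- With the pageinspect extension installed, the LP_DEAD bits then show up
-- on the relevant leaf page:
select itemoffset, dead from bt_page_items('tenk1_twothousand_idx1', 1);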

It makes sense that the planner determines that a bitmap index scan is
faster -- or it used to make sense. Before commit dd299df8, which made
heap TID a tiebreaker nbtree index column, we might find ourselves
accessing the same heap page multiple times, pinning it a second or a
third time within the executor (whether that happened depended on very
unstable implementation details in the nbtree code). These days we should
*reliably* access the same number of heap pages (and index pages) with
either plan. (There are a couple of caveats that I'm glossing over for
now, like pg_upgrade'd indexes.)
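
A quick way to see what the tiebreaker buys us from SQL (only a rough check,
since it just looks at the order the rows come back in):

-- With heap TID as a tiebreaker, the index entries for a given key are kept
-- in heap order, so a plain index scan returns the duplicates with
-- non-decreasing ctids and never has to re-pin a heap page it already read:
select ctid, * from tenk1 where twothousand = 42;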

Is it worth accounting for the influence of the tiebreaker heap TID
column work in the planner, so that we get to use the kill_prior_tuple
optimization more often? I'm not planning to work on it myself, but it
seems worth raising.

FWIW, the planner requires lots of convincing before it will use the
index scan plan right now. I find that I need to set random_page_cost
to 1.6 before the planner chooses the latter plan (a CLUSTER that uses
the twothousand index works too, of course). If I come up with a
similar example that returns 10 rows (i.e. one that indexes the "thousand"
column instead), random_page_cost needs to be reduced to 1.1 to produce
an equivalent query plan crossover.
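
The experiment itself is nothing fancy; something like this reproduces the
crossover (1.6 is the value from the testing described above, the rest is
just a session-local sketch):

set random_page_cost = 1.6;  -- down from the default of 4.0; the plain index scan now wins
explain (analyze, buffers, costs off)
select * from tenk1 where twothousand = 42;
reset random_page_cost;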

-- 
Peter Geoghegan
