Re: New IndexAM API controlling index vacuum strategies

Поиск

Список

Период

Сортировка

От	Peter Geoghegan
Тема	Re: New IndexAM API controlling index vacuum strategies
Дата	29 декабря 2020 г. 07:20:44
Msg-id	CAH2-WznWWQwgydT37X=c2xe1QBV5k53HVuTfLfxhbopAAA_Kgg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: New IndexAM API controlling index vacuum strategies (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы	Re: New IndexAM API controlling index vacuum strategies
Список	pgsql-hackers

Дерево обсуждения

On Mon, Dec 28, 2020 at 10:26 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> The second test checking if maintenane_work_mem space is running out
> also makes sense to me. Perhaps another idea would be to compare the
> number of collected garbage tuple to the total number of heap tuples
> so that we do lazy_vacuum_heap() only when we’re likely to reclaim a
> certain amount of garbage in the table.

Right. Or to consider if this is an anti-wraparound VACUUM might be
nice -- maybe we should skip index vacuuming + lazy_vacuum_heap() if
and only if we're under pressure to advance datfrozenxid for the whole
DB, and really need to hurry up. (I think that we could both probably
think of way too many ideas like this one.)

> That's a very good result in terms of skipping lazy_vacuum_heap(). How
> much the table and indexes bloated? Also, I'm curious about that which
> tests in choose_vacuum_strategy() turned vacuum_heap on: 130 test or
> test if maintenance_work_mem space is running out? And what was the
> impact on clearing all-visible bits?

The pgbench_accounts heap table and 3 out of 4 of its indexes (i.e.
all indexes except "abalance_ruin") had zero growth. They did not even
become larger by 1 block. As I often say when talking about work in
this area, this is not a quantitative difference -- it's a qualitative
difference. (If they grew even a tiny amount, say by only 1 block,
further growth is likely to follow.)

The "abalance_ruin" index was smaller with the patch. Its size started
off at 253,779 blocks with both the patch and master branch (which is
very small, because of B-Tree deduplication). By the end of 2 pairs of
runs for the patch (2 3 hour runs) the size grew to 502,016 blocks.
But with the master branch it grew to 540,090 blocks. (For reference,
the primary key on pgbench_accounts started out at 822,573 blocks.)

My guess is that this would compare favorably with "magic VACUUM" [1]
(I refer to a thought experiment that is useful for understanding the
principles behind bottom-up index deletion). The fact that
"abalance_ruin" becomes bloated probably doesn't have that much to do
with MVCC versioning. In other words, I suspect that the index
wouldn't be that smaller in a traditional two-phase locking database
system with the same workload. Words like "bloat" and "fragmentation"
have always been overloaded/ambiguous in highly confusing ways, which
is why I find it useful to compare a real world workload/benchmark to
some kind of theoretical ideal behavior.

This test wasn't particularly sympathetic to the patch because most of
the indexes (all but the PK) were useless -- they did not get used by
query plans. So the final size of "abalance_ruin" (or any other index)
isn't even the truly important thing IMV (the benchmark doesn't
advertise the truly important thing for me). The truly important thing
is that the worst case number of versions *per logical row* is tightly
controlled. It doesn't necessarily matter all that much if 30% of an
index's tuples are garbage, as long as the garbage tuples are evenly
spread across all logical rows in the table (in practice it's pretty
unlikely that that would actually happen, but it's possible in theory,
and if it did happen it really wouldn't be so bad).

[1] https://postgr.es/m/CAH2-Wz=rPkB5vXS7AZ+v8t3VE75v_dKGro+w4nWd64E9yiCLEQ@mail.gmail.com
--
Peter Geoghegan

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: New IndexAM API controlling index vacuum strategies