Re: [WiP] B-tree page merge during vacuum to reduce index bloat

Поиск

Список

Период

Сортировка

От	Kirk Wolak
Тема	Re: [WiP] B-tree page merge during vacuum to reduce index bloat
Дата	26 августа 17:11:43
Msg-id	CACLU5mRude0L5psEj5WS0DVDv=AHN0McfZBKV5eBoW0JqwwZDA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [WiP] B-tree page merge during vacuum to reduce index bloat (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Tue, Aug 26, 2025 at 6:33 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

On Tue, 26 Aug 2025 at 11:40, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
> Hi hackers,
>
> Together with Kirk and Nik we spent several online hacking sessions to tackle index bloat problems [0,1,2]. As a result we concluded that B-tree index page merge should help to keep an index from pressuring shared_buffers.
>
> We are proposing a feature to automatically merge nearly-empty B-tree leaf pages during VACUUM operations to reduce index bloat. This addresses a common issue where deleted tuples leave behind sparsely populated pages that traditional page deletion cannot handle because they're not completely empty.
>
...
I'm fairly sure there is a correctness issue here; I don't think you
correctly detect the two following cases:

1.) a page (P0) is scanned by a scan, finishes processing the results,
and releases its pin. It prepares to scan the next page of the scan
(P1).
2.) a page (A) is split, with new right sibling page B,
3.) and the newly created page B is merged into its right sibling C,
4.) the scan continues on to the next page

For backward scans, if page A is the same page as the one identified
with P1, the scan won't notice that tuples from P1 (aka A) have been
moved through B to P0 (C), causing the scan to skip processing for
those tuples.
For forward scans, if page A is the same page as the one identified
with P0, the scan won't notice that tuples from P0 (A) have been moved
through B to P1 (C), causing the scan to process those tuples twice in
the same scan, potentially duplicating results.

NB: Currently, the only way for "merge" to happen is when the index
page is completely empty. This guarantees that there is no movement of
scan-visible tuples to pages we've already visited/are about to visit.
This invariant is used extensively to limit lock and pin coupling (and
thus: improve performance) in index scans; see e.g. in 1bd4bc85. This
patch will invalidate that invariant, and therefore it will require
(significantly) more work in the scan code (incl. nbtsearch.c) to
guarantee exactly-once results + no false negatives.

Kind regards,

Matthias van de Meent
Databricks

This was one of our concerns. We will review the patch mentioned.

I do have a question, one of the IDEAS we discussed was to ADD a new page that combined the 2 pages.

Making the deletion "feel" like a page split.

This has the advantage of leaving the original 2 pages alone for anyone who is currently traversing.

And like the page split, updating the links around while marking the pages for the new path.

The downside to this approach is that we are "adding 1 page to then mark 2 pages as unused".

Could you comment on this secondary approach?

Thanks in advance!

Kirk

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [WiP] B-tree page merge during vacuum to reduce index bloat