[WiP] B-tree page merge during vacuum to reduce index bloat

From: Andrey Borodin
Subject: [WiP] B-tree page merge during vacuum to reduce index bloat
Msg-id: 11A59C0C-A8C8-4642-8493-292D5DF8311D@yandex-team.ru
Replies: Re: [WiP] B-tree page merge during vacuum to reduce index bloat
List: pgsql-hackers
Hi hackers,

Together with Kirk and Nik, we spent several online hacking sessions tackling index bloat problems [0,1,2]. As a result, we concluded that merging B-tree index pages should help keep an index from putting pressure on shared_buffers.

We are proposing a feature that automatically merges nearly-empty B-tree leaf pages during VACUUM to reduce index bloat. This addresses a common issue where deleted tuples leave behind sparsely populated pages that traditional page deletion cannot handle because they are not completely empty.

*** Implementation Overview:

The patch adds a new `mergefactor` reloption (default 5%, range 0-50%) that defines when a page becomes a candidate for merging. During vacuum, when the live data on a page falls below this threshold (i.e., the page is more than 95% empty with default settings), we attempt to move the remaining tuples to the right sibling page and then delete the source page using the existing page deletion mechanisms.
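For illustration, the candidate test can be sketched as below. This is not the patch's actual code: the real logic lives in `btvacuumpage()` and inspects a `Page` directly, and `page_is_merge_candidate` with its arguments is a hypothetical name invented for the sketch.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of the mergefactor test.  In the actual patch the
 * check is done in btvacuumpage() against a Page; the name, signature
 * and byte-based arguments here are illustrative only.
 */
static bool
page_is_merge_candidate(int live_bytes, int usable_bytes, int mergefactor)
{
    /* Candidate when live data occupies less than mergefactor percent
     * of the usable page space, i.e. the page is almost empty. */
    return live_bytes * 100 < usable_bytes * mergefactor;
}
```

With the default mergefactor of 5, a page whose remaining tuples occupy under 5% of its usable space would qualify for merging into its right sibling.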

Key changes:
- New `mergefactor` reloption for configurable merge thresholds
- Detection logic in `btvacuumpage()` to identify merge candidates
- Tuple movement implementation in `_bt_unlink_halfdead_page()`
- WAL logging enhancements to handle cross-page dependencies during replay

The last part needs further improvement (it's simply REGBUF_FORCE_IMAGE for now), but I want to start a discussion and ask about known problems with the approach.

*** Correctness:

The implementation reuses the existing locking, critical sections, and WAL logging infrastructure. To handle cross-page dependencies during WAL replay when tuples are merged, the right sibling buffer is registered with `REGBUF_FORCE_IMAGE`; this is a temporary workaround.
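As a self-contained illustration of that registration pattern: the stub below only mimics PostgreSQL's `XLogRegisterBuffer()` from `access/xloginsert.h`, the flag values are made up (not the real definitions), and `register_merge_buffers` is a hypothetical helper, not a function in the patch.

```c
#include <stdint.h>

/* Stand-in flags; the real definitions live in access/xloginsert.h and
 * the numeric values here are illustrative, not the real ones. */
#define REGBUF_STANDARD    0x08
#define REGBUF_FORCE_IMAGE 0x01

static uint8_t registered_flags[2];

/* Stub standing in for PostgreSQL's XLogRegisterBuffer(). */
static void
XLogRegisterBuffer(uint8_t block_id, int buffer, uint8_t flags)
{
    (void) buffer;
    registered_flags[block_id] = flags;
}

/*
 * Hypothetical helper showing the workaround: the source page is
 * registered normally, while the right sibling that receives the moved
 * tuples is force-imaged, so replay restores it from a full-page image
 * instead of re-applying cross-page changes.
 */
static void
register_merge_buffers(int srcbuf, int rightbuf)
{
    XLogRegisterBuffer(0, srcbuf, REGBUF_STANDARD);
    XLogRegisterBuffer(1, rightbuf, REGBUF_STANDARD | REGBUF_FORCE_IMAGE);
}
```

The cost of this shortcut is WAL volume: every merge emits a full-page image of the right sibling, which is why replaying the real redo logic is listed as future work.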

*** Current Status & Questions:

The patch successfully reduces index bloat and handles basic scenarios, but we have identified some areas that need community input:

1. Backward Scan Correctness: The primary concern is ensuring that backward scans work correctly while pages are being merged concurrently. Since we maintain the same locking protocol as existing page deletion, I believe this should be safe, but we would appreciate expert review of the approach.
2. Performance Impact: The additional checks during vacuum have minimal overhead, but broader testing would be valuable. The worst case is an index whose leaf pages alternate between nearly empty and nearly full (5%, 96%, 5%, 96%, 5%, 96%, ...): we will attempt to merge such pages on every vacuum, spending time on acquiring locks each time.
3. WAL Consistency: There are still some edge cases with WAL consistency checking that need refinement. I think I can handle them; I just need to spend enough time debugging real redo instead of force-imaging the right page.


*** Usage:
CREATE INDEX ON table (col) WITH (mergefactor=10);  -- 10% threshold
I don't know if it would be a good idea to enable mergefactor for existing indexes.

The feature is particularly beneficial for workloads with high update/delete rates that create sparse index pages without triggering complete page deletion.

I'm attaching the patch for review and would welcome feedback on the approach, especially regarding backward scan safety and any other correctness concerns we may have missed.

Thank you!


Best regards,
Andrey, Nik, Kirk.


[0] https://www.youtube.com/watch?v=3MleDtXZUlM
[1] https://www.youtube.com/watch?v=Ib3SXSFt8mE
[2] https://www.youtube.com/watch?v=D1PEdDcvZTw

