FSM versus GIN pending list bloat

From: Jeff Janes
Subject: FSM versus GIN pending list bloat
Date:
Msg-id CAMkU=1xfE1MnGMkv655hB8jCs3PBTb4S5H+FnQv8kcmYzyeBDQ@mail.gmail.com
Responses: Re: FSM versus GIN pending list bloat  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: FSM versus GIN pending list bloat  (Simon Riggs <simon@2ndQuadrant.com>)
Re: FSM versus GIN pending list bloat  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-hackers
For a GIN index with fastupdate turned on, both the user backends and the autoanalyze routine will clear out the pending list, pushing the entries into the normal index structure and deleting the pages used by the pending list.  But those deleted pages do not get added to the free space map (FSM) until a vacuum is done.  This leads to horrible bloat on insert-only tables: such a table is never vacuumed, so the pending list space is never reused.  And the pending list is very inefficient in its space usage to start with, even compared to the old-style posting lists and especially compared to the new compressed ones.  (If the pages were aggressively recycled, this inefficient use wouldn't be much of a problem.)
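
To make the growth pattern concrete, here is a toy simulation. It is not PostgreSQL code. The function name, the page counts, and the simplification that a page recorded in the free space map is immediately findable are all assumptions made for illustration, and the growth of the main index structure is ignored.

```python
def simulate(cycles, pending_pages, record_on_cleanup):
    """Toy model of pending-list page reuse (hypothetical, not PostgreSQL code).

    Each cycle fills a pending list of `pending_pages` pages and then
    flushes it, which deletes those pages.  Returns a tuple
    (file_pages, deleted_unrecorded).
    """
    file_pages = 0          # physical size of the index file, in pages
    fsm_free = 0            # deleted pages the free space map knows about
    deleted_unrecorded = 0  # deleted pages waiting for a vacuum

    for _ in range(cycles):
        # Fill the pending list: reuse recorded free pages first,
        # extend the file for whatever is still missing.
        reused = min(fsm_free, pending_pages)
        fsm_free -= reused
        file_pages += pending_pages - reused

        # Flush the pending list: all of its pages are deleted.
        if record_on_cleanup:
            fsm_free += pending_pages
        else:
            # Insert-only table: the vacuum that would record
            # these pages never runs.
            deleted_unrecorded += pending_pages

    return file_pages, deleted_unrecorded


if __name__ == "__main__":
    print(simulate(100, 50, record_on_cleanup=False))
    print(simulate(100, 50, record_on_cleanup=True))
```

In this model the file grows by one full pending list per flush when nothing records the deleted pages, and stays at a single pending list's worth of pages when they are recorded right away.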

Even on a table receiving mostly updates after its initial population (and so being vacuumed regularly) with default autovacuum settings, there is a lot of bloat.

The attached proof of concept patch greatly improves the bloat for both the insert and the update cases.  You need to turn on both features, adding the pages to the FSM and vacuuming the FSM, to get the benefit (so JJ_GIN=3).  The first of those two things could probably be adopted for real, but the second probably is not acceptable.  What is the right way to do this?  Could a variant of RecordFreeIndexPage bubble the free space up the map immediately rather than waiting for a vacuum?  It would only have to move up until it found a page with free space already recorded in it, which the vast majority of the time would mean looking up one level and then not writing to it, assuming the pending list pages remain well clustered.
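
A rough sketch of the bubble-up idea, as a toy model rather than the real FSM code. The class `ToyFSM`, its tiny fanout, and the use of 255 as the "page is free" value are all assumptions for illustration. The real free space map has a different on-disk layout, and this is not how RecordFreeIndexPage is implemented.

```python
class ToyFSM:
    """Toy multi-level free space map (hypothetical, not PostgreSQL code).

    levels[0] holds one value per index block.  Each slot in a higher
    level caches the maximum of `fanout` slots in the level below it.
    """

    def __init__(self, nblocks, fanout=4):
        self.fanout = fanout
        self.levels = [[0] * nblocks]
        while len(self.levels[-1]) > 1:
            n = -(-len(self.levels[-1]) // fanout)  # ceiling division
            self.levels.append([0] * n)
        self.upper_writes = 0  # number of upper-level slots modified

    def record_free_page(self, blk, avail=255, bubble=True):
        """Mark `blk` free; optionally bubble the value up right away."""
        self.levels[0][blk] = avail
        if not bubble:
            return  # upper levels stay stale until a "vacuum"
        idx = blk
        for level in self.levels[1:]:
            idx //= self.fanout
            if level[idx] >= avail:
                break  # free space already recorded here: stop early
            level[idx] = avail
            self.upper_writes += 1

    def search(self, needed=1):
        """Descend from the root to a block with >= `needed`, else None."""
        top = len(self.levels) - 1
        if self.levels[top][0] < needed:
            return None
        idx = 0
        for depth in range(top, 0, -1):
            child = self.levels[depth - 1]
            start = idx * self.fanout
            for i in range(start, min(start + self.fanout, len(child))):
                if child[i] >= needed:
                    idx = i
                    break
            else:
                return None  # upper level was stale
        return idx


if __name__ == "__main__":
    fsm = ToyFSM(16)
    fsm.record_free_page(5, bubble=False)
    print(fsm.search())       # leaf is set, but the root does not know
    for blk in (5, 6, 7):     # clustered pages share one parent slot
        fsm.record_free_page(blk)
    print(fsm.search(), fsm.upper_writes)
```

In this model the first freed page in a cluster pays for the writes up the tree. Its neighbours find the parent slot already set and stop after reading one level up, which is the cheap common case described above.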

Or would a completely different approach be better, like managing the vacated pending list pages directly in the index without going to the fsm?

Cheers,

Jeff
