Re: GiST index for pgtrgm bloats a lot

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: GiST index for pgtrgm bloats a lot
Дата
Msg-id 5559E6ED.2010405@iki.fi
обсуждение исходный текст
Ответ на GiST index for pgtrgm bloats a lot  (hubert depesz lubaczewski <depesz@depesz.com>)
Ответы Re: GiST index for pgtrgm bloats a lot  (hubert depesz lubaczewski <depesz@depesz.com>)
Список pgsql-bugs
On 05/18/2015 03:12 PM, hubert depesz lubaczewski wrote:
> Hi,
> We have this situation on many servers - used versions are 9.1.14 and/or
> 9.3.6, on Linux servers.
>
> There is a table, with ~ 1 million rows.
> There are no deletes there, inserts happen at the rate of ~ 100-200
> daily (but not spread, it's usually a single moment in day where there
> happen the inserts).
> There is also HUGE number of updates - up to 200,000 rows updated per
> hour.

Are the columns that are included in the bloated index also updated that
often? If not, I'd suggest moving those columns to a separate table with
a one-to-one relationship to the main table. Or perhaps just create a
helper table that contains copies of those columns, and keep it
up-to-date with triggers. That would reduce the churn in the index.

> On these databases we have some (not many) 1-2 minute transactions, but usually
> every transaction within 30 seconds
>
> This table is vacuumed (autovacuum) quite a lot, usually at least 10 times a
> day.
>
> Is there anything we could do to help diagnose the problem, and fix it?
> I don't think I can install custom pg version, and downtime would be
> complicated to get approval, but anything about the data/database I can
> check that would allow diagnosing the bug, I'd be happy to do.

GiST doesn't have the same features that e.g B-tree has to avoid bloat:

1. When GiST has multiple equally good choices where it could insert a
tuple, it favours branches that are "earlier" in the index. Always
descending down the same branch is good for cache efficiency when you
insert multiple items with similar keys, but the downside is that the
other branches can easily have a lot of free space that goes unused,
while the "hot" branch just gets split repeatedly. This is explained in
the comments in the gistchoose() function. That leads to bloat, because
the free space isn't being used.

2. When an index page becomes completely empty, GiST doesn't try to
remove and reuse it.

It would be nice to address those in a future version, but that won't
help you right now.

- Heikki

В списке pgsql-bugs по дате отправления:

Предыдущее
От: hubert depesz lubaczewski
Дата:
Сообщение: GiST index for pgtrgm bloats a lot
Следующее
От: hubert depesz lubaczewski
Дата:
Сообщение: Re: GiST index for pgtrgm bloats a lot