Re: Fillfactor for GIN indexes

Поиск

Список

Период

Сортировка

От	Alexander Korotkov
Тема	Re: Fillfactor for GIN indexes
Дата	17 января 2015 г. 12:49:37
Msg-id	CAPpHfdvb_MsCa8Qtcj_+APDiDBE-von3AN3LeV9yrj_v9VCZYQ@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Fillfactor for GIN indexes (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: Fillfactor for GIN indexes Re: Fillfactor for GIN indexes
Список	pgsql-hackers

Дерево обсуждения

On Fri, Jan 16, 2015 at 8:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Alexander Korotkov wrote:
>> I'm not sure. On the one hand it's unclear why fillfactor should be
>> different from 9.4.
>> On the other hand it's unclear why it should be different from btree.
>> I propose marking this "ready for committer". So, committer can make a final
>> decision.
> OK let's do so then. My preference is to fully pack the index at
> build. GIN compression has been one of the headlines of 9.4.

I'm struggling to understand why we shouldn't just reject this patch.
On November 27th, Cedric said:

"what are the benefits of this patch ? (maybe you had some test case
or a benchmark ?)"

Nobody replied. On January 15th, you (Michael) hypothesized that
"this patch has value to control random updates on GIN indexes" but
there seem to be absolutely no test results showing that any such
value exists.

There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance. There are no benchmarks showing it does.
So, why are we still talking about this?

I already wrote quite detailed explanation of subject. Let mel try to explain in shortly. GIN is two level nested btree. Thus, GIN would have absolutely same benefits from fillfactor as btree. Lack of tests showing it is, for sure, fault.

However, GIN posting trees are ordered by ItemPointer and this makes some specific. If you have freshly created table and do inserts/updates they would use the end of heap. Thus, inserts would go to the end of GIN posting tree and fillfactor wouldn't affect anything. Fillfactor would give benefits on HOT or heap space re-usage.

In the following example you can see that index size was increased greatly while updating every 20th row. It's because every update causes page split in index.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);

# create index test_idx100 on test using gin(v) with (fillfactor=100, fastupdate=off);

# \di+

List of relations

Schema │ Name │ Type │ Owner │ Table │ Size │ Description

────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────

public │ test_idx100 │ index │ smagen │ test │ 3184 kB │

# update test set v = array[1,2] where id%20 = 0;

# \di+

List of relations

Schema │ Name │ Type │ Owner │ Table │ Size │ Description

public │ test_idx100 │ index │ smagen │ test │ 5264 kB │

(1 row)

But if we create index with fillfactor=90, index size would remain the same: no page splits.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);

# create index test_idx90 on test using gin(v) with (fillfactor=90, fastupdate=off);

# \di+

List of relations

Schema │ Name │ Type │ Owner │ Table │ Size │ Description

────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────

public │ test_idx90 │ index │ smagen │ test │ 3520 kB │

# update test set v = array[1,2] where id%20 = 0;

# \di+

List of relations

Schema │ Name │ Type │ Owner │ Table │ Size │ Description

public │ test_idx90 │ index │ smagen │ test │ 3520 kB │

(1 row)

Similar situation would be if we use fastupdate. But fastupdate takes some space for pending lists which is independent from fillfactor.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);

# create index test_idx100 on test using gin(v) with (fillfactor=100, fastupdate=on);

# \di+

List of relations

Schema │ Name │ Type │ Owner │ Table │ Size │ Description

public │ test_idx100 │ index │ smagen │ test │ 3184 kB │

# update test set v = array[1,2] where id%20 = 0;

# vacuum test;

# \di+

List of relations

Schema │ Name │ Type │ Owner │ Table │ Size │ Description

public │ test_idx100 │ index │ smagen │ test │ 7256 kB │

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);

# create index test_idx100 on test using gin(v) with (fillfactor=90, fastupdate=on);

# \di+

List of relations

Schema │ Name │ Type │ Owner │ Table │ Size │ Description

public │ test_idx100 │ index │ smagen │ test │ 3520 kB │

# update test set v = array[1,2] where id%20 = 0;

# vacuum test;

# \di+

List of relations

Schema │ Name │ Type │ Owner │ Table │ Size │ Description

public │ test_idx100 │ index │ smagen │ test │ 5512 kB │

BTW, previous version of patch contained some bugs. Revised version is attached.

------
With best regards,
Alexander Korotkov.

Вложения

gin_fillfactor_4.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Pavel Stehule
Дата: 17 января 2015 г., 09:32:57
Сообщение: Re: proposal: lock_time for pg_stat_database

Следующее

От: Alexander Korotkov
Дата: 17 января 2015 г., 14:21:35
Сообщение: Re: Fillfactor for GIN indexes

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Fillfactor for GIN indexes

Вложения

Предыдущее

Следующее