Re: Fillfactor for GIN indexes

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Fillfactor for GIN indexes
Дата
Msg-id CAPpHfdvb_MsCa8Qtcj_+APDiDBE-von3AN3LeV9yrj_v9VCZYQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Fillfactor for GIN indexes  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Fillfactor for GIN indexes  (Alexander Korotkov <aekorotkov@gmail.com>)
Re: Fillfactor for GIN indexes  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Fri, Jan 16, 2015 at 8:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 15, 2015 at 7:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Alexander Korotkov wrote:
>> I'm not sure. On the one hand it's unclear why fillfactor should be
>> different from 9.4.
>> On the other hand it's unclear why it should be different from btree.
>> I propose marking this "ready for committer". So, committer can make a final
>> decision.
> OK let's do so then. My preference is to fully pack the index at
> build. GIN compression has been one of the headlines of 9.4.

I'm struggling to understand why we shouldn't just reject this patch.
On November 27th, Cedric said:

"what are the benefits of this patch ? (maybe you had some test case
or a benchmark ?)"

Nobody replied.  On January 15th, you (Michael) hypothesized that
"this patch has value to control random updates on GIN indexes" but
there seem to be absolutely no test results showing that any such
value exists.

There's only value in adding a fillfactor parameter to GIN indexes if
it improves performance.  There are no benchmarks showing it does.
So, why are we still talking about this?

I already wrote quite detailed explanation of subject. Let mel try to explain in shortly. GIN is two level nested btree. Thus, GIN would have absolutely same benefits from fillfactor as btree. Lack of tests showing it is, for sure, fault.

However, GIN posting trees are ordered by ItemPointer and this makes some specific. If you have freshly created table and do inserts/updates they would use the end of heap. Thus, inserts would go to the end of GIN posting tree and fillfactor wouldn't affect anything. Fillfactor would give benefits on HOT or heap space re-usage.

In the following example you can see that index size was increased greatly while updating every 20th row. It's because every update causes page split in index.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100, fastupdate=off);
# \di+
                          List of relations
 Schema │     Name    │ Type  │ Owner  │ Table │  Size   │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
 public │ test_idx100 │ index │ smagen │ test  │ 3184 kB

# update test set v = array[1,2] where id%20 = 0;
# \di+
                          List of relations
 Schema │     Name    │ Type  │ Owner  │ Table │  Size   │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
 public │ test_idx100 │ index │ smagen │ test  │ 5264 kB
(1 row)

But if we create index with fillfactor=90, index size would remain the same: no page splits.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);
# create index test_idx90 on test using gin(v) with (fillfactor=90, fastupdate=off);
# \di+
                          List of relations
 Schema │    Name    │ Type  │ Owner  │ Table │  Size   │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
 public │ test_idx90 │ index │ smagen │ test  │ 3520 kB

# update test set v = array[1,2] where id%20 = 0;
# \di+
                          List of relations
 Schema │    Name    │ Type  │ Owner  │ Table │  Size   │ Description
────────┼────────────┼───────┼────────┼───────┼─────────┼─────────────
 public │ test_idx90 │ index │ smagen │ test  │ 3520 kB
(1 row)

Similar situation would be if we use fastupdate. But fastupdate takes some space for pending lists which is independent from fillfactor.

# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=100, fastupdate=on);
# \di+
                           List of relations
 Schema │    Name     │ Type  │ Owner  │ Table │  Size   │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
 public │ test_idx100 │ index │ smagen │ test  │ 3184 kB

# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
                           List of relations
 Schema │    Name     │ Type  │ Owner  │ Table │  Size   │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
 public │ test_idx100 │ index │ smagen │ test  │ 7256 kB


# create table test with (fillfactor=90) as (select id, array[1,2,3] v from generate_series(1,1000000) id);
# create index test_idx100 on test using gin(v) with (fillfactor=90, fastupdate=on);
# \di+
                           List of relations
 Schema │    Name     │ Type  │ Owner  │ Table │  Size   │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
 public │ test_idx100 │ index │ smagen │ test  │ 3520 kB

# update test set v = array[1,2] where id%20 = 0;
# vacuum test;
# \di+
                           List of relations
 Schema │    Name     │ Type  │ Owner  │ Table │  Size   │ Description
────────┼─────────────┼───────┼────────┼───────┼─────────┼─────────────
 public │ test_idx100 │ index │ smagen │ test  │ 5512 kB

BTW, previous version of patch contained some bugs. Revised version is attached.

------
With best regards,
Alexander Korotkov.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal: lock_time for pg_stat_database
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: Fillfactor for GIN indexes