Обсуждение: Negative imact of maintenance_work_mem to GIN size

Поиск
Список
Период
Сортировка

Negative imact of maintenance_work_mem to GIN size

От
Alexander Korotkov
Дата:
Hi!

I found that sometimes larger maintenance_work_mem leads to larger GIN index. That is quite strange. ISTM that it's related to posting lists compression but I can't figure out how exactly it is.

See example on delicious bookmarks dataset.


set maintenance_work_mem = '1GB';
create index js_idx1 on js using gin(v jsonb_path_idx);
set maintenance_work_mem = '16MB';
create index js_idx2 on js using gin(v jsonb_path_ops);

                        List of relations
 Schema |  Name   | Type  | Owner  | Table |  Size  | Description
--------+---------+-------+--------+-------+--------+-------------
 public | js_idx1 | index | smagen | js    | 432 MB |
 public | js_idx2 | index | smagen | js    | 309 MB |
(2 rows)

------
With best regards,
Alexander Korotkov.

Re: Negative imact of maintenance_work_mem to GIN size

От
Alexander Korotkov
Дата:
On Tue, May 20, 2014 at 4:50 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
I found that sometimes larger maintenance_work_mem leads to larger GIN index. That is quite strange. ISTM that it's related to posting lists compression but I can't figure out how exactly it is.

It appears to be not related to posting lists compression. I did reproduce it on 9.2.

create table test as (select array[(random()*10000000)::int]::int[] as v from generate_series(1,10000000) g);
set maintenance_work_mem = '2GB';
create index test_idx1 on test using gin(v);
set maintenance_work_mem = '16MB';
create index test_idx2 on test using gin(v);

 Schema |            Name            | Type  |  Owner   |    Table    |  Size   | Description
--------+----------------------------+-------+----------+-------------+---------+-------------
 public | test_idx1                  | index | smagen   | test        | 392 MB  |
 public | test_idx2                  | index | smagen   | test        | 268 MB  |
(2 rows)

The reason of it is that we filling entry tree with inserting without any special optimization. Large maintenance_work_mem gives us ascending order of insertion. Thus insertion is performed always into rightmost page leaving rest of pages half-empty. Small maintenance_work_mem gives us more random order of insertion. Such insertions makes pages about 75% filled in average. Posting trees has special optimization for this case while entry tree doesn't.

------
With best regards,
Alexander Korotkov.