Re: Fillfactor for GIN indexes

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Fillfactor for GIN indexes
Дата
Msg-id CAPpHfdux70vzCDoyeaf=QahsJ8stzU7xFEdFtkiLbjY7OaFLWA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Fillfactor for GIN indexes  (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы Re: Fillfactor for GIN indexes  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
Hi!

On Wed, Jul 8, 2015 at 10:27 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
In dataPlaceToPageLeaf-function:

        if (append)
        {
                /*
                 * Even when appending, trying to append more items than will fit is
                 * not completely free, because we will merge the new items and old
                 * items into an array below. In the best case, every new item fits in
                 * a single byte, and we can use all the free space on the old page as
                 * well as the new page. For simplicity, ignore segment overhead etc.
                 */
                maxitems = Min(maxitems, freespace + GinDataPageMaxDataSize);
        }

Hmm. So after splitting the page, there is freespace + GinDataPageMaxDataSize bytes available on both pages together. freespace has been adjusted with the fillfactor, but GinDataPageMaxDataSize is not. So this overshoots, because when leafRepackItems actually distributes the segments on the pages, it will fill both pages only up to the fillfactor. This is an upper bound, so that's harmless, it only leads to some unnecessary work in dealing with the item lists. But I think that should be:

maxitems = Min(maxitems, freespace + leaf->maxdatasize);

Good catch! Fixed.
 

        else
        {
                /*
                 * Calculate a conservative estimate of how many new items we can fit
                 * on the two pages after splitting.
                 *
                 * We can use any remaining free space on the old page to store full
                 * segments, as well as the new page. Each full-sized segment can hold
                 * at least MinTuplesPerSegment items
                 */
                int                     nnewsegments;

                nnewsegments = freespace / GinPostingListSegmentMaxSize;
                nnewsegments += GinDataPageMaxDataSize / GinPostingListSegmentMaxSize;
                maxitems = Min(maxitems, nnewsegments * MinTuplesPerSegment);
        }

This branch makes the same mistake, but this is calculating a lower bound. It's important that maxitems is not set to higher value than what actually fits on the page, otherwise you can get an ERROR later in the function, when it turns out that not all the items actually fit on the page. The saving grace here is that this branch is never taken when building a new index, because index build inserts all the TIDs in order, but that seems pretty fragile. Should use leaf->maxdatasize instead of GinDataPageMaxDataSize here too.

Fixed.
 
But that can lead to funny things, if fillfactor is small, and BLCKSZ is small too. With the minimums, BLCKSZ=1024 and fillfactor=0.2, the above formula will set nnewsegments to zero. That's not going to end up well. The problem is that maxdatasize becomes smaller than GinPostingListSegmentMaxSize, which is a problem. I think GinGetMaxDataSize() needs to make sure that the returned value is always >= GinPostingListSegmentMaxSize.

Fixed. 
 
Now that we have a fillfactor, shouldn't we replace the 75% heuristic later in that function, when inserting to the rightmost page rather than building it from scratch? In B-tree, the fillfactor is applied to that case too.

Sounds reasonable. Now it works like btree.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: Re: FPW compression leaks information
Следующее
От: Fabrízio de Royes Mello
Дата:
Сообщение: Re: [HACKERS] Re: [HACKERS] GSoC 2015 proposal: Improve the performance of “ALTER TABLE .. SET LOGGED / UNLOGGED” statement