Re: GIN improvements part 1: additional information

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: GIN improvements part 1: additional information
Дата
Msg-id 52B49C34.8060300@vmware.com
обсуждение исходный текст
Ответ на Re: GIN improvements part 1: additional information  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: GIN improvements part 1: additional information
Re: GIN improvements part 1: additional information
Re: GIN improvements part 1: additional information
Список pgsql-hackers
On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:
> On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:
>> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>>> On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas
>>> <hlinnakangas@vmware.com
>>>> wrote:
>>>
>>>> On 12/12/2013 06:44 PM, Alexander Korotkov wrote:
>>>>
>>>>   When values are packed into small groups, we have to either insert
>>>>> inefficiently encoded value or re-encode whole right part of values.
>>>>
>>>> It would probably be simplest to store newly inserted items
>>>> uncompressed,
>>>> in a separate area in the page. For example, grow the list of
>>>> uncompressed
>>>> items downwards from pg_upper, and the compressed items upwards from
>>>> pg_lower. When the page fills up, re-encode the whole page.
>>
>> I hacked together an implementation of a variant of Simple9, to see how
>> it performs. Insertions are handled per the above scheme.
>
> Here's an updated version of that, using the page layout without
> item-indexes that I described in the other post. This is much less buggy
> than that earlier crude version I posted - and unfortunately it doesn't
> compress as well. The earlier version lost some items :-(.
>
> Nevertheless, I think this page layout and code formatting is better,
> even if we switch the encoding back to the varbyte encoding in the end.

Yet another version. The encoding/decoding code is now quite isolated in
ginpostinglist.c, so it's easy to experiment with different encodings.
This patch uses varbyte encoding again.

I got a bit carried away, experimented with a bunch of different
encodings. I tried rice encoding, rice encoding with block and offset
number delta stored separately, the simple9 variant, and varbyte encoding.

The compressed size obviously depends a lot on the distribution of the
items, but in the test set I used, the differences between different
encodings were quite small.

One fatal problem with many encodings is VACUUM. If a page is completely
full and you remove one item, the result must still fit. In other words,
removing an item must never enlarge the space needed. Otherwise we have
to be able to split on vacuum, which adds a lot of code, and also makes
it possible for VACUUM to fail if there is no disk space left. That's
unpleasant if you're trying to run VACUUM to release disk space. (gin
fast updates already has that problem BTW, but let's not make it worse)

I believe that eliminates all encodings in the Simple family, as well as
PForDelta, and surprisingly also Rice encoding. For example, if you have
three items in consecutive offsets, the differences between them are
encoded as 11 in rice encoding. If you remove the middle item, the
encoding for the next item becomes 010, which takes more space than the
original.

AFAICS varbyte encoding is safe from that. (a formal proof would be nice
though).

So, I'm happy to go with varbyte encoding now, indeed I don't think we
have much choice unless someone can come up with an alternative that's
VACUUM-safe. I have to put this patch aside for a while now, I spent a
lot more time on these encoding experiments than I intended. If you
could take a look at this latest version, spend some time reviewing it
and cleaning up any obsolete comments, and re-run the performance tests
you did earlier, that would be great. One thing I'm slightly worried
about is the overhead of merging the compressed and uncompressed posting
lists in a scan. This patch will be in good shape for the final
commitfest, or even before that.

- Heikki

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: shared memory message queues
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Logging WAL when updating hintbit