Re: GIN improvements part 1: additional information

From: Alexander Korotkov
Subject: Re: GIN improvements part 1: additional information
Date:
Msg-id: CAPpHfdvGG4K8A+0RxBNPVfGFQ=vk2tDpRVqP--uprvCKmP=t2Q@mail.gmail.com
In reply to: Re: GIN improvements part 1: additional information (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers
On Fri, Dec 20, 2013 at 11:36 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:
On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:
On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:

On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

When values are packed into small groups, we have to either insert an inefficiently encoded value or re-encode the whole right part of the values.

It would probably be simplest to store newly inserted items uncompressed, in a separate area in the page. For example, grow the list of uncompressed items downwards from pd_upper, and the compressed items upwards from pd_lower. When the page fills up, re-encode the whole page.
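
A minimal sketch of that dual-growth layout, with field names modeled on PostgreSQL's PageHeaderData; the toy page struct and the helper are invented here for illustration and are not taken from the patch:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* 6-byte item pointer, shaped like PostgreSQL's ItemPointerData. */
    typedef struct
    {
        uint16_t blk_hi;
        uint16_t blk_lo;
        uint16_t offset;
    } ItemPointerData;

    /* Toy page: compressed segments live in data[0 .. pd_lower) and grow
     * upwards; uncompressed items live in data[pd_upper .. sizeof(data))
     * and grow downwards.  Free space is the gap in between. */
    typedef struct
    {
        uint16_t pd_lower;
        uint16_t pd_upper;
        char     data[8192 - 2 * sizeof(uint16_t)];
    } TinyPage;

    /* Add one item uncompressed.  Returns false when the gap is gone,
     * at which point the caller re-encodes the whole page. */
    static bool
    insert_uncompressed(TinyPage *page, ItemPointerData item)
    {
        if ((size_t) (page->pd_upper - page->pd_lower) < sizeof(item))
            return false;
        page->pd_upper -= sizeof(item);
        memcpy(page->data + page->pd_upper, &item, sizeof(item));
        return true;
    }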

I hacked together an implementation of a variant of Simple9, to see how
it performs. Insertions are handled per the above scheme.
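
For reference, classic Simple9 (Anh & Moffat) packs as many deltas as fit into the 28 data bits of each 32-bit word, with a 4-bit selector picking one of nine layouts. The thread doesn't spell out how Heikki's variant differs, so the table below is the textbook one:

    /* The nine layouts of classic Simple9: a 4-bit selector plus 28 data
     * bits per 32-bit word.  A word holds nitems values of bits bits. */
    static const struct
    {
        int nitems;  /* values packed into one word */
        int bits;    /* bits per value */
    } simple9_modes[9] = {
        {28, 1}, {14, 2}, {9, 3}, {7, 4}, {5, 5},
        {4, 7}, {3, 9}, {2, 14}, {1, 28}
    };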

Here's an updated version of that, using the page layout without item-indexes that I described in the other post. This is much less buggy than the earlier crude version I posted, although unfortunately it doesn't compress as well; the earlier version lost some items :-(.

Nevertheless, I think this page layout and code structure are better, even if we switch the encoding back to the varbyte encoding in the end.

Yet another version. The encoding/decoding code is now quite isolated in ginpostinglist.c, so it's easy to experiment with different encodings. This patch uses varbyte encoding again.

I got a bit carried away and experimented with a bunch of different encodings. I tried Rice encoding, Rice encoding with the block and offset number deltas stored separately, the Simple9 variant, and varbyte encoding.

The compressed size obviously depends a lot on the distribution of the items, but in the test set I used, the differences between the encodings were quite small.

One fatal problem with many encodings is VACUUM. If a page is completely full and you remove one item, the result must still fit. In other words, removing an item must never enlarge the space needed. Otherwise we would have to be able to split pages on vacuum, which adds a lot of code, and also makes it possible for VACUUM to fail when there is no disk space left. That's unpleasant if you're trying to run VACUUM to release disk space. (GIN fast update already has that problem, BTW, but let's not make it worse.)

I believe that eliminates all encodings in the Simple family, as well as PForDelta, and surprisingly also Rice encoding. For example, if you have three items at consecutive offsets, the differences between them are encoded as 11 in Rice encoding. If you remove the middle item, the encoding of the remaining merged difference becomes 010, which takes more space than the original.
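
As a side note, those bit patterns (11 for two gaps of 1, 010 for the merged gap of 2) match an Elias-gamma-style code; the exact variant used in the experiments isn't spelled out in the thread, but the growth argument is the same for any unary-prefixed code. A small length calculation that reproduces the example:

    /* Bits needed for gap n >= 1 under an Elias-gamma-style code:
     * floor(log2(n)) prefix bits plus the (floor(log2(n)) + 1)-bit
     * binary value.  So 1 -> "1" (1 bit), 2 -> "010" (3 bits). */
    static int
    gamma_bits(unsigned int n)
    {
        int hi = 0;                 /* hi = floor(log2(n)) */
        while (n >> (hi + 1))
            hi++;
        return 2 * hi + 1;
    }

    /* Two gaps of 1 cost 1 + 1 = 2 bits, but after deleting the middle
     * item the single merged gap of 2 costs 3 bits: the page can grow
     * on VACUUM, which is exactly the fatal property. */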

AFAICS varbyte encoding is safe from that. (A formal proof would be nice, though.)

The formal proof goes like this. Removing a number is actually the replacement of two numbers (adjacent deltas) with their sum. We have to prove that the varbyte encoding of the sum can't be longer than the combined varbyte encodings of the summands. The highest set bit of the sum is at most one position above the highest set bit of the larger summand, so the varbyte encoding of the sum is at most one byte longer than the varbyte encoding of the larger summand. And the lesser summand occupies at least one byte, which compensates for that extra byte.
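
The same argument as a self-contained check, assuming a plain 7-payload-bits-per-byte varbyte code (the patch's actual encoding in ginpostinglist.c may pack block and offset numbers differently, but the length argument only depends on this layer):

    #include <assert.h>
    #include <stdint.h>

    /* Bytes needed to varbyte-encode v: 7 payload bits per byte, with
     * the high bit of each byte as a continuation flag; every value,
     * including 0, takes at least one byte. */
    static int
    varbyte_len(uint64_t v)
    {
        int len = 1;
        while (v >>= 7)
            len++;
        return len;
    }

    /* Deleting an item merges two adjacent deltas a and b into a + b.
     * Since a + b <= 2 * max(a, b), its highest set bit is at most one
     * above that of the larger summand, so the sum costs at most one
     * byte more than the larger summand alone; the smaller summand
     * freed at least one byte, so the list never grows. */
    static void
    check_vacuum_safe(uint64_t a, uint64_t b)
    {
        assert(varbyte_len(a + b) <= varbyte_len(a) + varbyte_len(b));
    }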
 
So, I'm happy to go with varbyte encoding now; indeed, I don't think we have much choice unless someone can come up with an alternative that's VACUUM-safe. I have to put this patch aside for a while now; I spent a lot more time on these encoding experiments than I intended. If you could take a look at this latest version, spend some time reviewing it and cleaning up any obsolete comments, and re-run the performance tests you did earlier, that would be great. One thing I'm slightly worried about is the overhead of merging the compressed and uncompressed posting lists in a scan. This patch will be in good shape for the final commitfest, or even before that.

OK. I'll run tests for scans on a mix of compressed and uncompressed posting lists. However, I don't expect it to be slower than both the pure compressed and the pure uncompressed cases. The overhead of compressing the uncompressed posting lists is evident, but it would also be nice to measure it.
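
The merge overhead in question boils down to a two-way merge of two sorted streams per page. A sketch, with item pointers flattened to uint64_t keys for brevity; the names are made up for illustration, not taken from the patch:

    #include <stdint.h>

    /* Merge the decoded compressed items with the page's uncompressed
     * items, both already sorted ascending.  The extra per-item compare
     * and branch is the scan overhead a mixed page pays. */
    static int
    merge_posting_lists(const uint64_t *packed, int npacked,
                        const uint64_t *plain, int nplain,
                        uint64_t *out)
    {
        int i = 0, j = 0, n = 0;

        while (i < npacked && j < nplain)
            out[n++] = (packed[i] <= plain[j]) ? packed[i++] : plain[j++];
        while (i < npacked)
            out[n++] = packed[i++];
        while (j < nplain)
            out[n++] = plain[j++];
        return n;
    }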

------
With best regards,
Alexander Korotkov.
