Comment in ginpostinglist.c doesn't match code

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Comment in ginpostinglist.c doesn't match code
Дата
Msg-id 33bfc20a-5c86-f50c-f5a5-58e9925d05ff@iki.fi
обсуждение исходный текст
Ответы Re: Comment in ginpostinglist.c doesn't match code  (Ashwin Agrawal <aagrawal@pivotal.io>)
Re: Comment in ginpostinglist.c doesn't match code  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
Hi,

While merging Greenplum with 9.4, we ran into problems with the GIN 
posting list encoding, because Greenplum sometimes uses ItemPointers 
with offset numbers up to 32768. The GIN posting list code was written 
with the assumption that the maximum is MaxHeapTuplesPerPage, and it 
uses only 11 bits for the offset number. The fix was simple, we just 
modified ginpostinglist.c to use full 16 bits instead of 11.

However, I noticed that this comment in ginpostinglist.c that explains 
the encoding doesn't match the code:

>  * These 43-bit integers are encoded using varbyte encoding. In each byte,
>  * the 7 low bits contain data, while the highest bit is a continuation bit.
>  * When the continuation bit is set, the next byte is part of the same
>  * integer, otherwise this is the last byte of this integer.  43 bits fit
>  * conveniently in at most 6 bytes when varbyte encoded (the 6th byte does
>  * not need a continuation bit, because we know the max size to be 43 bits):
>  *
>  * 0XXXXXXX
>  * 1XXXXXXX 0XXXXYYY
>  * 1XXXXXXX 1XXXXYYY 0YYYYYYY
>  * 1XXXXXXX 1XXXXYYY 1YYYYYYY 0YYYYYYY
>  * 1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 0YYYYYYY
>  * 1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 1YYYYYYY YYYYYYYY
>  *
>  * X = bits used for offset number
>  * Y = bits used for block number

The code doesn't actually give the 6th byte any special treatment. If 
the input integer has the 43rd bit set, the encoding function will put a 
continuation bit on the 6th byte, and generate a 7th byte. And the 
decoding function will correctly decode that, too. So to my surprise, 
the implementation actually works for integers up 49 bits wide. However, 
there is an overflow check in the encoding function that assumes max 6 
bytes per integer. That needs to be fixed, along with the comment.

Fitting any item pointer into 6 bytes was an important property when 
this was written, because in the old pre-9.4 format, posting lists were 
as arrays, with 6 bytes per item pointer. The maximum of 6 bytes per 
integer in the new format guaranteed that we could convert any page from 
the old format to the new format, after pg_upgrade, so that the new 
format was never larger than the old format. But I don't think we need 
to worry much about that anymore. Luckily, no one has ran into this 
while trying to upgrade. It would require having a 9.3 cluster with a 
table larger than 16 TB (with 8k block size), with a GIN index on it, 
and a posting list with TIDs more than 2^31 blocks distance, on a full 
page. So, not a problem in practice.

In summary, the comment in ginpostinglist.c is wrong, and the overflow 
check needs to be fixed. Patch attached.

The patch also includes a little unit test module to test this without 
creating a 16 TB table. A whole new test module seems a bit like 
overkill just for this, but clearly we were missing test coverage here. 
And it will come handy, if we want to invent a new better posting list 
format in the future. Thoughts on whether to include the test module or not?

- Heikki

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Refactoring of connection with password prompt loop for frontends
Следующее
От: Surafel Temesgen
Дата:
Сообщение: Take skip header out of a loop in COPY FROM