Re: GIN improvements part 1: additional information

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: GIN improvements part 1: additional information
Дата
Msg-id 52B1B63F.6030104@vmware.com
обсуждение исходный текст
Ответ на Re: GIN improvements part 1: additional information  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: GIN improvements part 1: additional information
Список pgsql-hackers
On 12/18/2013 01:45 PM, Alexander Korotkov wrote:
> On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas <hlinnakangas@vmware.com
>> wrote:
>
>> On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
>>   2) Storage would be easily extendable to hold additional information as
>>> well.
>>> Better compression shouldn't block more serious improvements.
>>>
>>
>> I'm not sure I agree with that. For all the cases where you don't care
>> about additional information - which covers all existing users for example
>> - reducing disk size is pretty important. How are you planning to store the
>> additional information, and how does using another encoding gets in the way
>> of that?
>
> I was planned to store additional information datums between
> varbyte-encoded tids. I was expected it would be hard to do with PFOR.
> However, I don't see significant problems in your implementation of Simple9
> encoding. I'm going to dig deeper in your version of patch.

Ok, thanks.

I had another idea about the page format this morning. Instead of having 
the item-indexes at the end of the page, it would be more flexible to 
store a bunch of self-contained posting list "segments" on the page. So 
I propose that we get rid of the item-indexes, and instead store a bunch 
of independent posting lists on the page:

typedef struct
{    ItemPointerData first;   /* first item in this segment (unpacked) */    uint16      nwords;         /* number of
wordsthat follow */    uint64      words[1];    /* var length */
 
} PostingListSegment;

Each segment can be encoded and decoded independently. When searching 
for a particular item (like on insertion), you skip over segments where 
'first' > the item you're searching for.

This format offers a lot more flexibility compared to the separate item 
indexes. First, we don't need to have another fixed sized area on the 
page, which simplifies the page format. Second, we can more easily 
re-encode only one segment on the page, on insertion or vacuum. The 
latter is particularly important with the Simple-9 encoding, which 
operates one word at a time rather than one item at a time; removing or 
inserting an item in the middle can require a complete re-encoding of 
everything that follows. Third, when a page is being inserted into and 
contains only uncompressed items, you don't waste any space for unused 
item indexes.

While we're at it, I think we should use the above struct in the inline 
posting lists stored directly in entry tuples. That wastes a few bytes 
compared to the current approach in the patch (more alignment, and 
'words' is redundant with the number of items stored on the tuple 
header), but it simplifies the functions handling these lists.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: SQL objects UNITs (was: Extension Templates S03E11)
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: ALTER SYSTEM SET command to change postgresql.conf parameters