prefix btree implementation

Поиск
Список
Период
Сортировка
От Qingqing Zhou
Тема prefix btree implementation
Дата
Msg-id Pine.LNX.4.58.0510042359110.15377@eon.cs
обсуждение исходный текст
Ответы Re: prefix btree implementation  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: prefix btree implementation  (Bricklen Anderson <BAnderson@PresiNET.com>)
Список pgsql-hackers
I am not sure if this idea was mentioned before.

The basic prefix btree idea is quite straightforward, i.e., try to
compress the key items within a data page by sharing the common prefix.
Thus the fanout of the page is increased and the benefits is obvious
theorectically.

Consider the following cases: 1/ Multi-attributes index, example page
looks like this:   [1, 1, 'abc'], [1, 1, 'bde'], ..., [1, 22, 'mmk']   - So [1, 1] or [1] is compressible

2/ Index on string, example page looks like this:   ['aaab'] ['aaac'], ..., ['aabc']   - So ['aaa'] or ['aa'] is
compressible

3/ Both 1 and 2, example page looks like this:   [1, 'abc', 1], [1, 'abc', 2], ..., [1, 'abk', 1]   - So [1, 'abc'] or
[1,'ab'] is compressible
 

4/ And even this, example page looks like this:   [0x00001234], [0x00001235], ..., [0x00002213]   - So [0x0000] is
compressible

So together, there are basically four types of possible sharing:
column-wise (case 1), character-wise (case 2), column-character-wise (case
3), and byte-wise (case 4).

To compress these prefixes, we have the following questions:

1/ What types of prefix compression shall we support?

We can consider case 3 as a generalized case of 1 and 2, thus at least we
can implement for case 1, 2 and 3 now. Case 4 could be seen as a special
case of case 2, so the framework should be scalable enough to add supports
to 4. In short, we will implement 1,2 and 3 now.

2/ When to do prefix compression and what index operations will be
affected?

To simplify things, we can do it when we do bottom-up index rebuild. We
won't try to compress the prefix when inserting new data. By this design,
I could think that the index operations affected will include btree build,
btree split (new page has to inheritant prefix information).

3/ How to represent each index tuple and where to store the common prefix?

Index tuples involved in the compression have to remember how much have
been shared and the shared prefix should be stored within the same page.
The basic idea is that we will use bit 13 of IndexTuple->t_info to
indicate that if it shares a prefix or not, and the common prefix will be
stored in the special area of an index page. The common prefix data will
have format {length|value}.  Thanks that the PageHeader->pd_special is
dyanmic, so the common prefix could have different length on each page.
Based on these changes, we will change index_getattr() and _bt_pageinit()
accordingly.

4/ WAL support?

We only affect btree build and btree split. For btree build, we don't
worry too much, since the data is already page-wise xloged. We will add
prefix information to btree split xlog record (xl_btree_split).

Comments?

Regards,
Qingqing


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Announcing Veil
Следующее
От: Tom Lane
Дата:
Сообщение: Re: prefix btree implementation