prefix btree implementation
От | Qingqing Zhou |
---|---|
Тема | prefix btree implementation |
Дата | |
Msg-id | Pine.LNX.4.58.0510042359110.15377@eon.cs обсуждение исходный текст |
Ответы |
Re: prefix btree implementation
(Tom Lane <tgl@sss.pgh.pa.us>)
Re: prefix btree implementation (Bricklen Anderson <BAnderson@PresiNET.com>) |
Список | pgsql-hackers |
I am not sure if this idea was mentioned before. The basic prefix btree idea is quite straightforward, i.e., try to compress the key items within a data page by sharing the common prefix. Thus the fanout of the page is increased and the benefits is obvious theorectically. Consider the following cases: 1/ Multi-attributes index, example page looks like this: [1, 1, 'abc'], [1, 1, 'bde'], ..., [1, 22, 'mmk'] - So [1, 1] or [1] is compressible 2/ Index on string, example page looks like this: ['aaab'] ['aaac'], ..., ['aabc'] - So ['aaa'] or ['aa'] is compressible 3/ Both 1 and 2, example page looks like this: [1, 'abc', 1], [1, 'abc', 2], ..., [1, 'abk', 1] - So [1, 'abc'] or [1,'ab'] is compressible 4/ And even this, example page looks like this: [0x00001234], [0x00001235], ..., [0x00002213] - So [0x0000] is compressible So together, there are basically four types of possible sharing: column-wise (case 1), character-wise (case 2), column-character-wise (case 3), and byte-wise (case 4). To compress these prefixes, we have the following questions: 1/ What types of prefix compression shall we support? We can consider case 3 as a generalized case of 1 and 2, thus at least we can implement for case 1, 2 and 3 now. Case 4 could be seen as a special case of case 2, so the framework should be scalable enough to add supports to 4. In short, we will implement 1,2 and 3 now. 2/ When to do prefix compression and what index operations will be affected? To simplify things, we can do it when we do bottom-up index rebuild. We won't try to compress the prefix when inserting new data. By this design, I could think that the index operations affected will include btree build, btree split (new page has to inheritant prefix information). 3/ How to represent each index tuple and where to store the common prefix? Index tuples involved in the compression have to remember how much have been shared and the shared prefix should be stored within the same page. The basic idea is that we will use bit 13 of IndexTuple->t_info to indicate that if it shares a prefix or not, and the common prefix will be stored in the special area of an index page. The common prefix data will have format {length|value}. Thanks that the PageHeader->pd_special is dyanmic, so the common prefix could have different length on each page. Based on these changes, we will change index_getattr() and _bt_pageinit() accordingly. 4/ WAL support? We only affect btree build and btree split. For btree build, we don't worry too much, since the data is already page-wise xloged. We will add prefix information to btree split xlog record (xl_btree_split). Comments? Regards, Qingqing
В списке pgsql-hackers по дате отправления: