Index Tuple Compression Approach?

Поиск
Список
Период
Сортировка
От Dawid Kuroczko
Тема Index Tuple Compression Approach?
Дата
Msg-id 758d5e7f0708150939w1a08590aw1abcfcde402122df@mail.gmail.com
обсуждение исходный текст
Ответ на Index Tuple Compression Approach?  (Chris Browne <cbbrowne@acm.org>)
Ответы Re: Index Tuple Compression Approach?
Список pgsql-hackers
On 8/14/07, Chris Browne <cbbrowne@acm.org> wrote:
> I recently had a chat with someone who was pretty intimate with Adabas
> for a number of years who's in the process of figuring things out
> about PostgreSQL.  We poked at bits of the respective implementations,
> seeing some similarities and differences.  He pointed out one aspect
> of index handling that could (in principle) be an interesting
> optimization.
>
> Evidently, in Adabas, index leaf nodes were not simply tuples, but
> lists where the index value would not be repeated.
>
> In PostgreSQL, if you have the index value 'abc', and there are 10
> tuples with that value, then you'll have a page full of tuples of the
> following form:
>
> |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
>
> Now, the Adabas approach was rather different.  It would only have the
> index value once, and then have the list of tuple pointers:
>
> |abc|ptr[rec1],ptr[rec2],ptr[rec3],...[ptr[rec10]|
>
> This could allow a fair bit of compression, for cases where the index
> value is not unique.

Interesting.  Some time ago I've played a little with quite a big table
which constained path (file path) as a primary key.  It did have sense
to have a strucure (SELECTs were mostly ORDER BY path WHERE path >
'/foo' LIMIT n).
The actual index was only a little bit smaller than the table (there were
maybe 4 or 5 columns there).

Some time ago I've had an idea that it might be possible to compress
th index size, even if it is a unique index.  Take the path example.
My idea would be to to split indexed value to 8-byte chunks.
For example: /var/lib/postgresql/8.2/main would be split into: "/var/lib" "/postgre" "sql/8.2" -- these would be
inserteredinto a tree as a "scaffold",
 
and only vacuum should remove them.. "main" -- this would be a leaf node.  It could be repeated in non-unique
indexes.

[/etc/pas] -- scaffold-node|-"swd"    -- leaf node
[/etc/sha]|-"dow"
[/var/lib]   -- a problematic mixed scaffold/leaf node.[/postgre] |-"sql" |-"sql/8.2" [sql/8.2/]  |-"main"  |-"foobar"

The scaffold nodes would be there to guarantee that there is some
place to attach leafs to.  They should not be removed by DELETE
(unless we are sure no other node depends on them).

Advantages?  The repeated values (as "/var/lib/postgresql/8.2")
are not repeated -- they are embedded into tree, as a "scaffold",
actual nodes that are significant (files, not directories, in my
example) are put as actual leafs.

I guess it would reduce large indexes size and at the same time
it could remove limitation that B-tree index cannot index values
larger than 1/3 of the database page.  8-byte chunks was given
as an example here, perhaps larger value would be better.

(Of course redesigning schema to put directories separate from
files woul be useful, but it would not help with ORDER BY .. LIMIT
queries -- they would have to be JOIN-ed and re-sorted in memory
I'm afraid).
 Regards,   Dawid


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Marc G. Fournier"
Дата:
Сообщение: Re: CVS corruption/mistagging?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: XID wraparound and busy databases