Обсуждение: Prefix compression of B-Tree keys

Поиск
Список
Период
Сортировка

Prefix compression of B-Tree keys

От
Claudio Freire
Дата:
On Thu, Jan 30, 2014 at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Geoghegan <pg@heroku.com> writes:
>> On more occasions than I care to recall, someone has suggested that it
>> would be valuable to do something with strxfrm() blobs in order to
>> have cheaper locale-aware text comparisons. One obvious place to do so
>> would be in indexes, but in the past that has been dismissed on the
>> following grounds:
>
>> * Index-only scans need fully formed datums to work, and strxfrm() is
>> a one-way function (or so I believe). There is no reason to think that
>> the original string can be reassembled from the blob, so clearly that
>> won't fly.
>
>> * There is a high cost to be paid in storage overhead. Even for
>> collations like "en_US.UTF-8", that can mean that the blob is as much
>> as 3-4 times larger than the original text string. Who is to say that
>> we'll come out ahead even with the savings of just doing a strcmp()
>> rather than a strcoll()?
>
> Quite aside from the index bloat risk, this effect means a 3-4x reduction
> in the maximum string length that can be indexed before getting the
> dreaded "Values larger than 1/3 of a buffer page cannot be indexed" error.
> Worse, a value insertion might well succeed, with the failure happening
> only (much?) later when that entry is chosen as a page split boundary.
>
> It's possible that TOAST compression of the strings would save you, but
> I'm not convinced of that; it certainly doesn't seem like we could
> guarantee no post-insertion failures that way.

Maybe not TOAST compression, but prefix compression.

I've been wondering lately, whether a format change in the B-Tree
could be worth the effort: store values with prefix compression. It
would certainly help indexes of many kinds (text-valued, multi-column,
etc...).

Now, I haven't checked if it's already done. Sorry if it is. I did
mock around btree code a lot and don't remember any of this, but I do
remember stuff that could be used to achieve it (specifically, all the
index-only-scan machinery, which allows for implicit info).

Granted, this isn't strxfrm, but it may provide some of the benefits
(faster comparisons because less bytes are compared?).

On Fri, Jan 31, 2014 at 1:51 AM, Peter Geoghegan <pg@heroku.com> wrote:
> We're already suffering from the fact
> that strcoll() mandates a NULL-terminated string, since we have to
> copy text datums to a buffer to deal with the impedance mismatch.
> Furthermore, multi-column comparisons probably have a lot of overhead
> at present from all the indirection alone.

Which makes the above even more likely to help.



Re: Prefix compression of B-Tree keys

От
Peter Geoghegan
Дата:
On Thu, Jan 30, 2014 at 9:26 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> Maybe not TOAST compression, but prefix compression.

I've thought about it as well. It's totally feasible, and likely
worthwhile, but a little more tricky.

> I've been wondering lately, whether a format change in the B-Tree
> could be worth the effort: store values with prefix compression. It
> would certainly help indexes of many kinds (text-valued, multi-column,
> etc...).

You can play tricks with the AM version, generally stored in the meta
page, as in the recent commit
36a35c550ac114caa423bcbe339d3515db0cd957, without breaking pg_upgrade.
Although a lot of these techniques may not actually require that kind
of additional effort.

> Now, I haven't checked if it's already done. Sorry if it is. I did
> mock around btree code a lot and don't remember any of this, but I do
> remember stuff that could be used to achieve it (specifically, all the
> index-only-scan machinery, which allows for implicit info).

I think it would make sense to do it on leaf pages, where there is
frequently a lot of redundancy. The reason that I myself wouldn't do
it first is that offhand I think that it'd be harder to come up with a
very generic infrastructure to make it work across diverse types, and
it isn't that useful where there isn't much redundancy in prefixes.
The B-Tree code doesn't really know anything about the type indexed,
and that's a useful property. But it's still definitely a useful goal.


-- 
Peter Geoghegan



Re: Prefix compression of B-Tree keys

От
Claudio Freire
Дата:
On Fri, Jan 31, 2014 at 3:51 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> Now, I haven't checked if it's already done. Sorry if it is. I did
>> mock around btree code a lot and don't remember any of this, but I do
>> remember stuff that could be used to achieve it (specifically, all the
>> index-only-scan machinery, which allows for implicit info).
>
> I think it would make sense to do it on leaf pages, where there is
> frequently a lot of redundancy. The reason that I myself wouldn't do
> it first is that offhand I think that it'd be harder to come up with a
> very generic infrastructure to make it work across diverse types, and
> it isn't that useful where there isn't much redundancy in prefixes.
> The B-Tree code doesn't really know anything about the type indexed,
> and that's a useful property. But it's still definitely a useful goal.

Well, for multi-column it should be really simple. You already have
all the necessary information to decide on the prefix (the common
prefix columns of leftmost and rightmost keys). It's harder for text
columns, since you must abstract out the "common prefix check","append
prefix", and "remove prefix" operations. That would probably require
extending opclasses.