Re: Making strxfrm() blobs in indexes work

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Making strxfrm() blobs in indexes work
Дата
Msg-id 27262.1391128477@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Making strxfrm() blobs in indexes work  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: Making strxfrm() blobs in indexes work  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
Peter Geoghegan <pg@heroku.com> writes:
> On more occasions than I care to recall, someone has suggested that it
> would be valuable to do something with strxfrm() blobs in order to
> have cheaper locale-aware text comparisons. One obvious place to do so
> would be in indexes, but in the past that has been dismissed on the
> following grounds:

> * Index-only scans need fully formed datums to work, and strxfrm() is
> a one-way function (or so I believe). There is no reason to think that
> the original string can be reassembled from the blob, so clearly that
> won't fly.

> * There is a high cost to be paid in storage overhead. Even for
> collations like "en_US.UTF-8", that can mean that the blob is as much
> as 3-4 times larger than the original text string. Who is to say that
> we'll come out ahead even with the savings of just doing a strcmp()
> rather than a strcoll()?

Quite aside from the index bloat risk, this effect means a 3-4x reduction
in the maximum string length that can be indexed before getting the
dreaded "Values larger than 1/3 of a buffer page cannot be indexed" error.
Worse, a value insertion might well succeed, with the failure happening
only (much?) later when that entry is chosen as a page split boundary.

It's possible that TOAST compression of the strings would save you, but
I'm not convinced of that; it certainly doesn't seem like we could
guarantee no post-insertion failures that way.

Also, detoasting of strings that hadn't been long enough to need toasting
before could easily eat any savings.

> I'm sure anyone that has read this far knows where I'm going with
> this: why can't we just have strxfrm() blobs in the inner pages,
> implying large savings for a big majority of text comparisons that
> service index scans, without bloating the indexes too badly, and
> without breaking anything? We only use inner pages to find leaf pages.
> They're redundant copies of the data within the index.

It's a cute idea though, and perhaps worth pursuing as long as you've
got the pitfalls in mind.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: jsonb and nested hstore
Следующее
От: Craig Ringer
Дата:
Сообщение: Re: Prohibit row-security + inheritance in 9.4?