Re: Making strxfrm() blobs in indexes work

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Making strxfrm() blobs in indexes work
Дата
Msg-id CAM3SWZQozR1R8qAgF+bkdbXf_4=gJw6cSQnMHa4r=pHbSeYKOg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Making strxfrm() blobs in indexes work  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: Making strxfrm() blobs in indexes work  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
On Thu, Jan 30, 2014 at 5:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Jan 30, 2014 at 5:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> That's not hard to prevent. If that should happen, we don't go with
>>> the strxfrm() datum. We have a spare IndexTuple bit we could use to
>>> mark when the optimization was applied.
>>
>> You'd need a bit per column, no?
>
> I don't think so. It would be no big deal if it was all or nothing.

I've done some more digging. It turns out that the 1977 paper "An
Encoding Method for Multifield Sorting and Indexing" describes a
technique that involves concatenating multiple column values and
comparing them using a simple strcmp(). Apparently this is something
that was in system R back in the 1970s. Here is a description of that
paper:

"Sequences of character strings with an order relation imposed between
sequences are considered. An encoding scheme is described which
produces a single, order-preserving string from a sequence of strings.
The original sequence can be recovered from the encoded string, and
one sequence of strings precedes another if and only if the encoding
of the first precedes the encoding of the second. The strings may be
variable length, without a maximum length restriction, and no symbols
need be reserved for control purposes. Hence any symbol may occur in
any string. The scheme is useful for multifield sorting, multifield
indexing, and other applications where ordering on more than one field
is important."

I think we should implement the scheme in this paper, for inner B-Tree
pages. The paper describes how lexicographic sort order must be
preserved, so it's pretty much identical to what I've described,
except it doesn't say anything about inner pages presumably because
there was no Unicode back then, and I don't think that B+Trees/B-link
trees had fully caught on yet. We're already suffering from the fact
that strcoll() mandates a NULL-terminated string, since we have to
copy text datums to a buffer to deal with the impedance mismatch.
Furthermore, multi-column comparisons probably have a lot of overhead
at present from all the indirection alone.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pgindent behavior we could do without
Следующее
От: Claudio Freire
Дата:
Сообщение: Prefix compression of B-Tree keys