Re: Collation-aware comparisons in GIN opclasses

From Heikki Linnakangas
Subject Re: Collation-aware comparisons in GIN opclasses
Msg-id 54290EDB.2090406@vmware.com
In response to Collation-aware comparisons in GIN opclasses  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Collation-aware comparisons in GIN opclasses  (Oleg Bartunov <obartunov@gmail.com>)
List pgsql-hackers
On 09/15/2014 06:28 PM, Alexander Korotkov wrote:
> Hackers,
>
> some GIN opclasses use collation-aware comparisons even though they don't
> actually need collation-aware comparison. Examples are the text[] and hstore
> opclasses.

Hmm. It would be nice to use the index for inequality searches, at least
on text[]. We don't support that currently, but it would require
collation-awareness.

> Depending on the collation, this may make them much slower.
>
> See example.
>
> # show lc_collate ;
>   lc_collate
> ─────────────
>   ru_RU.UTF-8
> (1 row)
>
> # create table test as (select array_agg(i::text) from
> generate_series(1,1000000) i group by (i-1)/10);
> SELECT 100000
>
> # create index test_idx on test using gin(array_agg);
> CREATE INDEX
> Time: *26930,423 ms*
>
> # create index test_idx2 on test using gin(array_agg collate "C");
> CREATE INDEX
> Time: *5143,682 ms*
>
> Index creation with collation "ru_RU.UTF-8" is 5 times slower while
> collation has absolutely no effect on index functionality.

It occurs to me that practically all of those comparisons happen when we
populate the red-black tree during the index build. The purpose of the
red-black tree is to collect identical keys together, but there is
actually no requirement that the order of the red-black tree matches the
order of the index. It also isn't strictly required that it recognizes
equal keys as equal. The only requirement is that it doesn't incorrectly
put two keys that are different according to the compare function into
the same node.

We could therefore use plain memcmp() to compare the Datums while
building the red-black tree. Keys that are bit-wise equal are surely
considered as equal by the compare-function. That makes the index build
a lot faster. With the attached quick patch:

postgres=# create index test_idx on test using gin(array_agg );
CREATE INDEX
Time: 880.620 ms

This is on my laptop. Without the patch, that takes about 4.7 seconds
with the C locale, so this is much faster than even using the C locale.
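
To illustrate the idea only (this is not the attached patch), here is a
minimal sketch in C of grouping keys by a bitwise comparison. RawKey,
raw_key_cmp() and group_identical_keys() are hypothetical names invented
for this example, and a sorted array stands in for the red-black tree;
real GIN code works on Datums and is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key: a length plus raw bytes, standing in for a text Datum. */
typedef struct
{
    size_t      len;
    const char *data;
} RawKey;

/*
 * Bitwise ordering: shorter keys sort first, then memcmp() on the bytes.
 * The resulting order is arbitrary but total, which is enough for
 * collecting identical keys together. No collation is consulted.
 */
static int
raw_key_cmp(const RawKey *a, const RawKey *b)
{
    assert(a != NULL && b != NULL);

    if (a->len != b->len)
        return (a->len < b->len) ? -1 : 1;
    if (a->len == 0)
        return 0;
    return memcmp(a->data, b->data, a->len);
}

/* Adapter so that qsort() can call raw_key_cmp(). */
static int
raw_key_qsort_cmp(const void *a, const void *b)
{
    return raw_key_cmp((const RawKey *) a, (const RawKey *) b);
}

/*
 * Sort the keys in place by their bytes and return the number of groups
 * of bitwise-identical keys. Two keys end up in the same group only if
 * their bytes are identical.
 */
static size_t
group_identical_keys(RawKey *keys, size_t n)
{
    size_t      groups;
    size_t      i;

    if (n == 0)
        return 0;

    qsort(keys, n, sizeof(RawKey), raw_key_qsort_cmp);

    groups = 1;
    for (i = 1; i < n; i++)
    {
        if (raw_key_cmp(&keys[i - 1], &keys[i]) != 0)
            groups++;
    }
    return groups;
}
```

Keys that the opclass compare function treats as equal but whose bytes
differ would land in separate groups here. That is harmless as long as
the collation-aware comparison is still applied when the groups are
merged into the index itself.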

- Heikki
