Re: Collation-aware comparisons in GIN opclasses

From Oleg Bartunov
Subject Re: Collation-aware comparisons in GIN opclasses
Msg-id CAF4Au4zYsJUmqH=zSiBrUuN2HTBr8Yhq8a8T+T707a0E0CPCmw@mail.gmail.com
In reply to Re: Collation-aware comparisons in GIN opclasses  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers


On Mon, Sep 29, 2014 at 11:48 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 09/15/2014 06:28 PM, Alexander Korotkov wrote:
Hackers,

some GIN opclasses use collation-aware comparisons even though they don't
specifically need collation-aware comparison. Examples are the text[] and
hstore opclasses.

Hmm. It would be nice to use the index for inequality searches, at least on text[]. We don't support that currently, but it would require collation-awareness.

Depending on the collation, this may make them much slower.

See example.

# show lc_collate ;
  lc_collate
─────────────
  ru_RU.UTF-8
(1 row)

# create table test as (select array_agg(i::text) from
generate_series(1,1000000) i group by (i-1)/10);
SELECT 100000

# create index test_idx on test using gin(array_agg);
CREATE INDEX
Time: *26930,423 ms*

# create index test_idx2 on test using gin(array_agg collate "C");
CREATE INDEX
Time: *5143,682 ms*

Index creation with the "ru_RU.UTF-8" collation is about 5 times slower, even
though the collation has absolutely no effect on index functionality.

It occurs to me that practically all of those comparisons happen when we populate the red-black tree, during the index build. The purpose of the red-black tree is to collect identical keys together, but there is actually no requirement that the order of the red-black tree matches the order of the index. It also isn't strictly required that it recognizes equal keys as equal. The only requirement is that it doesn't incorrectly put two keys that are not equal according to the compare-function into the same node.


Good point, Heikki. I have run into this problem several times, worked around it with the C locale, and then forgotten about it again. Now it's time to fix it!
 
We could therefore use plain memcmp() to compare the Datums while building the red-black tree. Keys that are bit-wise equal are surely considered as equal by the compare-function. That makes the index build a lot faster. With the attached quick patch:

postgres=# create index test_idx on test using gin(array_agg );
CREATE INDEX
Time: 880.620 ms

This is on my laptop. Without the patch, that takes about 4.7 seconds with the C locale, so this is much faster than even using the C locale.
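
To illustrate the idea above outside of PostgreSQL, here is a minimal, hedged sketch in C. It is not the attached patch and uses no PostgreSQL APIs: `SketchKey`, `sketch_key_cmp` and `sketch_count_groups` are hypothetical names invented for this example, and a plain `qsort()` stands in for the red-black tree. It only shows that a `memcmp()`-based ordering is enough to collect bit-wise identical keys together without consulting any collation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a variable-length key (not a PostgreSQL type). */
typedef struct
{
    const char *data;   /* key bytes, not necessarily NUL-terminated */
    size_t      len;    /* number of bytes in data */
} SketchKey;

/*
 * Bytewise ordering: memcmp() over the common prefix, and on a tie the
 * shorter key sorts first.  Returns 0 only for bit-wise identical keys,
 * so keys that a collation-aware comparison would consider different are
 * never treated as equal here.
 */
static int
sketch_key_cmp(const void *a, const void *b)
{
    const SketchKey *ka = a;
    const SketchKey *kb = b;
    size_t      n = ka->len < kb->len ? ka->len : kb->len;
    int         c = (n > 0) ? memcmp(ka->data, kb->data, n) : 0;

    if (c != 0)
        return c;
    return (ka->len > kb->len) - (ka->len < kb->len);
}

/*
 * Sort the keys bytewise (qsort() is used here instead of a red-black
 * tree) and count the groups of identical keys.
 */
static size_t
sketch_count_groups(SketchKey *keys, size_t nkeys)
{
    size_t      groups;
    size_t      i;

    assert(keys != NULL || nkeys == 0);
    if (nkeys == 0)
        return 0;

    qsort(keys, nkeys, sizeof(SketchKey), sketch_key_cmp);

    groups = 1;
    for (i = 1; i < nkeys; i++)
    {
        if (sketch_key_cmp(&keys[i - 1], &keys[i]) != 0)
            groups++;
    }
    return groups;
}
```

The resulting order is bytewise, not the order of the index, which is acceptable only because the grouping step does not depend on the final ordering.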

Hmm, on my MBA I got:
17277.734 ms (patch) vs 39151.562 ms for ru_RU.UTF-8, and
6131.929 ms (patch) vs 6131.929 ms for C.

Not much :(
 

- Heikki

