some GIN opclasses uses collation-aware comparisons while they don't need to do especially collation-aware comparison. Examples are text[] and hstore opclasses.
Hmm. It would be nice to use the index for inequality searches, at least on text[]. We don't support that currently, but it would require collation-awareness.
Depending on collation this may make them a much slower.
See example.
# show lc_collate ; lc_collate ───────────── ru_RU.UTF-8 (1 row)
# create table test as (select array_agg(i::text) from generate_series(1,1000000) i group by (i-1)/10); SELECT 100000
# create index test_idx on test using gin(array_agg); CREATE INDEX Time: *26930,423 ms*
# create index test_idx2 on test using gin(array_agg collate "C"); CREATE INDEX Time: *5143,682 ms*
Index creation with collation "ru_RU.UTF-8" is 5 times slower while collation has absolutely no effect on index functionality.
It occurs to me that practically all of those comparisons happen when we populate the red-black Tree, during the index build. The purpose of the red-black tree is to collect identical keys together, but there is actually no requirement that the order of the red-black tree matches the order of the index. It also isn't strictly required that it recognizes equal keys as equal. The only requirement is that it doesn't incorrectly put two keys that are equal according to the compare-function, into two different nodes.
Good point, Heikki. I experienced several times this problem, fixed it with C-locale and forgot again. Now, it's time to fix !
We could therefore use plain memcmp() to compare the Datums while building the red-black tree. Keys that are bit-wise equal are surely considered as equal by the compare-function. That makes the index build a lot faster. With the attached quick patch:
postgres=# create index test_idx on test using gin(array_agg ); CREATE INDEX Time: 880.620 ms
This is on my laptop. Without the patch, that takes about 4.7 seconds with the C locale, so this is much faster than even using the C locale.
Hmm, on my MBA I got 17277.734 (patch) vs 39151.562 for ru_RU.UTF-8 and