Re: Reduce build times of pg_trgm GIN indexes

Поиск

Список

Период

Сортировка

От	David Geier
Тема	Re: Reduce build times of pg_trgm GIN indexes
Дата	9 января 15:06:42
Msg-id	e5dd01c6-c469-405d-aea2-feca0b2dc34d@gmail.com обсуждение исходный текст
Ответ на	Re: Reduce build times of pg_trgm GIN indexes (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы	Re: Reduce build times of pg_trgm GIN indexes
Список	pgsql-hackers

Дерево обсуждения

Hi Heikki!

Thanks for looking at the patch set.

On 06.01.2026 18:00, Heikki Linnakangas wrote:
> On 05/01/2026 17:01, David Geier wrote:
>> - v1-0002-Optimized-comparison-functions.patch: Use FunctionCallInvoke()
>> instead of FunctionCall2Coll(). This saves a bunch of per-comparison
>> setup code, such as calling InitFunctionCallInfoData().
> 
> You lose the check for NULL result with this. That's probably still
> worth checking.

It seems like existing code where all args are not null, has that safety
check. Added it for consistency.

>> - v1-0004-Avoid-dedup-and-sort-in-ginExtractEntries.patch
>> ginExtractEntries() deduplicates and sorts the entries returned from the
>> extract value function. In case of pg_trgm, that is completely redundant
>> because the trigrams are already deduplicated and sorted. The current
>> version of this patch is just to demonstrate the potential. We need to
>> think about what we want here. Ideally, we would require the extraction
>> function to provide the entries deduplicated and sorted. Alternatively,
>> we could indicate to ginExtractEntries() if the entries are already
>> deduplicated and sorted. If we don't want to alter the signature of the
>> extract value function, we could e.g. use the MSB of the nentries
>> argument.
> 
> Yeah, this seems wrong as it is. You're assuming that if the extract
> function returns nullFlags == NULL, the array is already sorted and
> deduped.

As said, that was just for demonstration purposes of the possible gains.
I've changed the code now such that the extractValue function of the GIN
index can indicate via the third argument uniqueAndSorted, if the
returned keys are already unique and sorted.

Unfortunately, it seems like this patch regresses performance. See
measurements below. I haven't had the time to investigate why that is.
It's pretty counter intuitive, given that this patch effectively only
removes code. Maybe you could re-test patch 0004 and share your runtimes?

>> v1-0007-Faster-qunique-comparator.patch: qunique() doesn't require a
>> full sort comparator (-1 = less, 0 = equal, 1 = greater) but only a
>> equal/unequal comparator (e.g. 0 = equal and 1 = unequal). The same
>> optimization can be done in plenty of other places in our code base.
>> Likely, in most of them the gains are insignificant.
> 
> Makes sense. I'm a little disappointed the compiler won't do that
> optimization for us..

I thought the same.

> 
> Perhaps we should introduce a new qunique_eq() function with a different
> callback signature:
> 
> /* like qunique(), but the callback function returns true/false rather
> than int */
> static inline size_t
> qunique_eq(void *array, size_t elements, size_t width,
>         bool (*equal) (const void *, const void *))
> 

I would prefer to change qunique() instead. That would enforce using an
adequate comparison function from the get go. There are only ~15 calls
to qunique(). So refactoring this should also be a fairly small patch. I
can do that if there's agreement for that approach.

>> v1-0008-Add-ASCII-fastpath-to-generate_trgm_only.patch: Typically lots
>> of text is actually ASCII. Hence, we provide a fast path for this case
>> which is exercised if the MSB of the current character is unset.
> 
> This uses pg_ascii_tolower() when for ASCII characters when built with
> the IGNORECASE. I don't think that's correct, if the proper collation
> would do something more complicated for than what pg_ascii_tolower() does.

Oh, that's evil. I had tested that specifically. But it only worked
because the code in master uses str_tolower() with
DEFAULT_COLLATION_OID. So using a different locale like in the following
example does something different than when creating a database with the
same locale.

postgres=# select lower('III' COLLATE "tr_TR");
 lower
-------
 ııı

postgres=# select show_trgm('III' COLLATE "tr_TR");
        show_trgm
-------------------------
 {"  i"," ii","ii ",iii}
(1 row)

But when using tr_TR as default locale of the database the following
happens:

postgres=# select lower('III' COLLATE "tr_TR");
 lower
-------
 ııı

postgres=# select show_trgm('III');sü
               show_trgm
---------------------------------------
 {0xbbd8dd,0xf26fab,0xf31e1a,0x2af4f1}

I'm wondering if that's intentional to begin with. Shouldn't the code
instead pass PG_GET_COLLATION() to str_tolower()? Might require some
research to see how other index types handle locales.

Coming back to the original problem: the lengthy comment at the top of
pg_locale_libc.c, suggests that in some cases ASCII characters are
handled the pg_ascii_tolower() way for the default locale. See for
example tolower_libc_mb(). So a character by character conversion using
that function will yield a different result than strlower_libc_mb(). I'm
wondering why that is.

Anyways, we could limit the optimization to only kick in when the used
locale follows the same rules as pg_ascii_tolower(). We could test that
when creating the locale and store that info in pg_locale_struct.

Thoughts?

> Did you measure how big is the impact from each individual patch?
> Patches 1 and 2 seem pretty much ready to be committed, but I wonder if
> they make any difference on their own.

Here is the impact of each patch. I ran again CREATE INDEX three times
and took the fastest run. The run of each patch includes all previous
patches as well. For example, the timings for patch 0003 were measured
with a binary that also had patch 0002 and 0001 applied. To get the
impact of each patch in isolation, the delta to the previous run was taken.

Code                                | movies |delta  | lineitem | delta
------------------------------------|--------|-------|------------------
master                              | 10,311 | 0     | 256,986  | 0
v1-0001-Inline-ginCompareAttEntries |  9,694 | 617   | 239,778  | 17,208
v1-0002-Optimized-comparison-func   |  9,510 | 184   | 238,094  |  1,684
v1-0003-Use-sort_template.h         |  8,661 | 849   | 231,190  |  6,904
v1-0004-Avoid-dedup-and-sort-in     |  9,305 | -644  | 232,472  | -1,282
v1-0005-Make-btint4cmp-branchless   |  8,240 | 1,065 | 228,387  |  4,085
v1-0006-Use-radix-sort              |  6,976 | 1,264 | 207,687  | 20,700
v1-0007-Faster-qunique-comparator   |  5,911 | 1,065 | 203,744  |  3,943
v1-0008-Add-ASCII-fastpath          |  3,409 | 2,502 | 161,469  | 42,275

Attached is v2 of the patch set with the aforementioned changes. I've
also fixed the white space errors in 0003, 0004 and 0008, as reported by
Kirill.

--
David Geier

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Reduce build times of pg_trgm GIN indexes

Вложения