Re: GUCs to control abbreviated sort keys

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: GUCs to control abbreviated sort keys
Дата
Msg-id CAH2-WzmsdP72PWGJ-091yqT2A0HQP7yp0TQ9nkgGCASp_cwr9A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GUCs to control abbreviated sort keys  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Fri, Jan 27, 2023 at 12:34 PM Jeff Davis <pgsql@j-davis.com> wrote:
> It is non-deterministic, but I tried with two generated files, and got
> similar results.

Jeff and I coordinated off-list. It turned out that the
nondeterministic nature of the program to generate test data was
behind my initial inability to recreate Jeff's results. Once Jeff
provided me with the exact data that he saw the problem with, I was
able to recreate the problematic case for abbreviated keys.

It turns out that this was due to aborting abbreviation way too late
in the process. It would happen relatively late in the process, when
more than 50% of all tuples had already had abbreviations generated by
ICU. This was a marginal case for abbreviated keys, which is precisely
why it only happened this long into the process. That factor is also
likely why I couldn't recreate the problem at first, even though I had
test data that was substantially the same as the data required to show
the problem.

Attached patch fixes the issue. It teaches varstr_abbrev_abort to do
something similar to every other abbreviated keys abort function: stop
estimating cardinality entirely (give up on giving up) once there are
a certain number of distinct abbreviated keys, regardless of any other
factor.

This is very closely based on existing code from numeric_abbrev_abort,
though I use a cutoff of 10k rather than a cutoff of 100k. This
difference is justified by the special considerations for text, where
we authoritative comparisons have further optimizations such as
strcoll caching and the memcmp equality fast path. It's also required
to actually fix the test case at hand -- 100k isn't enough to avoid
the performance issue Jeff reported.

I think that this should be committed to HEAD only.

-- 
Peter Geoghegan

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: undersized unions
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)