Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Дата
Msg-id 20180129010213.rt3nu73ilff756f6@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop  ("Todd A. Cook" <tcook@blackducksoftware.com>)
Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
On 2018-01-26 18:48:35 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-12-10 23:09:42 +0100, Tomas Vondra wrote:
> >> FWIW I do agree the data sets shared in this thread are pretty extreme
> >> and it doesn't make much sense to slow the regular cases. I'll be
> >> perfectly happy if we stop the OOM, making those cases fast is a bonus.
> 
> > Yea, agreed on that. I'm kinda inclined to go for stop-growing in 10,
> > and so something better in 11. And then later possibly backpatch if
> > we've grown some confidence?
> 
> +1.  I'd like to see some response to this included in 10.2, and time
> grows short for that.

Here are two patches that I think we want for 10.2, and the start of one
that I think we want for master.  0002 is needed because otherwise the
lack of extra growth leads to noticeably worse performance when filling
an underestimated a coordinator hash table from the workers - turns out
our hash combine (and most hash combines) let a lot of clustering
survive. By adding a final hashing round the bit perturbation is near
perfect.  The commit messages need to be polished a bit, but other than
that I think these are reasonable fixes. Plan to push by Monday evening
at the latest.

The third patch is a version of the random IV discussed in this
thread. I do think we want to add usage of the extended hash functions,
as prototyped by Tomas, as that actually helps to fix issues with actual
hash conflicts. But we additionally need a fallback path for types
without extended hashtables, and the random IV is a good idea
nonetheless.  There's no ABI difference in my patch, so I think this is
actually something we could backpatch. But I don't think it's urgent, so
I'm not planning to do that for 10.2.  The one thing that could confuse
people is that it can lead to output order changes from run to run - I
think that's actually good, nobody should rely on hashagg etc output
being stable, but it might be a bit much in a stable release?


In my tests this solves the worst performance issues in Todd's case,
Tomas's, Thomas's and still does ok performancwith with a TPC-H Q18
(which showcases the underestimated worker hashtable into leader
hashtable issue).

Greetings,

Andres Freund

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #15025: PSQL CLI - inconsistency when both -d and -U supplies a username
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop