Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Дата
Msg-id 20180129193219.fk5vnliembhacuys@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Hi,

On 2018-01-29 13:45:10 -0500, Tom Lane wrote:
> Also, the fact that you needed one more ORDER BY in testing on a
> single machine is already proof of my worry about the regression
> tests, and it'll only get worse when testing across a variety
> of machines.

That's an entirely unconvincing argument. A lot of changes to hashtable
code will result in ordering changes, and has done so in the past. Sure
there occasionally was need for a followup ORDER BY addition, but that's
not that bad. The fact that only a single order by was needed when
entirely changing the hash function output (by randomizing it) doesn't
at all seem to suggest that it's a huge problem.


On 2018-01-29 14:11:26 -0500, Tom Lane wrote:
> One other point here is that it's not really clear to me what a randomly
> varying IV is supposed to accomplish.  Surely we're not intending that
> it prevents somebody from crafting a data set that causes bad hash
> performance.

I do think we should make that harder.


> If a user with DB access wants to cause a performance
> problem, there are and always will be plenty of other avenues to making
> that happen.

Sure, but that's different from somebody triggering, e.g. by website
input, such behaviour.


> If the idea is that for a data set that otherwise would have
> bad hash performance, choosing a different IV would (almost always) fix
> it, that sounds good but you're ignoring the inverse case: for a data set
> that works fine, there would be some choices of IV that create a problem
> where there was none before.  I see no reason to think that the
> probability of the former kind of situation is higher than the latter.

It's easily intentionally triggerable vs. not.


> So I'm on board with using the extended hash functions when available,
> but I'm not convinced that a varying IV buys us anything but trouble.

I think we really need both. A constant IV of e.g. 0 for a single-column
hashtable doesn't really fix much, even though it does improve hash
quality a bit for multi-column indexes. A random IV does address issues
due to a lot of tuples falling into the same bucket, but does not
address actual collisions. Both together are pretty convincing tho.

Greetings,

Andres Freund


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop
Следующее
От: Greg Stark
Дата:
Сообщение: Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in aninfinite loop