Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop
| From | Tom Lane | 
|---|---|
| Subject | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop | 
| Date | |
| Msg-id | 5429.1517250335@sss.pgh.pa.us | 
| In response to | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop (Andres Freund <andres@anarazel.de>) | 
| Responses | Re: BUG #14932: SELECT DISTINCT val FROM table gets stuck in an infinite loop | 
| List | pgsql-bugs | 
Andres Freund <andres@anarazel.de> writes:
> Here are two patches that I think we want for 10.2, and the start of one
> that I think we want for master.  0002 is needed because otherwise the
> lack of extra growth leads to noticeably worse performance when filling
> an underestimated coordinator hash table from the workers - turns out
> our hash combine (and most hash combines) let a lot of clustering
> survive. By adding a final hashing round the bit perturbation is near
> perfect.  The commit messages need to be polished a bit, but other than
> that I think these are reasonable fixes. Plan to push by Monday evening
> at the latest.
The first 2 seem OK to me by eyeball, though I've not done performance
testing.
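
To illustrate the clustering point in the quoted text, here is a minimal sketch in Python (not PostgreSQL source). It combines per-column hashes with a rotate-and-XOR step and optionally applies a final mixing round. The function names are hypothetical, the rotate-and-XOR combine is an assumption made for illustration, and the mixing constants are those of the MurmurHash3 32-bit finalizer; the actual patches may differ.

```python
MASK32 = 0xFFFFFFFF


def rotate_left1(h):
    """Rotate a 32-bit value left by one bit."""
    return ((h << 1) | (h >> 31)) & MASK32


def final_mix32(h):
    """Final mixing round (MurmurHash3 32-bit finalizer constants)."""
    h &= MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def combine(column_hashes, finalize):
    """Combine per-column hashes; optionally finish with a mixing round."""
    h = 0
    for ch in column_hashes:
        h = rotate_left1(h) ^ (ch & MASK32)
    return final_mix32(h) if finalize else h


def buckets_used(hashes, nbuckets=1024):
    """Count distinct buckets hit when the low bits select the bucket."""
    return len({h & (nbuckets - 1) for h in hashes})


# Hypothetical input: single-column hashes whose low 10 bits are all zero,
# i.e. structure that a plain combine passes straight through.
keys = [[k << 10] for k in range(1000)]
plain = buckets_used(combine(k, finalize=False) for k in keys)
mixed = buckets_used(combine(k, finalize=True) for k in keys)
print("buckets used without final round:", plain)
print("buckets used with final round:", mixed)
```

With these inputs the unmixed combine sends every key to the same bucket of a 1024-bucket table, because the low bits of the input survive unchanged; the final round is what breaks that structure up.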
> The third patch is a version of the random IV discussed in this
> thread. I do think we want to add usage of the extended hash functions,
> as prototyped by Tomas, as that actually helps to fix issues with actual
> hash conflicts. But we additionally need a fallback path for types
> without extended hash functions, and the random IV is a good idea
> nonetheless.  There's no ABI difference in my patch, so I think this is
> actually something we could backpatch. But I don't think it's urgent, so
> I'm not planning to do that for 10.2.  The one thing that could confuse
> people is that it can lead to output order changes from run to run - I
> think that's actually good, nobody should rely on hashagg etc output
> being stable, but it might be a bit much in a stable release?
I disagree: people should reasonably expect the same query and same
data and same plan to give consistent results.  When we stuck in the
"synchronous seqscans" feature, which broke that property, we were very
quickly forced by user complaints to provide a way to shut it off.
I'm also concerned that we'd have to lobotomize a bunch of regression
test cases to ensure stable results there.
IOW, I think randomizing hashkeys is unacceptable for HEAD, let alone
back-patching.
            regards, tom lane
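
For readers following the random-IV disagreement, the sketch below (Python, not PostgreSQL source) shows why seeding the hash changes output order but not the result set. It is a toy model under stated assumptions: `distinct_in_bucket_order` and `mix32` are hypothetical names, the IV is simply XORed into the key before mixing, and results are emitted in bucket order as a hash-based DISTINCT might do.

```python
MASK32 = 0xFFFFFFFF


def mix32(h):
    """32-bit mixing function (MurmurHash3 finalizer constants)."""
    h &= MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def distinct_in_bucket_order(values, iv, nbuckets=16):
    """Deduplicate values with a toy hash table and return them in bucket order.

    `iv` is a per-table seed (assumption: XORed into the key before mixing).
    """
    buckets = [[] for _ in range(nbuckets)]
    for v in values:
        bucket = buckets[mix32(v ^ iv) % nbuckets]
        if v not in bucket:
            bucket.append(v)
    return [v for bucket in buckets for v in bucket]


vals = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
fixed_iv = distinct_in_bucket_order(vals, iv=0)
other_iv = distinct_in_bucket_order(vals, iv=0x12345678)
print(fixed_iv)
print(other_iv)
```

With a fixed IV the order is reproducible from run to run. With a different IV the same set of values comes back, but bucket placement, and therefore the order of the output, can differ, which is the run-to-run variation being argued about above.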
		