Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
Дата
Msg-id 4199.1487278280@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Feb 16, 2017 at 2:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I initially thought about driving the shutoff strictly from the estimate
>> of the MCV frequency, without involving the more general ndistinct
>> computation that estimate_hash_bucketsize does.  I'm not sure how much
>> that would do for your concern, but at least the MCV frequency doesn't
>> involve quite as much extrapolation as ndistinct.

> Hmm, so we could do something like: if the estimated frequency of the
> least-common MCV is enough to make one bucket overflow work_mem, then
> don't use a hash join?  That would still be prone to some error (in
> both directions, really) but it seems less likely to spit out
> completely stupid results than relying on ndistinct, which never gets
> very big even in a 10TB table.

No, it'd be the *most* common MCV, because we're concerned about the
worst-case (largest) bucket size.  But that's good, really, because the
highest MCV frequency will be the one we have most statistical
confidence in.  There's generally a whole lot of noise in the tail-end
MCV numbers.

Also, I'd be inclined to do nothing (no shutoff) if we have no MCV
stats.  That would be an expected case if the column is believed unique,
and it's probably a better fallback behavior when we simply don't have
stats.  With the ndistinct-based rule, we'd be shutting off hashjoin
almost always when we don't have stats.  Given how long it took us
to recognize this problem, that's probably the wrong default.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: [HACKERS] Small issue in online devel documentation build
Следующее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys