Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
Дата
Msg-id 874.1487273899@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Feb 16, 2017 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This might be overly aggressive, because it will pretty much shut off
>> any attempt to use hash joining on a large inner relation unless we
>> have statistics for it (and those stats are favorable).  But having
>> seen this example, I think we need to be worried.

> I do think that's worrying, but on the other hand it seems like this
> solution could disable many hash joins that would actually be fine.  I
> don't think the largest ndistinct estimates we ever generate are very
> large, and therefore this seems highly prone to worry even when
> worrying isn't really justified.

I initially thought about driving the shutoff strictly from the estimate
of the MCV frequency, without involving the more general ndistinct
computation that estimate_hash_bucketsize does.  I'm not sure how much
that would do for your concern, but at least the MCV frequency doesn't
involve quite as much extrapolation as ndistinct.

There's a practical problem from final_cost_hashjoin's standpoint,
which is that it has noplace to cache the MCV frequency separately from
estimate_hash_bucketsize's output.  In HEAD we could just add some more
fields to RestrictInfo, but that would be an unacceptable ABI break in
the back branches.  Maybe we could get away with replacing the float8
bucketsize fields with two float4 fields --- it seems unlikely that we
need more than 6 digits of precision for these numbers, and I doubt any
extensions are touching the bucketsize fields.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
Следующее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys