Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
Дата
Msg-id CA+Tgmoa3UdJ0cOU_vVyWFZM9mPkLvLdqZWxNow-2wPkk2xtL3g@mail.gmail.com
обсуждение исходный текст
Ответ на [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys  (Peter Geoghegan <pg@bowt.ie>)
Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Thu, Feb 16, 2017 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The planner doesn't currently worry about work_mem restrictions when
> planning a hash join, figuring that the executor should be able to
> subdivide the data arbitrarily finely by splitting buckets at runtime.
> However there's a thread here:
> https://www.postgresql.org/message-id/flat/CACw4T0p4Lzd6VpwptxgPgoTMh2dEKTQBGu7NTaJ1%2BA0PRx1BGg%40mail.gmail.com
> exhibiting a case where a hash join was chosen even though a single
> value accounts for three-quarters of the inner relation.  Bucket
> splitting obviously can never separate multiple instances of the
> same value, so this choice forced the executor to try to load
> three-quarters of the (very large) inner relation into memory at once;
> unsurprisingly, it failed.
>
> To fix this, I think we need to discourage use of hash joins whenever
> a single bucket is predicted to exceed work_mem, as in the attached
> draft patch.  The patch results in changing from hash to merge join
> in one regression test case, which is fine; that case only cares about
> the join order not the types of the joins.
>
> This might be overly aggressive, because it will pretty much shut off
> any attempt to use hash joining on a large inner relation unless we
> have statistics for it (and those stats are favorable).  But having
> seen this example, I think we need to be worried.

I do think that's worrying, but on the other hand it seems like this
solution could disable many hash joins that would actually be fine.  I
don't think the largest ndistinct estimates we ever generate are very
large, and therefore this seems highly prone to worry even when
worrying isn't really justified.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: [HACKERS] Avoiding OOM in a hash join with many duplicate inner keys