Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets

From	Robert Haas
Subject	Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
Date	February 26, 2009, 09:22:58
Msg-id	603c8f070902260522h4230869fkf91597ad31c30279@mail.gmail.com
In response to	Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: Proposed Patch to Improve Performance of Multi-BatchHash Join for Skewed Data Sets
List	pgsql-hackers

On Thu, Feb 26, 2009 at 4:22 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> I haven't been following this thread closely, so pardon if this has been
> discussed already.
>
> The patch doesn't seem to change the cost estimates in the planner at all.
> Without that, I'd imagine that the planner rarely chooses a multi-batch hash
> join to begin with.

AFAICS, a multi-batch hash join happens when you are joining two big,
unsorted paths.  The planner essentially compares the cost of sorting
the two paths and then merge-joining them versus the cost of a hash
join.  It doesn't seem to be unusual for the hash join to come out the
winner, although admittedly I haven't played with it a ton.  You
certainly could try to model it in the costing algorithm, but I'm not
sure how much benefit you'd get out of it: if you're doing this a lot
you're probably better off creating indices.
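
To make that comparison concrete, here is a rough sketch of the trade-off in Python. It is a toy model, not PostgreSQL's actual costing code: the function names, the cost constants (cpu_per_compare, cpu_per_row, io_per_row), and the power-of-two batch rule are all assumptions chosen for illustration. It prices "sort both inputs, then merge" against "hash join, spilling to batches when the inner side does not fit in memory".

```python
import math


def merge_join_cost(outer_rows, inner_rows, cpu_per_compare=1.0):
    """Toy cost: sort both inputs (n * log2 n comparisons each), then one merge pass."""
    sort_cost = sum(n * math.log2(n) for n in (outer_rows, inner_rows) if n > 1)
    merge_cost = outer_rows + inner_rows
    return cpu_per_compare * (sort_cost + merge_cost)


def hash_batches(inner_rows, row_bytes, work_mem_bytes):
    """Smallest power-of-two batch count such that one inner batch fits in memory.

    Assumes rows are spread evenly across batches (i.e. no skew).
    """
    needed = math.ceil(inner_rows * row_bytes / work_mem_bytes)
    batches = 1
    while batches < needed:
        batches *= 2
    return batches


def hash_join_cost(outer_rows, inner_rows, row_bytes, work_mem_bytes,
                   cpu_per_row=1.0, io_per_row=2.0):
    """Toy cost: build + probe CPU, plus one write and one read of every spilled row."""
    batches = hash_batches(inner_rows, row_bytes, work_mem_bytes)
    cpu = cpu_per_row * (outer_rows + inner_rows)
    if batches == 1:
        return cpu  # everything fits in memory, nothing is spilled
    # Rows belonging to the first batch are processed in memory;
    # the remaining (batches - 1) / batches of both inputs go to disk and back.
    spilled_rows = (outer_rows + inner_rows) * (batches - 1) / batches
    return cpu + 2 * io_per_row * spilled_rows


if __name__ == "__main__":
    rows = 10_000_000
    print("batches:   ", hash_batches(rows, 100, 64 * 1024 * 1024))
    print("merge join:", merge_join_cost(rows, rows))
    print("hash join: ", hash_join_cost(rows, rows, 100, 64 * 1024 * 1024))
```

Under these made-up constants, two unsorted 10-million-row inputs come out cheaper to hash-join in several batches than to sort and merge, which matches the behaviour described above. The even-distribution assumption in hash_batches is the one that skewed data violates.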

> Joshua, in the tests that you've been running, did you have to rig the
> planner with "enable_mergejoin=off" or similar, to get the queries to use
> hash joins?

I didn't have to fiddle anything, but Josh's tests were more exhaustive.

...Robert
