Re: a few crazy ideas about hash joins

Поиск

Список

Период

Сортировка

От	Lawrence, Ramon
Тема	Re: a few crazy ideas about hash joins
Дата	3 апреля 2009 г. 15:24:22
Msg-id	6EEA43D22289484890D119821101B1DF05190DEB@exchange20.mercury.ad.ubc.ca обсуждение исходный текст
Ответ на	a few crazy ideas about hash joins (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: a few crazy ideas about hash joins (Robert Haas <robertmhaas@gmail.com>) Re: a few crazy ideas about hash joins (Greg Stark <stark@enterprisedb.com>)
Список	pgsql-hackers

Дерево обсуждения

> While investigating some performance problems recently I've had cause
> to think about the way PostgreSQL uses hash joins.  So here are a few
> thoughts.  Some of these have been brought up before.
>
> 1. When the hash is not expected to spill to disk, it preserves the
> pathkeys of the outer side of the join.  If the optimizer were allowed
> to assume that, it could produce significantly more efficient query
> plans in some cases.

This is definitely possible, but you will have to dynamically modify the
execution path if the hash join ends up to be more than one batch.

> 3. Avoid building the exact same hash table twice in the same query.
> This happens more often you'd think.  For example, a table may have
> two columns creator_id and last_updater_id which both reference person
> (id).  If you're considering a hash join between paths A and B, you
> could conceivably check whether what is essentially a duplicate of B
> has already been hashed somewhere within path A.  If so, you can reuse
> that same hash table at zero startup-cost.
> 4. As previously discussed, avoid hashing for distinct and then
> hashing the results for a hash join on the same column with the same
> operators.
>
> Thoughts on the value and/or complexity of implementation of any of
these?

I would be interested in working with you on any of these changes to
hash join if you decide to pursue them.   I am especially interested in
looking at the hash aggregation code and potentially improving its
efficiency.

We have implemented a multi-way hash join (can join more than 2 tables
at a time) which may help with cases #3 and #4.  Performance results
look very good, and we are planning on building a patch for this over
the summer.

--
Ramon Lawrence

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Tom Lane
Дата: 03 апреля 2009 г., 15:23:32
Сообщение: Re: Path separator

Следующее

От: Tom Lane
Дата: 03 апреля 2009 г., 15:38:06
Сообщение: Re: Crash in gist insertion on pathological box data

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: a few crazy ideas about hash joins

Предыдущее

Следующее