Re: BUG #2930: Hash join abyssmal with many null fields.

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: BUG #2930: Hash join abyssmal with many null fields.
Дата
Msg-id 25727.1169832146@sss.pgh.pa.us
обсуждение исходный текст
Ответ на BUG #2930: Hash join abyssmal with many null fields.  ("Maciej Babinski" <maciej+postgres@apathy.killer-robot.net>)
Ответы Re: BUG #2930: Hash join abyssmal with many null fields.  (Maciej Babinski <maciej@killer-robot.net>)
Список pgsql-bugs
Maciej Babinski <maciej@apathy.killer-robot.net> writes:
> Tom Lane wrote:
>> I see no bug here.  AFAICT your "much faster" query gets that way by
>> having eliminated all the candidate join rows on the B side.

> The additional clause eliminates no rows beyond what the existing
> clause would.  Any row eliminated by "b.join_id IS NOT NULL" could not
> possibly have satisfied "a.join_id = b.join_id".

Hmm.  You assume that the = operator is strict, which is probably true,
but the hash join code isn't assuming that.

It might be worth checking for the case, though.  What's happening,
since we go ahead and put the null rows into the hash table, is that
they all end up in the same hash chain because they all get hash code 0.
And then that very long chain gets searched for each null outer row.
If we knew the join operator is strict we could discard null rows
immediately on both sides.

> Please note that if the join columns are not null, but still produce
> no matches for the join, the results are fast without the need for an
> extra clause in the join:

Yeah, because the rows get reasonably well distributed into different
hash buckets.  The optimizer will avoid a hash if it sees the data is
not well-distributed, but IIRC it's not considering nulls when it
makes that decision.

            regards, tom lane

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: BUG #2930: Hash join abyssmal with many null fields.
Следующее
От: "Michael Schmidt"
Дата:
Сообщение: BUG #2931: Can't capture pg_dump Password prompt