Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions
Дата
Msg-id CAKFQuwYCPBfqrPt6d97HXC8mGVr04ySAXsJc9BhVceVb+RWXGw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions  ("David G. Johnston" <david.g.johnston@gmail.com>)
Ответы Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions
Список pgsql-general
On Mon, Jul 24, 2017 at 7:58 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Jul 24, 2017 at 3:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The cost to form the inner hash is basically negligible whether it's
de-duped or not, but if it's not (known) de-duped then the cost
estimate for the semijoin is going to rise some, and that discourages
selecting it.

​Why does the "hash semi join" care about duplication of values on the inner relation?  Doesn't it only care whether a given bucket exists irrespective of its contents?

​Rather, it cares about the contents is-so-far as confirming that at least one of the tuples in the bucket indeed has the same joining value as the outer relation (lost track of the fact that two values can share the same hash).  But once it finds one it can move onto the new outer relation tuple while an inner join would have to spend more time looking for additional matches.

David J.

В списке pgsql-general по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [GENERAL] Perfomance of IN-clause with many elements and possible solutions