Re: [HACKERS] <> join selectivity estimate question

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: [HACKERS] <> join selectivity estimate question
Дата
Msg-id CAFjFpReAhpxukbBMKfrZ7NLM3T-0imUAQUnPw55QG_AHOQF1+Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] <> join selectivity estimate question  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: [HACKERS] <> join selectivity estimate question  (Thomas Munro <thomas.munro@enterprisedb.com>)
Список pgsql-hackers
On Thu, Jul 20, 2017 at 11:04 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't think it does really.  The thing about a <> semijoin is that it
>> will succeed unless *every* join key value from the inner query is equal
>> to the outer key value (or is null).  That's something we should consider
>> to be of very low probability typically, so that the <> selectivity should
>> be estimated as nearly 1.0.  If the regular equality selectivity
>> approaches 1.0, or when there are expected to be very few rows out of the
>> inner query, then maybe the <> estimate should start to drop off from 1.0,
>> but it surely doesn't move linearly with the equality selectivity.
>
> Ok, here I go like a bull in a china shop: please find attached a
> draft patch.  Is this getting warmer?
>
> In the comment for JOIN_SEMI I mentioned a couple of refinements I
> thought of but my intuition was that we don't go for such sensitive
> and discontinuous treatment of stats; so I made the simplifying
> assumption that RHS always has more than 1 distinct value in it.
>
> Anti-join <> returns all the nulls from the LHS, and then it only
> returns other LHS rows if there is exactly one distinct non-null value
> in RHS and it happens to be that one.  But if we make the same
> assumption I described above, namely that there are always at least 2
> distinct values on the RHS, then the join selectivity is just
> nullfrac.
>

The patch looks good to me.

+       /*
+        * For semi-joins, if there is more than one distinct key in the RHS
+        * relation then every non-null LHS row must find a match since it can
+        * only be equal to one of them.
The word "match" confusing. Google's dictionary entry gives "be equal
to (something) in quality or strength." as its meaning. May be we want
to reword it as "... LHS row must find a joining row in RHS ..."?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: [HACKERS] Mishandling of WCO constraints in direct foreign table modification
Следующее
От: Yugo Nagata
Дата:
Сообщение: [HACKERS] [PATCH] A hook for session start