Re: [HACKERS] <> join selectivity estimate question

Поиск
Список
Период
Сортировка
От Ashutosh Bapat
Тема Re: [HACKERS] <> join selectivity estimate question
Дата
Msg-id CAFjFpRe3OP4FRkLyXO45zn2hEXa9AyjtXGdDHZRFqFGZE+ZYNw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] <> join selectivity estimate question  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: [HACKERS] <> join selectivity estimate question  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Thu, Jul 20, 2017 at 5:30 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Jul 20, 2017 at 11:47 PM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>> On Thu, Jul 20, 2017 at 11:04 AM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> On Fri, Jun 2, 2017 at 4:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I don't think it does really.  The thing about a <> semijoin is that it
>>>> will succeed unless *every* join key value from the inner query is equal
>>>> to the outer key value (or is null).  That's something we should consider
>>>> to be of very low probability typically, so that the <> selectivity should
>>>> be estimated as nearly 1.0.  If the regular equality selectivity
>>>> approaches 1.0, or when there are expected to be very few rows out of the
>>>> inner query, then maybe the <> estimate should start to drop off from 1.0,
>>>> but it surely doesn't move linearly with the equality selectivity.
>>>
>>> Ok, here I go like a bull in a china shop: please find attached a
>>> draft patch.  Is this getting warmer?
>>>
>>> In the comment for JOIN_SEMI I mentioned a couple of refinements I
>>> thought of but my intuition was that we don't go for such sensitive
>>> and discontinuous treatment of stats; so I made the simplifying
>>> assumption that RHS always has more than 1 distinct value in it.
>>>
>>> Anti-join <> returns all the nulls from the LHS, and then it only
>>> returns other LHS rows if there is exactly one distinct non-null value
>>> in RHS and it happens to be that one.  But if we make the same
>>> assumption I described above, namely that there are always at least 2
>>> distinct values on the RHS, then the join selectivity is just
>>> nullfrac.
>>>
>>
>> The patch looks good to me.
>>
>> +       /*
>> +        * For semi-joins, if there is more than one distinct key in the RHS
>> +        * relation then every non-null LHS row must find a match since it can
>> +        * only be equal to one of them.
>> The word "match" confusing. Google's dictionary entry gives "be equal
>> to (something) in quality or strength." as its meaning. May be we want
>> to reword it as "... LHS row must find a joining row in RHS ..."?
>
> Thanks!  Yeah, here's a version with better comments.

Thanks. Your version is better than mine.

>
> Does anyone know how to test a situation where the join is reversed according to
> get_join_variables, or "complicated cases where we can't tell for sure"?
>

explain select * from pg_class c right join pg_type t on (c.reltype =
t.oid); would end up with  *join_is_reversed = true; Is that what you
want? For a semi-join however I don't know how to induce that. AFAIU,
in a semi-join there is only one direction in which join can be
specified.

I didn't get the part about "complicated cases where we can't tell for sure".
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: tushar
Дата:
Сообщение: [HACKERS] pg_upgrade failed if view contain natural left join condition
Следующее
От: tushar
Дата:
Сообщение: [HACKERS] pg_upgrade failed if view is based on sequence