Re: How to hint 2 coulms IS NOT DISTINCT FROM each other

Поиск
Список
Период
Сортировка
От Merlin Moncure
Тема Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
Дата
Msg-id CAHyXU0whPYPA8LGw7cxSjN8rvFO=CurzaLwSZhfCOm7WMA+cfQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: How to hint 2 coulms IS NOT DISTINCT FROM each other  (Kim Rose Carlsen <krc@hiper.dk>)
Ответы Re: How to hint 2 coulms IS NOT DISTINCT FROM each other  (Alban Hertroys <haramrae@gmail.com>)
Список pgsql-general
On Fri, Nov 4, 2016 at 8:08 AM, Kim Rose Carlsen <krc@hiper.dk> wrote:
>>> It might raise another problem, that the nulls are generated through LEFT
>
>>> JOINS where no rows are defined. Then the 0 or -1 value need to be
>>> a computed value. Won't this throw off index lookups? (I might be
>>> more confused in this area).
>>
>>Not following this.
>
> The nulls are generated by something like this
>     SELECT c.circuit_id,
>                    cc.customer_id
>        FROM circuit AS c
> LEFT JOIN circuit_customer AS cc
>              ON c.circuit_id = cc.circuit_id
>
> To make a magic '0' customer we would be required to use
>   COALESCE(cc.customer_id, '0')
> I dont think the optimizer will do anything clever with the '0' we have
> computed from null.

It would if you explicitly indexed it as such;
CREATE INDEX ON circuit_customer((COALESCE(customer_id, '0'));

> I could ofc. by default assign all unassigned circuits to '0' in
> circuit_customer. I'm not a fan though.

hm, why not?  null generally means 'unknown' and that's why it fails
any equality test.

>>BTW, if you want a fast plan over the current
>>data without consideration of aesthetics, try this:
>>
>>CREATE VIEW view_circuit_with_status AS (
>>    SELECT r.*,
>>          s.circuit_status,
>>          s.customer_id AS s_customer_id,
>>          p.line_speed,
>>          p.customer_id AS p_customer_id
>>     FROM view_circuit r
>>     JOIN view_circuit_product_main s
>>      ON r.circuit_id = s.circuit_id
>>      AND r.customer_id, s.customer_id
>>     JOIN view_circuit_product p
>>       ON r.circuit_id = p.circuit_id
>>      AND r.customer_id, s.customer_id
>>  UNION ALL SELECT r.*,
>>          s.circuit_status,
>>          s.customer_id AS s_customer_id,
>>          p.line_speed,
>>          p.customer_id AS p_customer_id
>>     FROM view_circuit r
>>     JOIN view_circuit_product_main s
>>       ON r.circuit_id = s.circuit_id
>>      AND r.customer_id IS NULL
>>      AND  s.customer_id IS NULL
>>     JOIN view_circuit_product p
>>       ON r.circuit_id = p.circuit_id>
>
> I will have to figure something out, but this specific case is still
> problematic
> since we would like to filter this view using different criteria's, like
> circuit_no,
> products or customers.

the above is logically equivalent to IS NOT DISTINCT FROM; you should
be able to query it as you would have done the original view.

> But with all these detours, I assume that a change to IS NOT DISTINCT FROM,
> is difficult or not wanted?

Well, not exactly.  In your case you are trying to treat null as a
specific value and pass it through join operations.

TBH, this is a pretty dubious approach: null is not supposed to be
equal to anything and any join vs null should come up empty --
logically at least.  INDF works around this of course but it's not a
recommended approach (my usage is generally restricted to, "has this
value changed since yesterday? etc").

I'm not an expert backend structures for indexing and optimization but
I know enough to suspect that optimizing INDF might cause
implementation headaches in various places, as do other irregular
syntactical approaches in SQL.  I think minimally optimizing INDF
would require converting it to an operator on par with '=' which is a
pretty large infrastructure change for an edge optimization case.
The fact that there are solid optimization strategies already on the
table (UNION ALL, expr index COALESCE()) does not help.

merlin


В списке pgsql-general по дате отправления:

Предыдущее
От: Kim Rose Carlsen
Дата:
Сообщение: Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
Следующее
От: "Hu, Patricia"
Дата:
Сообщение: What is the best thing to do with PUBLIC schema in Postgresql database