Re: How to hint 2 coulms IS NOT DISTINCT FROM each other

Поиск
Список
Период
Сортировка
От Kim Rose Carlsen
Тема Re: How to hint 2 coulms IS NOT DISTINCT FROM each other
Дата
Msg-id AM4PR0501MB26107E6E6E83C44647A0275CC7AD0@AM4PR0501MB2610.eurprd05.prod.outlook.com
обсуждение исходный текст
Ответ на Re: How to hint 2 coulms IS NOT DISTINCT FROM each other  (Scott Marlowe <scott.marlowe@gmail.com>)
Ответы Re: How to hint 2 coulms IS NOT DISTINCT FROM each other  (Merlin Moncure <mmoncure@gmail.com>)
Re: How to hint 2 coulms IS NOT DISTINCT FROM each other  ("David G. Johnston" <david.g.johnston@gmail.com>)
Список pgsql-general

On Fri, Oct 28, 2016 at 10:29 AM, Kim Rose Carlsen <krc@hiper.dk> wrote:

> Hi
>
> I was wondering if there is a way to hint that two columns in two different
> tables IS NOT DISTINCT FROM each other. So that the optimizer may assume if
> table_a.key = 'test' THEN table_b.key = 'test' .
>
> The equals operator already does this but it does not handle NULLS very well
> (or not at all). And it seems IS NOT DISTINCT FROM is not indexable and
> doesn't establish the same inference rules as equals.

The whole idea behing Postgres' query planner is that you don't have
to use any hints. Late model versions of postgres handle nulls fine,
but nulls are never "equal" to anything else. I.e. where xxx is null
works with indexes. Where x=y does not, since null <> null.

Suggestion for getting help, put a large-ish aka production sized
amount of data into your db, run your queries with explain analyze and
feed them to https://explain.depesz.com/ and post the links here along
with the slow queries. A lot of times the fix is non-obvious if you're
coming from another db with a different set of troubleshooting skills
> for slow queries.

The problem is how to reduce the problem into its core, without introducing 
all the unnecessary.

Maybe simplifying the problem, also makes it impossible to say where I go 
wrong. It might be that I try to push too much logic into the SQL layer 
and Im adding too many layers of abstraction to accomplish what I want. 
So let me try and elaborate a little more.

I have couple a tables describing resources (circuits) and allocation 
of resources to customers and products.

First layer is a view called view_circuit. This view (left) join any table 
the circuit table reference through a foreign key (it gives exactly the same
rows and columns as circuit table + some extra information like customer_id).

Second layer is 2 views
1) a view describing if the circuit is active or inactive, lets call it 
   view_circuit_product_main
2) a view describing line_speed about the circuit, lets call it 
   view_circuit_product

These views use aggregations (both GROUP BY and SELECT DISTINCT ON (...)) 
if this has any relevance.

Third layer
Next step is to add a view that tells both (joins the two views together
on circuit_id). lets call the new view view_circuit_with_status

This view is defined as

CREATE VIEW view_circuit_with_status AS (
   SELECT r.*,
          s.circuit_status,
          s.customer_id AS s_customer_id,
          p.line_speed,
          p.customer_id AS p_customer_id
     FROM view_circuit r
     JOIN view_circuit_product_main s
       ON r.circuit_id = s.circuit_id
      AND r.customer_id IS NOT DISTINCT FROM s.customer_id
     JOIN view_circuit_product p
       ON r.circuit_id = p.circuit_id
      AND r.customer_id IS NOT DISTINCT FROM s.customer_id
);

SELECT * FROM view_circuit_with_status WHERE customer_id = 1;

Since customer_id is exposed through view_circuit the planner assumes view_circuit.customer_id = 1 and from there attempts to join 
view_circuit_product_main and view_circuit_product using circuit_id. 
This is not running optimal.

However if we change our query to allow the inference rule to take place, the query is executed very fast.

SELECT * FROM view_circuit_with_status WHERE customer_id = 1 AND s_customer_id = 1 AND p_customer_id = 1;

If a circuit is not assigned to any customers customer_id is set to NULL. This is the reason I can't use = operator. If I do use = then I can't find circuit which are unassigned, but the query do run effective.

I can see this still ends up being quite abstract, but the point is it would be quite beneficial if IS NOT DISTINCT used the same rules as = operator.

I have attached the 2 query plans


-
Kim Carlsen
Do you use potatoes for long posts here?







В списке pgsql-general по дате отправления:

Предыдущее
От: Melvin Davidson
Дата:
Сообщение: Re: Remove Standby (SLAVE) from Primary (MASTER) - Postgres9.1
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: How to hint 2 coulms IS NOT DISTINCT FROM each other