Обсуждение: [PERFORM] strange and slow joining of nested views

Поиск
Список
Период
Сортировка

[PERFORM] strange and slow joining of nested views

От
Titus von Boxberg
Дата:
Dear all,

I got the following problem for which I could not find a solution by searching the archives:

I have Tables Ta, Tb, Tc with primary keys as bigserials.
Ta references Tb references Tc.
Not all but most rows in Ta reference exactly one row in Tb.

Above the Tables I have views Va, Vb, Vc that gather some other information in addition to the Tables Ta,Tb,Tc.
Va outputs columns of Vb outputs columns of Vc.
also, some other nested selects form columns of Va, Vb, Vc.

my problem is the join in Va:
It basically says

SELECT Ta.PK, Ta.FK-Tb, Ta.x…, Ta.y, Vb.x, Vb.y, …
FROM Ta, Vb
WHERE Vb.TbPK = Ta.TbFK
AND <somerestrictionOnTa>

Even if I select exactly one row of Ta which also results in exacly one row of Vb (and then Vc),
the planner seems to always create all possible lines (of a subselect) in Vb and afterwards drop all
lines but one when performing the join.

When I replace Va's join of Ta and Vb by directly placing Vb's SELECT in Va
the planner finds that it needs only one row of the former Vb (which is then incorporated in Va).

Also, when I replace the above join by subselects resulting in a
SELECT Ta.PK-id, Ta.FK-Tb, Ta.x…,
(SELECT Vb.x FROM Vb WHERE Vb.TbPK = Ta.TbFK) AS x,
(SELECT Vb.y FROM Vb WHERE Vb.TbPK = Ta.TbFK) AS y
WHERE e.g Ta.PK-id = <singlevalue>

the planner is - for each subselect - able to perform Vb's operations on only the one row
that matches the FK of Ta.
That the planner repeats this for each of the above subselects of Vb is a
different story which I don't understand either.


My question is:
Is there any way to convince the planner that it makes sense for
the Vb joined with Ta into Va to first select one row of Tb and then perform the rest of Vb on this one row?
(And why is the plan for the regular join form differing from the subselects?)

Here is the explain,analyze of Va's SELECT using four fast subselects on Vb:
https://explain.depesz.com/s/2tp

Here is it for the original Va's SELECT with the slow join of Ta, Vb:
https://explain.depesz.com/s/oKS

BTW: That some expressions in Vb are slow and inefficient is understood and can be corrected by me.
That's what made this problem visible but to my understanding this should not matter for the question.

Addition information:
"PostgreSQL 9.6.1, compiled by Visual C++ build 1800, 64-bit"
The same problem existed on 9.3 before. I updated to 9.6.1 to see if it gets better which it did not.
Hardware:
VM running Windows 10 or WS2012R2 on WS2012R2 HyperV running on Xeon E5-2600. SSD buffered HDDs.
I have the impression that this problem was at least invisible on 8.1 which I used before 9.3.

Any insight is welcome.
Let me know if more information is needed to analyze the question.

Regards,
Titus


Re: [PERFORM] strange and slow joining of nested views

От
Tom Lane
Дата:
Titus von Boxberg <titus@elbe-informatik.de> writes:
> I got the following problem for which I could not find a solution by searching the archives:
> I have Tables Ta, Tb, Tc with primary keys as bigserials.
> Ta references Tb references Tc.
> Not all but most rows in Ta reference exactly one row in Tb.

Hm, your problem query has 11 table scans (not to mention a couple of
subplans) so you're oversimplifying here.  Anyway, I think that increasing
join_collapse_limit and/or from_collapse_limit to at least 11 might help.
As-is, you're more or less at the mercy of whether your textual query
structure corresponds to a good join order.

            regards, tom lane


Re: [PERFORM] strange and slow joining of nested views

От
Titus von Boxberg
Дата:
> Von: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Gesendet: Samstag, 4. Februar 2017 06:16
> 
> Titus von Boxberg <titus@elbe-informatik.de> writes:
> > I got the following problem for which I could not find a solution by
> searching the archives:
> > I have Tables Ta, Tb, Tc with primary keys as bigserials.
> > Ta references Tb references Tc.
> > Not all but most rows in Ta reference exactly one row in Tb.
> 
> Hm, your problem query has 11 table scans (not to mention a couple of
> subplans) so you're oversimplifying here.  Anyway, I think that
> increasing join_collapse_limit and/or from_collapse_limit to at least 11
> might help.
> As-is, you're more or less at the mercy of whether your textual query
> structure corresponds to a good join order.
> 
>             regards, tom lane

Thanks, I found the problem:

In the slow join case the planner always fails to restrict
one subselect in the joined view using EXISTS and one with a SUM clause
to the the one row that actually gets used by the join.
Both use functions that I forgot to declare STABLE.
After correcting this, the query is fast and the explain output looks like expected.

Still, it would be nice to know what makes the join different from a subselect.
setting geqo = off and varying join_collapse_limit and from_collapse_limit
from 1 to 50 did not change anything in the initial behaviour.
Shouldn't the planner eventually find them being equivalent?

Regards,
Titus