Обсуждение: Eliminating unnecessary left joins

Поиск
Список
Период
Сортировка

Eliminating unnecessary left joins

От
"Ottó Havasvölgyi"
Дата:
Hi,
 
When using views built with left joins, and then querying against these views, there are a lot of join in the plan that are not necessary, because I don't select/use any column of each table in the views every time. Tables that are left joined and never referenced anywhere else in the query  should be removed from the plan. I think this can be done without any other analyzation or catalog lookup, so it is a quite cheap optimization step, and doing it won't influence the result, but the query will run faster.
This way with a complex query against these views usually the half of the join can be eliminated, and the plan will be quite more optimal.
Why left join a table if never used/referenced in the query?
 
How easy is to teach Postgres to this?
 
I would like to help somehow to introduce this feature as soon as possible. What should I do?
 
Thanks,
Otto

Re: Eliminating unnecessary left joins

От
Tom Lane
Дата:
"Ottó Havasvölgyi" <havasvolgyi.otto@gmail.com> writes:
> When using views built with left joins, and then querying against these
> views, there are a lot of join in the plan that are not necessary, because I
> don't select/use any column of each table in the views every time. Tables
> that are left joined and never referenced anywhere else in the query  should
> be removed from the plan.

That might cause you to get the wrong number of copies of some rows ---
what if a row of the left table should join to multiple rows on the right?
        regards, tom lane


Re: Eliminating unnecessary left joins

От
Andreas Pflug
Дата:
Tom Lane wrote:
> "Ottó Havasvölgyi" <havasvolgyi.otto@gmail.com> writes:
>   
>> When using views built with left joins, and then querying against these
>> views, there are a lot of join in the plan that are not necessary, because I
>> don't select/use any column of each table in the views every time. Tables
>> that are left joined and never referenced anywhere else in the query  should
>> be removed from the plan.
>>     
>
> That might cause you to get the wrong number of copies of some rows ---
> what if a row of the left table should join to multiple rows on the right?
>   
That would be trouble. But I've seen quite some cases where the right
can contain only zero or one row, because of PK constraints. In this
case, elimination would be safe.


Regards,
Andreas


Re: Eliminating unnecessary left joins

От
"Simon Riggs"
Дата:
On Fri, 2007-04-06 at 19:38 -0400, Tom Lane wrote:
> "Ottó Havasvölgyi" <havasvolgyi.otto@gmail.com> writes:
> > When using views built with left joins, and then querying against these
> > views, there are a lot of join in the plan that are not necessary, because I
> > don't select/use any column of each table in the views every time. Tables
> > that are left joined and never referenced anywhere else in the query  should
> > be removed from the plan.
> 
> That might cause you to get the wrong number of copies of some rows ---
> what if a row of the left table should join to multiple rows on the right?

In the case that PKs match between the tables, then exclusion is safe. 

This would enable vertical partitioning, so is a very desirable feature.

If this was possible, it would be a commonly used optimisation.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: Eliminating unnecessary left joins

От
"Nicolas Barbier"
Дата:
2007/4/7, Andreas Pflug <pgadmin@pse-consulting.de>:

> Tom Lane wrote:
>
>> "Ottó Havasvölgyi" <havasvolgyi.otto@gmail.com> writes:
>>
>>> When using views built with left joins, and then querying against these
>>> views, there are a lot of join in the plan that are not necessary, because I
>>> don't select/use any column of each table in the views every time. Tables
>>> that are left joined and never referenced anywhere else in the query  should
>>> be removed from the plan.
>>
>> That might cause you to get the wrong number of copies of some rows ---
>> what if a row of the left table should join to multiple rows on the right?
>
>That would be trouble. But I've seen quite some cases where the right
>can contain only zero or one row, because of PK constraints. In this
> case, elimination would be safe.

I would like to mention that this kind of structure is used by
Hibernate (ORM for Java/.NET) for mapping class hierarchies. I can
attest that this optimization is supported by MS-SQL and I think (not
tested) also by Oracle.

To recapitulate, the optimization would be: Remove left outer joined
tables from the join list, if they are not used by the query, and the
join attributes are a key for it (I assume an equality join).

Typical example:

PARENT_CLASS (PK: ID)
CHILD_CLASS (PK: ID)

In query:

SELECT   P.ID
FROM                   PARENT_CLASS P   LEFT OUTER JOIN CHILD_CLASS C ON P.ID = C.ID;

the join on CHILD_CLASS can be eliminated, because the join attribute
ID is a key for it, and none of its attributes are used in the query.

Hibernate:
<url:http://www.hibernate.org/>

Hibernate Inheritance Mapping:
<url:http://www.hibernate.org/hib_docs/reference/en/html/inheritance.html>

greetings,
Nicolas

-- 
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html


Re: Eliminating unnecessary left joins

От
"Ottó Havasvölgyi"
Дата:
Sorry, I have left out the PK requirement.
What Nicolas wrote is right, I also use an O/R mapper and inheritance is solved with vertical partitioning. The tables are connected to each other with the PK. And the mapper defines views for each class with left joins. The mapper generates queries based on these views. A high fraction of the joins would be eliminated almost in every query.
 
My simple example:
 
Class hierarchy and fields:
Shape (ID, X, Y)
|
+-Circle (ID, Radius)
|
+-Rectangle (ID, Width, Height)
 
The mapper creates 3 tables with the columns next to the class name.
And it creates 3 views. One of them: 
 
RectangleView:  SELECT r."ID" as "ID", s."X" as "X", s."Y" as "Y", r."Width" as "Width", r."Height" as "Height" FROM "Rectangle" r LEFT JOIN "Shape" s ON ( r.ID=s.ID)
 
Now if I query Rectangle object IDs, whose Width is greater than 5, it will generate this:
 
SELECT "ID" FROM RectangleView WHERE "Width">5
 
In this case I don't need to left join the Shape table, because X and Y columns are not used.
 
 
The other typical situation is when I execute more complex, not-O/Rmapper-generated SQL commands based on these views for reporting. For example the average width of rectangles whose height is greater than 10.
----------------------------------------------------
 
This optimization should be also applied to subqueries.
 
 
 
Is this optimization relatively easy to introduce?
 
I would gladly work on this, but unfortunately I don't know the codebase at all.
I would really appreciate if someone competent implemented this feature in 8.4.
 
Thank you in advance,
Otto
 

Re: Eliminating unnecessary left joins

От
"Nicolas Barbier"
Дата:
2007/4/7, Ottó Havasvölgyi <havasvolgyi.otto@gmail.com>:

> My simple example:
>
> Class hierarchy and fields:
> Shape (ID, X, Y)
> |
> +-Circle (ID, Radius)
> |
> +-Rectangle (ID, Width, Height)
>
> The mapper creates 3 tables with the columns next to the class name.
> And it creates 3 views. One of them:
>
> RectangleView:  SELECT r."ID" as "ID", s."X" as "X", s."Y" as "Y", r."Width"
> as "Width", r."Height" as "Height" FROM "Rectangle" r LEFT JOIN "Shape" s ON
> ( r.ID=s.ID)

I find this view definition a bit strange: why is there a left outer
join? I expect there to be a FK from Rectangle.ID to Shape.ID ("all
rectangles are shapes"), which makes the definition totally equivalent
with one in which a normal join is used (whether attributes of Shape
are used or not).

The main use case I see for the original optimization is ORMs that
join in a whole hierarchy, even when only a part of it is needed. I
guess that that is rather common. The ORM that I use does exactly
this, because the main target-DBMSs (MS-SQL and Oracle) do the
optimization for it.

Example (somewhat less contrived than my previous one):

Imagine an implementation of the typical "books that are borrowed by
people" n-m relationship, using three tables ("Book", "Borrowed",
"Person"). Let's find all books that have been borrowed by a certain
person.

The "non-ORM" version would be something like:

SELECT Book.*
FROM        Book   JOIN Borrowed ON Borrowed.book_id = Book.id
WHERE Borrowed.person_id = <x>;

Now assume that Borrowed is a class hierarchy mapped into multiple
tables by a typical ORM. The query would probably become something
like:

SELECT Book.*
FROM             Book        JOIN Borrowed_Parent ON Borrowed_Parent.book_id = Book.id   LEFT JOIN Borrowed_Child1 ON
Borrowed_Child1.id= Borrowed_Parent.id   LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id   (...) 
WHERE Borrowed_Parent.person_id = <x>;

It is clear that the children of the hierarchy are needlessly joined
in (as the only attribute that is actually needed is person_id, which
is on the parent level). It is not always trivial for the ORM to find
that out, without writing stuff that looks suspiciously similar to a
DBMS optimizer.

Maybe it is debatable whether this optimization should be done by the
application (i.e. the ORM) or by the DBMS. I am personally in favor of
doing it in the DBMS.

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html


Re: Eliminating unnecessary left joins

От
"Ottó Havasvölgyi"
Дата:
My mapper joins the parent classes' tables to the current class' table in the view. In the ShapeView only ID, X, and Y is selected from the shape table, and none of the child tables are touched, opposite to your sample. But even though all Shape objects (circles and rectangles too) are in the resultset as Shape objects. I see this storage model quite consistent. 
You are right, that this can be done with inner join too, this is an option in the mapper. Oracle and MSSQL performs this left join optimization, so it is usually used with left join by other mapper users. I have asked them (the developers of the mapper) to perform this optimization at mapper level because not all DBMSs supported this optimization, but it seemed they didn't like this idea... And then I came here. This optimization would be useful for every Postgres users.
 
To be honest I also find your sample strange, more exactly that *sibling* child tables are left joined to the parent. Maybe because the storage model is different than in my mapper.
 
In my case the left joined parent tables should be excluded by the optimizer if possible.
 
Best regards,
Otto
 

 
2007/4/8, Nicolas Barbier <nicolas.barbier@gmail.com>:
2007/4/7, Ottó Havasvölgyi < havasvolgyi.otto@gmail.com>:

> My simple example:
>
> Class hierarchy and fields:
> Shape (ID, X, Y)
> |
> +-Circle (ID, Radius)
> |
> +-Rectangle (ID, Width, Height)
>
> The mapper creates 3 tables with the columns next to the class name.
> And it creates 3 views. One of them:
>
> RectangleView:  SELECT r."ID" as "ID", s."X" as "X", s."Y" as "Y", r."Width"
> as "Width", r."Height" as "Height" FROM "Rectangle" r LEFT JOIN "Shape" s ON
> ( r.ID=s.ID)

I find this view definition a bit strange: why is there a left outer
join? I expect there to be a FK from Rectangle.ID to Shape.ID ("all
rectangles are shapes"), which makes the definition totally equivalent
with one in which a normal join is used (whether attributes of Shape
are used or not).

The main use case I see for the original optimization is ORMs that
join in a whole hierarchy, even when only a part of it is needed. I
guess that that is rather common. The ORM that I use does exactly
this, because the main target-DBMSs (MS-SQL and Oracle) do the
optimization for it.

Example (somewhat less contrived than my previous one):

Imagine an implementation of the typical "books that are borrowed by
people" n-m relationship, using three tables ("Book", "Borrowed",
"Person"). Let's find all books that have been borrowed by a certain
person.

The "non-ORM" version would be something like:

SELECT Book.*
FROM
        Book
   JOIN Borrowed ON Borrowed.book_id = Book.id
WHERE Borrowed.person_id = <x>;

Now assume that Borrowed is a class hierarchy mapped into multiple
tables by a typical ORM. The query would probably become something
like:

SELECT Book.*
FROM
             Book
        JOIN Borrowed_Parent ON Borrowed_Parent.book_id = Book.id
   LEFT JOIN Borrowed_Child1 ON Borrowed_Child1.id = Borrowed_Parent.id
   LEFT JOIN Borrowed_Child2 ON Borrowed_Child2.id = Borrowed_Parent.id
   (...)
WHERE Borrowed_Parent.person_id = <x>;

It is clear that the children of the hierarchy are needlessly joined
in (as the only attribute that is actually needed is person_id, which
is on the parent level). It is not always trivial for the ORM to find
that out, without writing stuff that looks suspiciously similar to a
DBMS optimizer.

Maybe it is debatable whether this optimization should be done by the
application (i.e. the ORM) or by the DBMS. I am personally in favor of
doing it in the DBMS.

greetings,
Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html

Re: Eliminating unnecessary left joins

От
Jim Nasby
Дата:
I agree with others that the way that query is constructed is a bit
odd, but it does bring another optimization to mind: when doing an
inner-join between a parent and child table when RI is defined
between them, if the query only refers to the child table you can
drop the parent table from the join, because each row in the child
table must have one and only one row in the parent.

Use-case: I'll often use views to make it easier to query several
related tables, but not all queries against the views need to hit
every table. IE: if a table has several status fields that have RI to
parent tables that describe what each status is, you sometimes will
query for the status description, sometimes not.

I suspect that checking to see if tables have the right unique keys
or RI would add a noticeable amount of extra work to query planning,
so we might want a GUC to disable it.

On Apr 7, 2007, at 12:45 PM, Ottó Havasvölgyi wrote:

> Sorry, I have left out the PK requirement.
> What Nicolas wrote is right, I also use an O/R mapper and
> inheritance is solved with vertical partitioning. The tables are
> connected to each other with the PK. And the mapper defines views
> for each class with left joins. The mapper generates queries based
> on these views. A high fraction of the joins would be eliminated
> almost in every query.
>
> My simple example:
>
> Class hierarchy and fields:
> Shape (ID, X, Y)
> |
> +-Circle (ID, Radius)
> |
> +-Rectangle (ID, Width, Height)
>
> The mapper creates 3 tables with the columns next to the class name.
> And it creates 3 views. One of them:
>
> RectangleView:  SELECT r."ID" as "ID", s."X" as "X", s."Y" as "Y",
> r."Width" as "Width", r."Height" as "Height" FROM "Rectangle" r
> LEFT JOIN "Shape" s ON ( r.ID=s.ID)
>
> Now if I query Rectangle object IDs, whose Width is greater than 5,
> it will generate this:
>
> SELECT "ID" FROM RectangleView WHERE "Width">5
>
> In this case I don't need to left join the Shape table, because X
> and Y columns are not used.
>
>
> The other typical situation is when I execute more complex, not-O/
> Rmapper-generated SQL commands based on these views for reporting.
> For example the average width of rectangles whose height is greater
> than 10.
> ----------------------------------------------------
>
> This optimization should be also applied to subqueries.
>
>
>
> Is this optimization relatively easy to introduce?
>
> I would gladly work on this, but unfortunately I don't know the
> codebase at all.
> I would really appreciate if someone competent implemented this
> feature in 8.4.
>
> Thank you in advance,
> Otto
>

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)




Re: Eliminating unnecessary left joins

От
"Ottó Havasvölgyi"
Дата:
Jim,
 
Maybe odd, but simpler to optimize this way.
 
Your idea would be also a very good optimization, there was already a discussion about that here: http://archives.postgresql.org/pgsql-performance/2006-01/msg00151.php , but that time Tom refused it because it was too expensive and rare. Maybe now he has a different opinion.
However, left join optimization is lot simpler and cheaper, and can be useful not only for O/R mappers, but for efficient vertical partitioning as Simon mentioned.
 
Best regards,
Otto 

 
2007/4/12, Jim Nasby <decibel@decibel.org>:
I agree with others that the way that query is constructed is a bit
odd, but it does bring another optimization to mind: when doing an
inner-join between a parent and child table when RI is defined
between them, if the query only refers to the child table you can
drop the parent table from the join, because each row in the child
table must have one and only one row in the parent.

Use-case: I'll often use views to make it easier to query several
related tables, but not all queries against the views need to hit
every table. IE: if a table has several status fields that have RI to
parent tables that describe what each status is, you sometimes will
query for the status description, sometimes not.

I suspect that checking to see if tables have the right unique keys
or RI would add a noticeable amount of extra work to query planning,
so we might want a GUC to disable it.

On Apr 7, 2007, at 12:45 PM, Ottó Havasvölgyi wrote:

> Sorry, I have left out the PK requirement.
> What Nicolas wrote is right, I also use an O/R mapper and
> inheritance is solved with vertical partitioning. The tables are
> connected to each other with the PK. And the mapper defines views
> for each class with left joins. The mapper generates queries based
> on these views. A high fraction of the joins would be eliminated
> almost in every query.
>
> My simple example:
>
> Class hierarchy and fields:
> Shape (ID, X, Y)
> |
> +-Circle (ID, Radius)
> |
> +-Rectangle (ID, Width, Height)
>
> The mapper creates 3 tables with the columns next to the class name.
> And it creates 3 views. One of them:
>
> RectangleView:  SELECT r."ID" as "ID", s."X" as "X", s."Y" as "Y",
> r."Width" as "Width", r."Height" as "Height" FROM "Rectangle" r
> LEFT JOIN "Shape" s ON ( r.ID=s.ID)
>
> Now if I query Rectangle object IDs, whose Width is greater than 5,
> it will generate this:
>
> SELECT "ID" FROM RectangleView WHERE "Width">5
>
> In this case I don't need to left join the Shape table, because X
> and Y columns are not used.
>
>
> The other typical situation is when I execute more complex, not-O/
> Rmapper-generated SQL commands based on these views for reporting.
> For example the average width of rectangles whose height is greater
> than 10.
> ----------------------------------------------------
>
> This optimization should be also applied to subqueries.
>
>
>
> Is this optimization relatively easy to introduce?
>
> I would gladly work on this, but unfortunately I don't know the
> codebase at all.
> I would really appreciate if someone competent implemented this
> feature in 8.4.
>
> Thank you in advance,
> Otto
>

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com       512.569.9461 (cell)



Re: Eliminating unnecessary left joins

От
"Zeugswetter Andreas ADI SD"
Дата:
> Maybe odd, but simpler to optimize this way.
>
> Your idea would be also a very good optimization, there was
> already a discussion about that here:
> http://archives.postgresql.org/pgsql-performance/2006-01/msg00
> 151.php, but that time Tom refused it because it was too
> expensive and rare. Maybe now he has a different opinion.
> However, left join optimization is lot simpler and cheaper,
> and can be useful not only for O/R mappers, but for efficient
> vertical partitioning as Simon mentioned.

For the views use case there is a simple solution without the expensive optimization:
If you have a PK FK relationship simply rewrite the view to use a left join instead
of a join. Since there is always one row on the outer (PK) side it makes no difference to the result set.
And then the left join optimization can be used.

Andreas


Re: Eliminating unnecessary left joins

От
Stephan Szabo
Дата:
On Wed, 11 Apr 2007, Jim Nasby wrote:

> I agree with others that the way that query is constructed is a bit
> odd, but it does bring another optimization to mind: when doing an
> inner-join between a parent and child table when RI is defined
> between them, if the query only refers to the child table you can
> drop the parent table from the join, because each row in the child
> table must have one and only one row in the parent.

I don't think that's quite true without qualifications. First, I think it
needs to be an immediate constraint (and I don't remember how we handle
set constraints inside functions that might be called from a statement, so
it might need to be not deferrable). Second, I think you also need to take
care of NULLs since child rows with NULLs in the key pass the constraint
but have no rows in the parent and would get culled by the inner join.

Also, there's a possible issue that constraints do not actually guarantee
that they always hold true, merely that they hold true at particular
times. I don't know if it's possible to get a statement executed such that
it would see the table state between the action and constraint check or
if such is allowed by spec.


Re: Eliminating unnecessary left joins

От
"Robert Haas"
Дата:
I have this exact problem a lot.  There are actually cases where you can
eliminate regular joins, not just left joins.  For example:

CREATE TABLE partner (id            serial,name            varchar(40) not null,primary key (id)
);

CREATE TABLE project (id            serial,name            varchar(40) not null,partner_id        integer not null
referencesproject (id) 
);

CREATE VIEW project_view AS
SELECT p.id, p.name, p.partner_id, pp.name AS partner
FROM project p, partner pp
WHERE p.partner_id = pp.id;

If someone does a select from project_view and does not select the
partner column, the join can be eliminated, because the not null and
foreign key constraints on the partner_id column guarantee that there
will always be exactly one matching row in the project table.

If you didn't have the NOT NULL constraint on the partner_id column,
you'd have to write the view this way, as described in the original
email:

CREATE VIEW project_view AS
SELECT p.id, p.name, p.partner_id, pp.name AS partner
FROM project p LEFT JOIN partner pp ON p.partner_id = pp.id;

In this example, I just have one join, so the benefit to eliminating it
is minimal (unless the tables are very large).  But in the real
application, project_view joins the project table against six other
tables using inner joins (all against the primary keys of those other
tables) and four additional tables using left joins (also against the
primary keys of those other tables). Most queries only use a subset of
these columns - a typical query requires evaluating only about three of
the ten joins.

...Robert


Re: Eliminating unnecessary left joins

От
"Zeugswetter Andreas ADI SD"
Дата:
> I have this exact problem a lot.  There are actually cases
> where you can eliminate regular joins, not just left joins.
> For example:
>
> CREATE TABLE partner (
>     id            serial,
>     name            varchar(40) not null,
>     primary key (id)
> );
>
> CREATE TABLE project (
>     id            serial,
>     name            varchar(40) not null,
>     partner_id        integer not null references project (id)
^^^^^^^ -- I assume typo, should be partner
> );
>
> CREATE VIEW project_view AS
> SELECT p.id, p.name, p.partner_id, pp.name AS partner FROM
> project p, partner pp WHERE p.partner_id = pp.id;

Same advice to you:

1. add not null to your id's
2. CREATE VIEW project_view AS  SELECT p.id, p.name, p.partner_id, pp.name AS partner FROM   project p left outer join
partnerpp ON p.partner_id = pp.id; 
3. wait (or implement :-) the left join optimization in pg

Andreas


Re: Eliminating unnecessary left joins

От
"Ottó Havasvölgyi"
Дата:
Hi,
 
Could you Bruce please add a TODO item for this feature?
The description could look something like this:
 
Eliminate the table T from the query/subquery if the following requirements are satisfied:
1. T is left joined
2. T is referenced only in the join expression where it is left joined
3. the left join's join expression is a simple equality expression like T1.C1=T2.C2; T1!=T2 and (T==T1 or T==T2)
4. the column of T in the join exression is the primary key of T
 
----------------------------------------------------
 
I hope it is comlete.
I think this is the simplest case, so we should start with this.
 
Thanks,
Otto
 

Re: Eliminating unnecessary left joins

От
"Nicolas Barbier"
Дата:
2007/4/16, Ottó Havasvölgyi <havasvolgyi.otto@gmail.com>:

> Eliminate the table T from the query/subquery if the following requirements
> are satisfied:
> 1. T is left joined
> 2. T is referenced only in the join expression where it is left joined
> 3. the left join's join expression is a simple equality expression like
> T1.C1=T2.C2; T1!=T2 and (T==T1 or T==T2)
> 4. the column of T in the join exression is the primary key of T

Condition 4 should be: the column of T in the join expression is a key
of T (i.e. it doesn't need to be the PK, a UNIQUE constraint would be
enough).

This process can be done recursively (implementation doesn't have to
be recursive, of course), to eliminate whole sub-trees of the join
tree.

Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html


Re: Eliminating unnecessary left joins

От
"Ottó Havasvölgyi"
Дата:
But then what about the null values? Perhaps unique + notnull is better?
 
Otto

 
2007/4/20, Nicolas Barbier <nicolas.barbier@gmail.com>:
2007/4/16, Ottó Havasvölgyi <havasvolgyi.otto@gmail.com >:

> Eliminate the table T from the query/subquery if the following requirements
> are satisfied:
> 1. T is left joined
> 2. T is referenced only in the join expression where it is left joined
> 3. the left join's join expression is a simple equality expression like
> T1.C1=T2.C2; T1!=T2 and (T==T1 or T==T2)
> 4. the column of T in the join exression is the primary key of T

Condition 4 should be: the column of T in the join expression is a key
of T (i.e. it doesn't need to be the PK, a UNIQUE constraint would be
enough).

This process can be done recursively (implementation doesn't have to
be recursive, of course), to eliminate whole sub-trees of the join
tree.

Nicolas

--
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html