Обсуждение: Query slow again after adding an `OR` operation (was: Slow PostgreSQL10.6 query)

Поиск
Список
Период
Сортировка

Query slow again after adding an `OR` operation (was: Slow PostgreSQL10.6 query)

От
Behrang Saeedzadeh
Дата:

The query (generated by Hibernate) got a bit more complex and performance degraded again. I have uploaded all the details here (with changed table names, etc.): https://github.com/behrangsa/slow-query

In short, the new query is:

```
SELECT inv.id           AS i_id,      inv.invoice_date AS inv_d,      inv.invoice_xid  AS inv_xid,      inv.invoice_type AS inv_type,      brs.branch_id    AS br_id,      cinvs.company_id AS c_id
FROM invoices inv        LEFT OUTER JOIN branch_invoices brs ON inv.id = brs.invoice_id        LEFT OUTER JOIN company_invoices cinvs ON inv.id = cinvs.invoice_id        INNER JOIN branches br ON brs.branch_id = br.id
WHERE brs.branch_id IN (SELECT br1.id                       FROM branches br1                                INNER JOIN access_rights ar1 ON br1.id = ar1.branch_id                                INNER JOIN users usr1 ON ar1.user_id = usr1.id                                INNER JOIN groups grp1 ON ar1.group_id = grp1.id                                INNER JOIN group_permissions gpr1 ON grp1.id = gpr1.group_id                                INNER JOIN permissions prm1 ON gpr1.permission_id = prm1.id                       WHERE usr1.id = 1636                         AND prm1.code = 'C2'                         AND ar1.access_type = 'T1')  OR brs.branch_id IN (SELECT br3.id                       FROM companies cmp                                INNER JOIN branches br3 ON cmp.id = br3.company_id                                INNER JOIN access_rights ar2 ON cmp.id = ar2.company_id                                INNER JOIN users usr2 ON ar2.user_id = usr2.id                                INNER JOIN groups g2 ON ar2.group_id = g2.id                                INNER JOIN group_permissions gpr2 ON g2.id = gpr2.group_id                                INNER JOIN permissions prm2 ON gpr2.permission_id = prm2.id                       WHERE usr2.id = 1636                         AND prm2.code = 'C2'                         AND ar2.access_type = 'T1'                       ORDER BY br3.id)
ORDER BY inv.invoice_date DESC, br.name ASC
LIMIT 12;
```

I tried tweaking join_collapse_limit and from_collapse_limit (I tried up to 30) but couldn't improve the performance (I also increased geqo_threshold to join_collapse_limit + 2).

Any chance of making PostgreSQL 10.6 choose a better plan without rewriting the Hibernate generated query?

Best regards,
Behrang Saeedzadeh

Re: Query slow again after adding an `OR` operation (was: SlowPostgreSQL 10.6 query)

От
Michael Lewis
Дата:
Are you prefixing this auto generated query with set join_collapse_limit = 30, or are you changing the default and reloading config? That is, can you impact ONLY this query with these config changes? I wouldn't assume so, so any hack/query hint like turning off hashjoins (which seem to be chosen instead of nested loop because of bad estimates for this plan) will likely have serious impact on other queries.

I know you don't have the flexibility to change the query to be one that follows best practices, but it is a bit disappointing that your ORM generates that OR condition instead of something like brs.branch_id IN (query1 union all query2). The join to branch_invoices also must function as inner join rather than left, but I am not sure if declaring a join type as left impacts the performance significantly.

When performance matters, there's nothing quite like being able to customize the query directly.

Re: Query slow again after adding an `OR` operation (was: SlowPostgreSQL 10.6 query)

От
David Rowley
Дата:
On Thu, 10 Oct 2019 at 01:07, Behrang Saeedzadeh <behrangsa@gmail.com> wrote:
>
> This is a follow up to
https://www.postgresql.org/message-id/flat/CAERAJ%2B-1buiJ%2B_JWEo0a9Ao-CVMWpgp%3DEnFx1dJtnB3WmMi2zQ%40mail.gmail.com
>
> The query (generated by Hibernate) got a bit more complex and performance degraded again. I have uploaded all the
detailshere (with changed table names, etc.): https://github.com/behrangsa/slow-query
 
>
> In short, the new query is:

The query mostly appears slow due to the "Rows Removed By Filter" in
the OR condition. The only way to get around not scanning the entire
branch_invoices table would be to somehow write the way in such a way
that allows it to go on the inner side of the join.

You could do that if you ensure there's an index on branch_invoices
(branch_id) and format the query as:

SELECT inv.id           AS i_id,
       inv.invoice_date AS inv_d,
       inv.invoice_xid  AS inv_xid,
       inv.invoice_type AS inv_type,
       brs.branch_id    AS br_id,
       cinvs.company_id AS c_id
FROM invoices inv
         LEFT OUTER JOIN branch_invoices brs ON inv.id = brs.invoice_id
         LEFT OUTER JOIN company_invoices cinvs ON inv.id = cinvs.invoice_id
         INNER JOIN branches br ON brs.branch_id = br.id
WHERE brs.branch_id IN (SELECT br1.id
                        FROM branches br1
                                 INNER JOIN access_rights ar1 ON
br1.id = ar1.branch_id
                                 INNER JOIN users usr1 ON ar1.user_id = usr1.id
                                 INNER JOIN groups grp1 ON
ar1.group_id = grp1.id
                                 INNER JOIN group_permissions gpr1 ON
grp1.id = gpr1.group_id
                                 INNER JOIN permissions prm1 ON
gpr1.permission_id = prm1.id
                        WHERE usr1.id = 1636
                          AND prm1.code = 'C2'
                          AND ar1.access_type = 'T1')
UNION ALL
SELECT br3.id
                        FROM companies cmp
                                 INNER JOIN branches br3 ON cmp.id =
br3.company_id
                                 INNER JOIN access_rights ar2 ON
cmp.id = ar2.company_id
                                 INNER JOIN users usr2 ON ar2.user_id = usr2.id
                                 INNER JOIN groups g2 ON ar2.group_id = g2.id
                                 INNER JOIN group_permissions gpr2 ON
g2.id = gpr2.group_id
                                 INNER JOIN permissions prm2 ON
gpr2.permission_id = prm2.id
                        WHERE usr2.id = 1636
                          AND prm2.code = 'C2'
                          AND ar2.access_type = 'T1')
ORDER BY inv.invoice_date DESC, br.name ASC
LIMIT 12;

The planner may then choose to pullup the subquery and uniquify it
then put it on the outside of a nested loop join then lookup the
branch_invoices record using the index on branch_id.  I think this is
quite a likely plan since the planner estimates there's only going to
be 1 row from each of the subqueries.

Also note, that the LEFT JOIN you have to branch_invoices is not
really a left join since you're insisting that the branch_id must be
in the first or 2nd sub-plan. There's no room for it to be NULL. The
planner will just convert that to an INNER JOIN with the above query
since that'll give it the flexibility to put the subquery in the IN
clause on the outside of the join (after having uniquified it).
You'll need to decide what you actually want the behaviour to be here.
If you do need those NULL rows then you'd better move your WHERE quals
down into the join condition for branch_invoices table. I'd suggest
testing with some mock-up data if you're uncertain of what I mean.

If you find that is faster and you can't rewrite the query due to it
having been generated by Hibernate, then that sounds like a problem
with Hibernate.  PostgreSQL does not currently attempt to do any
rewrites which convert OR clauses to use UNION or UNION ALL. No amount
of tweaking the planner settings is going to change that fact.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services