Planner producing 100% duplicate subplans when unneeded

Поиск
Список
Период
Сортировка
От Daniel Grace
Тема Planner producing 100% duplicate subplans when unneeded
Дата
Msg-id AANLkTikU6k1BgcMBcgkgOGNbSxq-juHtKMPRJP5BxS9g@mail.gmail.com
обсуждение исходный текст
Ответы Re: Planner producing 100% duplicate subplans when unneeded  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-bugs
This is all on Postgres 9.0.0:

I'm working on the definition for a view that, as part of its output,
includes three columns that each contain a sum of values in a table
that are in one of a few different states -- essentially: for each
parent, give me the sum of children in the foo state, a second sum of
the children in the bar state, and a third sum of the children in the
baz state.

Thinking I'd be clever and avoid multiple almost-identical subqueries
to for each SUM, I decided to do it in a single subquery that returns
an array of all 3 sums and then split it out into its component parts
higher up (see the example below if you don't understand what I mean).
 Unfortunately, the query planner doesn't quite handle this case: For
each reference to the subquery value, it duplicates the subquery and
thus its plan:

CREATE TABLE parent (
    id INT PRIMARY KEY
);
INSERT INTO parent(id) SELECT GENERATE_SERIES(1, 1000);

CREATE TABLE child (
    parent_id INT,
    v1 INT,
    v2 INT,
    PRIMARY KEY(parent_id, v1)
);

INSERT INTO child(parent_id, v1, v2)
SELECT p.id, v1.id, RANDOM() * 500
FROM
    GENERATE_SERIES(1, 1000) AS p(id)
    CROSS JOIN GENERATE_SERIES(1, 20) AS v1(id)
;

EXPLAIN SELECT *, child_data[1] AS foo, child_data[2] AS bar,
child_data[3] AS baz FROM (
    SELECT p.id,
        (
            SELECT
                ARRAY[
                    SUM(c.v2),
                    SUM(CASE WHEN c.v1 > 15 THEN c.v2 ELSE 0 END),
                    SUM(CASE WHEN c.v1 >  5 THEN c.v2 ELSE 0 END)
                ]
            FROM child AS c
            WHERE c.parent_id=p.id
        ) AS child_data
    FROM parent AS p
) AS t;

produces:

Seq Scan on wings_sky.parent p (cost=0.00..161113.12 rows=1000 width=4)
Output: p.id, (SubPlan 1), ((SubPlan 2))[1], ((SubPlan 3))[2], ((SubPlan 4))[3]
SubPlan 1
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 2
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 3
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)
SubPlan 4
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)

There's another method of writing this that is more efficient in this
PARTICULAR case:


EXPLAIN VERBOSE SELECT parent.id, t.foo, t.bar, t.baz
FROM
    parent
    INNER JOIN (
        SELECT
            child.parent_id,
            SUM(child.v2) AS foo,
            SUM(CASE WHEN child.v1 > 15 THEN child.v2 ELSE 0 END) AS bar,
            SUM(CASE WHEN child.v1 >  5 THEN child.v2 ELSE 0 END) AS baz
        FROM
            child
        GROUP BY
            child.parent_id
    ) AS t ON t.parent_id=parent.id

Hash Join (cost=547.12..556.62 rows=200 width=28)
Output: parent.id, (sum(child.v2)), (sum(CASE WHEN (child.v1 > 15)
THEN child.v2 ELSE 0 END)), (sum(CASE WHEN (child.v1 > 5) THEN
child.v2 ELSE 0 END))
Hash Cond: (child.parent_id = parent.id)
-> HashAggregate (cost=483.12..487.62 rows=200 width=12)
Output: child.parent_id, sum(child.v2), sum(CASE WHEN (child.v1 > 15)
THEN child.v2 ELSE 0 END), sum(CASE WHEN (child.v1 > 5) THEN child.v2
ELSE 0 END)
-> Seq Scan on wings_sky.child (cost=0.00..291.06 rows=19206 width=12)
Output: child.parent_id, child.v1, child.v2
-> Hash (cost=34.00..34.00 rows=2400 width=4)
Output: parent.id
-> Seq Scan on wings_sky.parent (cost=0.00..34.00 rows=2400 width=4)
Output: parent.id


However, the second plan performs poorly in cases in the case where
parent_id can be more than one possible value (i.e. there is no WHERE
parent_id=1 clause) -- it takes nearly as long in that instance as it
does with no WHERE clause altogether.  Both plans run the same speed
with one parent_id.  The first plan starts losing speed gradually as
the number of parents increase; the second plan is either
all-or-nothing.



In the first case, it seems inefficient to duplicate the subplan for
each reference -- I'd think the (corrected) plan should look something
like this:

Seq Scan on wings_sky.parent p (cost=0.00..161113.12 rows=1000 width=4)
Output: p.id, (SubPlan 1), ((SubPlan 1))[1], ((SubPlan 1))[2], ((SubPlan 1))[3]
SubPlan 1
-> Aggregate (cost=40.26..40.27 rows=1 width=8)
Output: ARRAY[sum(c.v2), sum(CASE WHEN (c.v1 > 15) THEN c.v2 ELSE 0
END), sum(CASE WHEN (c.v1 > 5) THEN c.v2 ELSE 0 END)]
-> Index Scan using child_pkey on wings_sky.child c (cost=0.00..40.10
rows=20 width=8)
Output: c.parent_id, c.v1, c.v2
Index Cond: (c.parent_id = $0)

Is there any chance this might be looked at in a future release?

--
Daniel Grace
AGE, LLC
System Administrator and Software Developer
dgrace@wingsnw.com // www.wingsnw.com

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "Llewellyn, Mark (ADP DSI)"
Дата:
Сообщение: Re: BUG #5650: Postgres service showing as stopped when in fact it is running
Следующее
От: "Alexey Parshin"
Дата:
Сообщение: BUG #5680: Failure to start: too many private dirs demanded