Обсуждение: 7.3 vs 7.2 - different query plan, bad performance

Поиск
Список
Период
Сортировка

7.3 vs 7.2 - different query plan, bad performance

От
Ryszard Lach
Дата:
Hi.

I have a problem with performance after upgrading from 7.2 to 7.3. Let's
see two simple tables:

CREATE TABLE a (
    id integer,
    parent_id integer
);

with 1632 records, and

CREATE TABLE b (
    id integer
);

with 5281 records, and a litle more complex view:

CREATE VIEW v_c AS
    SELECT t1.id,
    (SELECT count(*) AS count FROM a t3 WHERE (t3.parent_id = t2.id)) AS children_count
    FROM (b t1 LEFT JOIN a t2 ON ((t1.id = t2.id)));


Now see the query run under explain analyze:

Postgresql 7.2:

siaco=# explain analyze select count(*) from v_c;
NOTICE:  QUERY PLAN:

Aggregate  (cost=219.66..219.66 rows=1 width=8) (actual time=162.75..162.75 rows=1 loops=1)
  ->  Merge Join  (cost=139.66..207.16 rows=5000 width=8) (actual time=95.07..151.46 rows=5281 loops=1)
        ->  Sort  (cost=69.83..69.83 rows=1000 width=4) (actual time=76.18..82.37 rows=5281 loops=1)
              ->  Seq Scan on b t1  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.02..22.02 rows=5281 loops=1)
        ->  Sort  (cost=69.83..69.83 rows=1000 width=4) (actual time=18.86..25.38 rows=5281 loops=1)
              ->  Seq Scan on a t2  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.02..6.70 rows=1632 loops=1)
Total runtime: 164.34 msec
EXPLAIN


Postgresql 7.3:

siaco=# explain analyze  select count(*) from v_c;
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=224.66..224.66 rows=1 width=8) (actual time=5691.77..5691.77 rows=1 loops=1)
   ->  Subquery Scan v_c  (cost=139.66..212.16 rows=5000 width=8) (actual time=24.72..5687.77 rows=5281 loops=1)
         ->  Merge Join  (cost=139.66..212.16 rows=5000 width=8) (actual time=24.72..5681.55 rows=5281 loops=1)
               Merge Cond: ("outer".id = "inner".id)
               ->  Sort  (cost=69.83..72.33 rows=1000 width=4) (actual time=18.82..21.09 rows=5281 loops=1)
                     Sort Key: t1.id
                     ->  Seq Scan on b t1  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.01..7.28 rows=5281
loops=1)
               ->  Sort  (cost=69.83..72.33 rows=1000 width=4) (actual time=4.74..7.15 rows=5281 loops=1)
                     Sort Key: t2.id
                     ->  Seq Scan on a t2  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.02..2.13 rows=1632
loops=1)
               SubPlan
                 ->  Aggregate  (cost=22.51..22.51 rows=1 width=0) (actual time=1.07..1.07 rows=1 loops=5281)
                       ->  Seq Scan on a t3  (cost=0.00..22.50 rows=5 width=0) (actual time=0.80..1.06 rows=1
loops=5281)
                             Filter: (parent_id = $0)
 Total runtime: 5693.62 msec
(15 rows)



I can't understand where comes the big difference in query plan from, and
(that's more important) - how to force postgres 7.3 to execute it more
efficient?

Notice, that both databases on both machines are identical and machine with
postgres 7.3 is even faster than the other one.

Richard.

--
"First they ignore you. Then they laugh at you. Then they
fight you. Then you win." - Mohandas Gandhi.

Re: 7.3 vs 7.2 - different query plan, bad performance

От
Bruno Wolff III
Дата:
On Fri, Jun 13, 2003 at 20:45:06 +0200,
  Ryszard Lach <siaco@autograf.pl> wrote:

> I can't understand where comes the big difference in query plan from, and
> (that's more important) - how to force postgres 7.3 to execute it more
> efficient?

I am guessing that your are really using 7.3.x and not 7.3. There was
a bug in 7.3 that was fixed in 7.3.1 or 7.3.2 with subselects. However
this fix was made with safety in mind (as it was a point release)
and resulted in some queries running slower. A complete fix was made for
7.4. To test to see if this is really the problem, you could try a 7.4
snapshot or 7.3 to see if you get improved plans.

Re: 7.3 vs 7.2 - different query plan, bad performance

От
Manfred Koizar
Дата:
On Fri, 13 Jun 2003 20:45:06 +0200, Ryszard Lach <siaco@autograf.pl>
wrote:
>I have a problem with performance after upgrading from 7.2 to 7.3.

Try
    VACUUM ANALYSE;

and then re-run your query.  If it is still slow, post the new EXPLAIN
ANALYSE output here.

Servus
 Manfred

Re: 7.3 vs 7.2 - different query plan, bad performance

От
siaco@allegro.pl
Дата:
On Sun, Jun 15, 2003 at 09:48:08PM +0200, Manfred Koizar wrote:
> On Fri, 13 Jun 2003 20:45:06 +0200, Ryszard Lach <siaco@autograf.pl>
> wrote:
> >I have a problem with performance after upgrading from 7.2 to 7.3.
>
> Try
>     VACUUM ANALYSE;
>
> and then re-run your query.  If it is still slow, post the new EXPLAIN
> ANALYSE output here.
>

Hm. I've tried it too. I don't see a big difference:

siaco=# explain analyze  select count(*) from v_c;
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=210.83..210.83 rows=1 width=8) (actual time=5418.09..5418.09 rows=1 loops=1)
   ->  Subquery Scan v_c  (cost=28.40..197.63 rows=5281 width=8) (actual time=4.59..5414.13 rows=5281 loops=1)
         ->  Hash Join  (cost=28.40..197.63 rows=5281 width=8) (actual time=4.58..5407.73 rows=5281 loops=1)
               Hash Cond: ("outer".id = "inner".id)
               ->  Seq Scan on b t1  (cost=0.00..76.81 rows=5281 width=4) (actual time=0.01..9.68 rows=5281 loops=1)
               ->  Hash  (cost=24.32..24.32 rows=1632 width=4) (actual time=3.29..3.29 rows=0 loops=1)
                     ->  Seq Scan on a t2  (cost=0.00..24.32 rows=1632 width=4) (actual time=0.01..1.88 rows=1632
loops=1)
               SubPlan
                 ->  Aggregate  (cost=28.41..28.41 rows=1 width=0) (actual time=1.02..1.02 rows=1 loops=5281)
                       ->  Seq Scan on a t3  (cost=0.00..28.40 rows=3 width=0) (actual time=0.76..1.01 rows=1
loops=5281)
                             Filter: (parent_id = $0)
 Total runtime: 5433.65 msec


--
"First they ignore you. Then they laugh at you. Then they
fight you. Then you win." - Mohandas Gandhi.

Re: 7.3 vs 7.2 - different query plan, bad performance

От
Manfred Koizar
Дата:
On Mon, 16 Jun 2003 08:38:50 +0200, siaco@allegro.pl wrote:
>[After VACUUM ANALYSE ...] I don't see a big difference:
>
>siaco=# explain analyze  select count(*) from v_c;
>                                                        QUERY PLAN

>---------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=210.83..210.83 rows=1 width=8) (actual time=5418.09..5418.09 rows=1 loops=1)
>   ->  Subquery Scan v_c  (cost=28.40..197.63 rows=5281 width=8) (actual time=4.59..5414.13 rows=5281 loops=1)
>         ->  Hash Join  (cost=28.40..197.63 rows=5281 width=8) (actual time=4.58..5407.73 rows=5281 loops=1)
>               Hash Cond: ("outer".id = "inner".id)
>               ->  Seq Scan on b t1  (cost=0.00..76.81 rows=5281 width=4) (actual time=0.01..9.68 rows=5281 loops=1)
>               ->  Hash  (cost=24.32..24.32 rows=1632 width=4) (actual time=3.29..3.29 rows=0 loops=1)
>                     ->  Seq Scan on a t2  (cost=0.00..24.32 rows=1632 width=4) (actual time=0.01..1.88 rows=1632
loops=1)
>               SubPlan
>                 ->  Aggregate  (cost=28.41..28.41 rows=1 width=0) (actual time=1.02..1.02 rows=1 loops=5281)
>                       ->  Seq Scan on a t3  (cost=0.00..28.40 rows=3 width=0) (actual time=0.76..1.01 rows=1
loops=5281)
>                             Filter: (parent_id = $0)
> Total runtime: 5433.65 msec

Ok, now we have something to work on.

.)  I guess you are not really interested in

    SELECT count(*) FROM v_c;

If you were, you would simply

    SELECT count(*) from b;

Try

    EXPLAIN ANALYSE SELECT * FROM v_c;

and you will see that 7.2 produces a plan that is almost equal to that
produced by 7.3.

.)  Without any index a seq scan is the best you can get.  A scan of a
takes only 1 ms, but doing it 5000 times gives 5 seconds.  Try

    CREATE INDEX a_parent ON a(parent_id);

.)  Wouldn't

CREATE VIEW v_c AS
SELECT t1.id, count(t3.id) AS children_count
  FROM (b t1 LEFT JOIN a t2 ON (t1.id = t2.id))
       LEFT JOIN a t3 ON (t3.parent_id = t2.id)
 GROUP BY t1.id;

give the same results as your view definition with the subselect?  And
under some assumptions about your data even

CREATE VIEW v_c AS
SELECT b.id, count(a.id) AS children_count
  FROM b
       LEFT JOIN a ON (a.parent_id = b.id)
 GROUP BY b.id;

might work.  But I think I don't understand your requirements.  Why
are you not interested in the children_count for an id that doesn't
have a parent itself?

.)  To answer your original question:  The difference seems to be that
7.2 does not evaluate the subselect in the SELECT list, when you are
only asking for count(*).

Servus
 Manfred

Re: 7.3 vs 7.2 - different query plan, bad performance

От
Ryszard Lach
Дата:
On Mon, Jun 16, 2003 at 12:31:08PM +0200, Manfred Koizar wrote:
>
> Ok, now we have something to work on.
>
> .)  I guess you are not really interested in
>
>     SELECT count(*) FROM v_c;
>
> If you were, you would simply
>
>     SELECT count(*) from b;
>

That's right.

> Try
>
>     EXPLAIN ANALYSE SELECT * FROM v_c;
>
> and you will see that 7.2 produces a plan that is almost equal to that
> produced by 7.3.

That is not.

I'm, pasting query plan from 7.2 once again (after vacuum analyze):

siaco=# explain analyze select count(*) from v_c;
NOTICE:  QUERY PLAN:
Aggregate  (cost=213.83..213.83 rows=1 width=8) (actual time=90.43..90.43 rows=1 loops=1)
  ->  Hash Join  (cost=29.40..200.63 rows=5281 width=8) (actual time=11.14..78.48 rows=5281 loops=1)
        ->  Seq Scan on b t1  (cost=0.00..78.81 rows=5281 width=4) (actual time=0.01..26.40 rows=5281 loops=1)
        ->  Hash  (cost=25.32..25.32 rows=1632 width=4) (actual time=10.99..10.99 rows=0 loops=1)
              ->  Seq Scan on a t2  (cost=0.00..25.32 rows=1632 width=4) (actual time=0.02..6.30 rows=1632 loops=1)
Total runtime: 90.74 msec
EXPLAIN

> might work.  But I think I don't understand your requirements.  Why
> are you not interested in the children_count for an id that doesn't
> have a parent itself?

The point is, that my tables (and queries) are a 'little' bit more complicated
and I wanted to give as simple example as I could. I think that problem is that
subselects are _much_slower_ executed in 7.3 than in 7.2, just as someone
already wrote here.


> .)  To answer your original question:  The difference seems to be that
> 7.2 does not evaluate the subselect in the SELECT list, when you are
> only asking for count(*).

That looks reasonably.

Thanks for all your help,

Richard.

--
"First they ignore you. Then they laugh at you. Then they
fight you. Then you win." - Mohandas Gandhi.

Re: 7.3 vs 7.2 - different query plan, bad performance

От
Manfred Koizar
Дата:
On Mon, 16 Jun 2003 13:41:47 +0200, Ryszard Lach <siaco@autograf.pl>
wrote:
>On Mon, Jun 16, 2003 at 12:31:08PM +0200, Manfred Koizar wrote:
>>     EXPLAIN ANALYSE SELECT * FROM v_c;

>siaco=# explain analyze select count(*) from v_c;
                                ^^^^^^ ^
See the difference?  I bet if you
     EXPLAIN ANALYSE SELECT * FROM v_c;
you get a much longer runtime.

BTW, did the index on a.parent_id help?  In my test it improved
runtime from 59449.71 msec to 1203.26 msec (SELECT * with Postgres
7.2).

Servus
 Manfred

Re: 7.3 vs 7.2 - different query plan, bad performance

От
Ryszard Lach
Дата:
On Mon, Jun 16, 2003 at 03:24:52PM +0200, Manfred Koizar wrote:
> On Mon, 16 Jun 2003 13:41:47 +0200, Ryszard Lach <siaco@autograf.pl>
> wrote:
> >On Mon, Jun 16, 2003 at 12:31:08PM +0200, Manfred Koizar wrote:
> >>     EXPLAIN ANALYSE SELECT * FROM v_c;
>
> >siaco=# explain analyze select count(*) from v_c;
>                                 ^^^^^^ ^
> See the difference?  I bet if you
>      EXPLAIN ANALYSE SELECT * FROM v_c;
> you get a much longer runtime.

Yes, indeed.

> BTW, did the index on a.parent_id help?  In my test it improved
> runtime from 59449.71 msec to 1203.26 msec (SELECT * with Postgres
> 7.2).

Oh yeah... Thanks a lot once more.

Richard.

--
"First they ignore you. Then they laugh at you. Then they
fight you. Then you win." - Mohandas Gandhi.

Re: 7.3 vs 7.2 - different query plan, bad performance

От
Tom Lane
Дата:
Ryszard Lach <siaco@autograf.pl> writes:
> The point is, that my tables (and queries) are a 'little' bit more
> complicated and I wanted to give as simple example as I could. I think
> that problem is that subselects are _much_slower_ executed in 7.3 than
> in 7.2, just as someone already wrote here.

No, the problem is that 7.3 fails to notice that it doesn't really need
to execute the subselect at all.  This is the price we paid for being
sure that a post-release bug fix wouldn't break anything more serious.
There is a better fix in place for 7.4.

            regards, tom lane