Обсуждение: Place of subselect

Поиск

Список

Период

Сортировка

Place of subselect

От

"Guillaume Bog"

Дата:

25 ноября 2008 г., 06:35:02

Hi dear Postgres users.

I have performance issues if I do the following pseudo-query:

SELECT a, b, (SELECT count(*) FROM t2 WHERE something) AS c
FROM t1 ORDER BY a LIMIT 10;

After some tests, it seems to me that the subquery on t2 is computed for all rows of t1. As I don't "ORDER BY c", there is no need to compute c for every row. I know I can (or should ?) work with joins or with a subquery in the from clause, but I'd like to make sure there is no other way before changing my sqls.

A subjective reason for me to prefer subqueries in fields instead of joins of sub tables is that, when it only relates to the text displayed, it is easyer to read and to change, and I mess less with agregate functions.

Thanks.

Re: Place of subselect

От

"A. Kretschmer"

Дата:

25 ноября 2008 г., 06:56:31

am  Tue, dem 25.11.2008, um 15:34:57 +0800 mailte Guillaume Bog folgendes:
> Hi dear Postgres users.
>
> I have performance issues if I do the following pseudo-query:
>
> SELECT a, b, (SELECT count(*) FROM t2 WHERE something) AS c
> FROM t1 ORDER BY a LIMIT 10;
>
> After some tests, it seems to me that the subquery on t2 is computed for all
> rows of t1. As I don't "ORDER BY c", there is no need to compute c for every
> row. I know I can (or should ?) work with joins or with a subquery in the from
> clause, but I'd like to make sure there is no other way before changing my
> sqls.

Please check your presumption with explain analyse <your query>.

For example:

test=*# explain analyse select t1.*, (select count(1) from t2) from t1 order by 1 limit 5;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Limit  (cost=186.54..186.55 rows=5 width=4) (actual time=0.087..0.104 rows=3 loops=1)
   InitPlan
     ->  Aggregate  (cost=36.75..36.76 rows=1 width=0) (actual time=0.022..0.024 rows=1 loops=1)
           ->  Seq Scan on t2  (cost=0.00..31.40 rows=2140 width=0) (actual time=0.004..0.008 rows=1 loops=1)
   ->  Sort  (cost=149.78..155.13 rows=2140 width=4) (actual time=0.082..0.088 rows=3 loops=1)
         Sort Key: i
         ->  Seq Scan on t1  (cost=0.00..31.40 rows=2140 width=4) (actual time=0.046..0.056 rows=3 loops=1)
 Total runtime: 0.197 ms
(8 rows)


Both tables executes only one scan.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Place of subselect

От

"Guillaume Bog"

Дата:

25 ноября 2008 г., 07:44:42

On Tue, Nov 25, 2008 at 15:56, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:

am Tue, dem 25.11.2008, um 15:34:57 +0800 mailte Guillaume Bog folgendes:
> Hi dear Postgres users.
>
> I have performance issues if I do the following pseudo-query:
>
> SELECT a, b, (SELECT count(*) FROM t2 WHERE something) AS c
> FROM t1 ORDER BY a LIMIT 10;
>
> After some tests, it seems to me that the subquery on t2 is computed for all
> rows of t1. As I don't "ORDER BY c", there is no need to compute c for every
> row. I know I can (or should ?) work with joins or with a subquery in the from
> clause, but I'd like to make sure there is no other way before changing my
> sqls.

Please check your presumption with explain analyse <your query>.

For example:

test=*# explain analyse select t1.*, (select count(1) from t2) from t1 order by 1 limit 5;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Limit (cost=186.54..186.55 rows=5 width=4) (actual time=0.087..0.104 rows=3 loops=1)
InitPlan
-> Aggregate (cost=36.75..36.76 rows=1 width=0) (actual time=0.022..0.024 rows=1 loops=1)
-> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=0) (actual time=0.004..0.008 rows=1 loops=1)
-> Sort (cost=149.78..155.13 rows=2140 width=4) (actual time=0.082..0.088 rows=3 loops=1)
Sort Key: i
-> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4) (actual time=0.046..0.056 rows=3 loops=1)
Total runtime: 0.197 ms
(8 rows)

Both tables executes only one scan.

It seems that you are right. By further testing I found that a WHERE condition in the subquery was making the query hundred times slower. As I'm not very familiar with explain analyze, I paste them below. Why do I have "merge join" and "merge cond" in one case and "subplan" in the other case? Note that "u_xref_ug_id" is a reference and therefore b-tree indexed.

vf_cn2fr=# EXPLAIN ANALYZE SELECT ug_id AS id, ug_en AS name, ug_type AS type,
(SELECT count(*) FROM forms_groups JOIN users ON fg_xref_u_id = u_id WHERE u_xref_ug_id = ug_id) as groupes
FROM users_groups ORDER BY "ug_type","ug_en" LIMIT 5;
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=513022.74..513022.76 rows=5 width=26) (actual time=31172.258..31172.271 rows=5 loops=1)
   -> Sort (cost=513022.74..513023.74 rows=397 width=26) (actual time=31172.255..31172.259 rows=5 loops=1)
         Sort Key: ug_type, ug_en
         -> Seq Scan on users_groups (cost=0.00..513005.61 rows=397 width=26) (actual time=83.273..31167.266 rows=397 loops=1)
               SubPlan
                 -> Aggregate (cost=1292.18..1292.19 rows=1 width=0) (actual time=78.498..78.499 rows=1 loops=397)
                       -> Hash Join (cost=146.18..1290.52 rows=663 width=0) (actual time=30.023..78.389 rows=102 loops=397)
                             Hash Cond: (forms_groups.fg_xref_u_id = users.u_id)
                             -> Seq Scan on forms_groups (cost=0.00..985.88 rows=40488 width=4) (actual time=0.005..42.046 rows=40490 loops=372)
                             -> Hash (cost=137.10..137.10 rows=726 width=4) (actual time=0.306..0.306 rows=112 loops=397)
                                   -> Index Scan using users_u_xref_ug_id_idx on users (cost=0.00..137.10 rows=726 width=4) (actual time=0.013..0.169 rows=112 loops=397)
                                         Index Cond: (u_xref_ug_id = $0)
Total runtime: 31172.363 ms
(13 rows)

vf_cn2fr=# EXPLAIN ANALYZE SELECT ug_id AS id, ug_en AS name, ug_type AS type,
(SELECT count(*) FROM forms_groups JOIN users ON fg_xref_u_id = u_id) as groupes
FROM users_groups ORDER BY "ug_type","ug_en" LIMIT 5;
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4530.86..4530.87 rows=5 width=26) (actual time=325.353..325.365 rows=5 loops=1)
   InitPlan
     -> Aggregate (cost=4505.74..4505.75 rows=1 width=0) (actual time=320.673..320.674 rows=1 loops=1)
           -> Merge Join (cost=0.00..4404.52 rows=40488 width=0) (actual time=0.176..281.602 rows=40490 loops=1)
                 Merge Cond: (forms_groups.fg_xref_u_id = users.u_id)
                 -> Index Scan using forms_groups_fg_xref_u_id_idx on forms_groups (cost=0.00..1576.38 rows=40488 width=4) (actual time=0.020..64.556 rows=40490 loops=1)
                 -> Index Scan using users_pkey on users (cost=0.00..2212.00 rows=44313 width=4) (actual time=0.015..73.373 rows=47689 loops=1)
   -> Sort (cost=25.11..26.10 rows=397 width=26) (actual time=325.350..325.355 rows=5 loops=1)
         Sort Key: ug_type, ug_en
         -> Seq Scan on users_groups (cost=0.00..7.97 rows=397 width=26) (actual time=320.693..321.192 rows=397 loops=1)
Total runtime: 325.457 ms
(11 rows)

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Place of subselect

От

"A. Kretschmer"

Дата:

25 ноября 2008 г., 08:02:43

am  Tue, dem 25.11.2008, um 16:44:34 +0800 mailte Guillaume Bog folgendes:
> It seems that you are right. By further testing I found that a WHERE condition
> in the subquery was making the query hundred times slower. As I'm not very
> familiar with explain analyze, I paste them below. Why do I have "merge join"
> and "merge cond" in one case and "subplan" in the other case? Note that
> "u_xref_ug_id" is a reference and therefore b-tree indexed.
>
>
> vf_cn2fr=# EXPLAIN ANALYZE SELECT ug_id AS id, ug_en AS name, ug_type AS type,
> (SELECT count(*) FROM forms_groups JOIN users ON fg_xref_u_id = u_id WHERE
> u_xref_ug_id = ug_id) as groupes
>  FROM users_groups ORDER BY "ug_type","ug_en" LIMIT 5;


The were-condition contains both inner and outer tables, because of that
the subselect performs for every record of the outer table.


Maybe someone else can tell you an advice how to rewrite the query for
better performance.



Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Place of subselect

От

Tom Lane

Дата:

25 ноября 2008 г., 12:19:46

"Guillaume Bog" <guibog@gmail.com> writes:
> I have performance issues if I do the following pseudo-query:

> SELECT a, b, (SELECT count(*) FROM t2 WHERE something) AS c
> FROM t1 ORDER BY a LIMIT 10;

> After some tests, it seems to me that the subquery on t2 is computed for all
> rows of t1.

Yeah. The SQL specification says that ORDER BY happens after computing
the SELECT output-list.  In some cases that'll get optimized but you
can't count on it.

You can probably improve matters by using a sub-select:

SELECT a, b, (SELECT count(*) FROM t2 WHERE something) AS c
FROM ( SELECT a, b, ... FROM t1 ORDER BY a LIMIT 10 ) ss;

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Place of subselect

Place of subselect

Re: Place of subselect

Re: Place of subselect

Re: Place of subselect

Re: Place of subselect