Discussion: Slow queries consisting inner selects and order bys & hack to speed up


Slow queries consisting inner selects and order bys & hack to speed up

From:
Ümit Öztosun
Date:
Hello,

Our application uses typical queries similar to the following (very simplified):

SELECT
 part_id,
 part_name,
 (SELECT
  SUM(amount) FROM part_movements M
  WHERE P.part_id = M.part_id
 ) as part_amount
FROM parts P
LIMIT 50

The parts table holds thousands of items. The movements table stores yearly
movement information for those items. We present results to users page by
page, hence the LIMIT clause.

Users can sort and filter the results. When sorting is introduced, query
performance drops significantly:

SELECT
 part_id,
 part_name,
 (SELECT
  SUM(amount) FROM part_movements M
  WHERE P.part_id = M.part_id
 ) as part_amount
FROM parts P
ORDER BY part_name
LIMIT 50

Postgres seems to compute all possible rows and then sort the
results, which nearly renders the paging meaningless. A dummy WHERE
clause dramatically improves performance:

SELECT
 part_id,
 part_name,
 (SELECT
  SUM(amount) FROM part_movements M
  WHERE P.part_id = M.part_id
 ) as part_amount
FROM parts P
WHERE part_amount > -10000000
ORDER BY part_name
LIMIT 50

Is there a way to improve the performance of these queries? Is it
possible to instruct Postgres to sort the rows first and then compute
the inner queries? (We have simulated this using temporary tables and
two-stage queries, but that is not practical because most of our
queries are generated automatically.)
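For reference, the two-stage simulation mentioned above looks roughly like
this sketch (using the table and column names from the simplified query; the
exact form we use differs):

```sql
-- Stage 1: sort and page first, materializing only the visible rows.
CREATE TEMP TABLE page_rows AS
  SELECT part_id, part_name
  FROM parts
  ORDER BY part_name
  LIMIT 50;

-- Stage 2: compute the expensive aggregate only for those 50 rows.
SELECT
  P.part_id,
  P.part_name,
  (SELECT SUM(amount)
   FROM part_movements M
   WHERE M.part_id = P.part_id) AS part_amount
FROM page_rows P
ORDER BY P.part_name;
```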

Attached is the output of real queries and their corresponding EXPLAIN
ANALYZE outputs.

Regards,
Umit Oztosun

Attachments

Re: Slow queries consisting inner selects and order bys & hack to speed up

From:
Tom Lane
Date:
Ümit Öztosun <umit@likyabilisim.com> writes:
> Our application uses typical queries similar to following (very simplified):

> SELECT
>  part_id,
>  part_name,
>  (SELECT
>   SUM(amount) FROM part_movements M
>   WHERE P.part_id = M.part_id
>  ) as part_amount
> FROM parts P
> ORDER BY part_name
> LIMIT 50

> Postgres seems to compute all possible rows and then sorts the
> results, which nearly renders the paging meaningless.

Yeah.  The general rule is that sorting happens after computing the
SELECT values --- this is more or less required for cases where the
ORDER BY refers to a SELECT-list item.  You'd probably have better
results by writing a sub-select:

SELECT
 part_id,
 part_name,
 (SELECT
  SUM(amount) FROM part_movements M
  WHERE P.part_id = M.part_id
 ) as part_amount
FROM
 (SELECT part_id, part_name FROM parts P
  WHERE whatever ORDER BY whatever LIMIT n) as P;

This will do the part_movements stuff only for rows that make it out of
the sub-select.
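Filled in with the columns from the original query (assuming the page is
sorted by part_name, 50 rows at a time), the skeleton above becomes:

```sql
SELECT
  P.part_id,
  P.part_name,
  (SELECT SUM(amount)
   FROM part_movements M
   WHERE M.part_id = P.part_id) AS part_amount
FROM
  (SELECT part_id, part_name
   FROM parts
   ORDER BY part_name
   LIMIT 50) AS P;
```

The correlated SUM is now evaluated 50 times rather than once per row of
parts. Strictly speaking, SQL does not guarantee that the outer query
preserves the sub-select's ordering, so a cautious version repeats
ORDER BY part_name at the outer level as well.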

Another approach is to make sure the ORDER BY is always on an indexed
column; in cases where the ORDER BY is done by an indexscan instead
of a sort, calculation of the unwanted SELECT-list items does not
happen.  However, this only works as long as LIMIT+OFFSET is fairly
small.
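A sketch of that approach for the simplified query (the index name is
illustrative):

```sql
-- With this index available, "ORDER BY part_name LIMIT 50" can be
-- satisfied by an index scan that stops after 50 rows, so the
-- SELECT-list subquery is evaluated only for those rows.
CREATE INDEX parts_part_name_idx ON parts (part_name);
```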

Lastly, are you on a reasonably current Postgres version (performance
complaints about anything older than 8.0 will no longer be accepted
with much grace), and are your statistics up to date?  The EXPLAIN
ANALYZE output shows rowcount estimates that seem suspiciously far off:

        ->  Seq Scan on scf_stokkart stok  (cost=0.00..142622.54 rows=25 width=503) (actual time=4.726..19324.633 rows=4947 loops=1)
              Filter: (upper((durum)::text) = 'A'::text)

This is important because, if the planner realized that the SELECT-list
items were going to be evaluated 5000 times not 25, it might well choose
a different plan.
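If the statistics are stale, a minimal first step might be the following
sketch (scf_stokkart is the table from the plan above; whether the
expression index helps depends on the column's type and the Postgres
version in use):

```sql
-- Refresh planner statistics for the misestimated table.
ANALYZE scf_stokkart;

-- An index on the filter expression also causes ANALYZE to gather
-- statistics on upper(durum), which should tighten the rows=25
-- estimate for the upper(durum) = 'A' filter.
CREATE INDEX scf_stokkart_durum_upper_idx
    ON scf_stokkart (upper((durum)::text));
```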

            regards, tom lane