Discussion: Optimizing execution of expensive subqueries

Optimizing execution of expensive subqueries

From:
Mathieu Fenniak
Date:
Hi pgsql-general!

I'm currently looking at a query that is generally selecting a bunch of simple columns from a table, and also performing some subqueries to aggregate related data, and then sorting by one of the simple columns and paginating the result.

e.g.:

SELECT
  tbl.field1, tbl.field2, tbl.field3, ...,
  (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId = tbl.UserId AND anothertbl.ThingyId = 1) as Thingy1Sum,
  ... repeat for multiple thingies ...
FROM
  tbl
ORDER BY tbl.field1 LIMIT 20

I'm finding that if "tbl" contains hundreds of thousands of rows, the subqueries are being executed hundreds of thousands of times.  Because of the sorting and pagination, this appears to be unnecessary, and the result is slow performance.  (PostgreSQL 9.5.9 server)

I've only found one solution so far, which is to perform the sort & pagination in a CTE, and the subqueries externally.  Are there any other approaches that can be taken to optimize this and prevent the unnecessary computation?

CTE rewrite:

WITH cte AS (
  SELECT
    tbl.UserId, tbl.field1, tbl.field2, tbl.field3
  FROM
    tbl
  ORDER BY tbl.field1 LIMIT 20
)
SELECT cte.*,
  (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId = cte.UserId AND anothertbl.ThingyId = 1) as Thingy1Sum,
  ... repeat for multiple thingies ...
FROM cte;
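
For comparison, the same shape without a CTE ought to behave the same way here, since a subquery with a LIMIT isn't flattened into the outer query, so the correlated subqueries only run for the 20 rows it emits.  A sketch, assuming tbl has the UserId column used above:

SELECT t.*,
  (SELECT SUM(Duration)
     FROM anothertbl
    WHERE anothertbl.UserId = t.UserId AND anothertbl.ThingyId = 1) AS Thingy1Sum
  -- ... repeat for multiple thingies ...
FROM (
  SELECT tbl.UserId, tbl.field1, tbl.field2, tbl.field3
  FROM tbl
  ORDER BY tbl.field1 LIMIT 20
) AS t
ORDER BY t.field1;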

Thanks for any thoughts you have,

Mathieu Fenniak

Re: Optimizing execution of expensive subqueries

From:
Hellmuth Vargas
Date:
Hi

Try it this way:

SELECT
  tbl.field1, tbl.field2, tbl.field3, ...,
  b.Thingy1Sum,
  ... repeat for multiple thingies ...
FROM
  tbl
  LEFT JOIN LATERAL (
    SELECT anothertbl.UserId, SUM(Duration) AS Thingy1Sum
    FROM anothertbl
    WHERE anothertbl.UserId = tbl.UserId AND anothertbl.ThingyId = 1
    GROUP BY 1
  ) AS b ON tbl.UserId = b.UserId
ORDER BY tbl.field1 LIMIT 20


--
Cordially,

Ing. Hellmuth I. Vargas S.


Re: Optimizing execution of expensive subqueries

From:
Mathieu Fenniak
Date:
Hi Hellmuth,

Thanks for the response and the new approach; a LATERAL JOIN is new to me.  Unfortunately, it seems to have the same performance characteristics and the same query plan.  The aggregation in the lateral join still executes for every row (e.g. if my base query has 500000 rows, I get "Aggregate (... loops=500000)" in the query plan), unaffected by the LIMIT node higher up in the plan.
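
(A trimmed-down way to reproduce that observation, using the same hypothetical schema as the original query; the actual plan output will of course vary:)

EXPLAIN (ANALYZE, BUFFERS)
SELECT
  tbl.field1,
  (SELECT SUM(Duration)
     FROM anothertbl
    WHERE anothertbl.UserId = tbl.UserId AND anothertbl.ThingyId = 1) AS Thingy1Sum
FROM tbl
ORDER BY tbl.field1 LIMIT 20;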

The CTE approach seems to be the only one I can use to improve performance right now, but requires significant application code changes.

Mathieu



Re: Optimizing execution of expensive subqueries

From:
David Rowley
Date:
On 12 July 2018 at 02:24, Mathieu Fenniak <mathieu.fenniak@replicon.com> wrote:
> I'm currently looking at a query that is generally selecting a bunch of
> simple columns from a table, and also performing some subqueries to
> aggregate related data, and then sorting by one of the simple columns and
> paginating the result.
>
> e.g.:
>
> SELECT
>   tbl.field1, tbl.field2, tbl.field3, ...,
>   (SELECT SUM(Duration) FROM anothertbl WHERE anothertbl.UserId = tbl.UserId
> AND anothertbl.ThingyId = 1) as Thingy1Sum,
>   ... repeat for multiple thingies ...
> FROM
>   tbl
> ORDER BY tbl.field1 LIMIT 20
>
> I'm finding that if "tbl" contains hundreds of thousands of rows, the
> subqueries are being executed hundreds of thousands of times.  Because of
> the sorting and pagination, this appears to be unnecessary, and the
> result is slow performance.  (PostgreSQL 9.5.9 server)

You've got two choices.

1) You can add a btree index on field1 so that the executor does not
need to examine all records before taking the top-20, or
2) move the subquery out of the target list and instead make it a LEFT
JOIN, adding an appropriate GROUP BY clause (both are sketched below).

#2 might not be a great option since it may require building groups
that don't get used, but it would likely be the best option if you
didn't have a LIMIT clause, or if the LIMIT was a larger percentage of
the total records.
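
A rough sketch of each, assuming tbl.UserId exists and (for the GROUP BY) identifies a row of tbl; the FILTER clause is just one way to fold several "thingy" sums into a single join:

-- 1) Let the top-20 come straight off an index, so the target-list
--    subqueries only run for the 20 rows actually returned:
CREATE INDEX ON tbl (field1);

-- 2) Move the aggregation into a LEFT JOIN with GROUP BY:
SELECT tbl.field1, tbl.field2, tbl.field3,
       SUM(a.Duration) FILTER (WHERE a.ThingyId = 1) AS Thingy1Sum
FROM tbl
LEFT JOIN anothertbl AS a ON a.UserId = tbl.UserId
GROUP BY tbl.UserId, tbl.field1, tbl.field2, tbl.field3
ORDER BY tbl.field1
LIMIT 20;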

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services