Обсуждение: Query plan: SELECT vs INSERT from same select

Поиск
Список
Период
Сортировка

Query plan: SELECT vs INSERT from same select

От
Alexander Voytsekhovskyy
Дата:
I have quite complicated query:

SELECT axis_x1, axis_y1, SUM(delivery_price)  as v_1 FROM (
SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, clients.id_client as axis_y1, delivery_data.amount * production_price.price * groups.discount as delivery_price

FROM delivery_data
JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt)
JOIN clients ON (client_tt.id_client = clients.id_client)
JOIN production ON (production.id = delivery_data.id_product)
JOIN groups ON (groups.id = delivery_data.delivery_group_id AND client_tt.id_group = groups.id AND groups.id = clients.id_group)
LEFT JOIN production_price on (delivery_data.id_product = production_price.id_production AND groups.price_list_id = production_price.price_list_id AND delivery_data.delivery_date BETWEEN production_price.date_from AND production_price.date_to)

WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30'
AND delivery_data.delivery_group_id IN (...short list of values...)
AND delivery_data.id_product IN ()) AS tmpsource

WHERE TRUE
GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

It runs well, took 1s and returns 4000 rows.

You can see explain analyze verbose here:
https://explain.depesz.com/s/AEWj

The problem is, when i wrap it to

A)
INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
SELECT .... SAME QUERY

OR even

B)
WITH rows AS (
... SAME SELECT QUERY ...
)
INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
SELECT * FROM rows

The query time dramatically drops to 500+ seconds.

You can see explain analyze verbose here
https://explain.depesz.com/s/AEWj

As you can see, 100% of time goes to same SELECT query, there is no issues with INSERT-part

I have played a lot and it's reproducing all time.

So my question is, why wrapping SELECT query with INSERT FROM SELECT dramatically change query plan and make it 500x slower?

Re: Query plan: SELECT vs INSERT from same select

От
Igor Korot
Дата:
Hi,

On Tue, Jul 23, 2019 at 3:29 PM Alexander Voytsekhovskyy
<young.inbox@gmail.com> wrote:
>
> I have quite complicated query:
>
> SELECT axis_x1, axis_y1, SUM(delivery_price)  as v_1 FROM (
> SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, clients.id_client as axis_y1, delivery_data.amount
*production_price.price * groups.discount as delivery_price
 
>
> FROM delivery_data
> JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt)
> JOIN clients ON (client_tt.id_client = clients.id_client)
> JOIN production ON (production.id = delivery_data.id_product)
> JOIN groups ON (groups.id = delivery_data.delivery_group_id AND client_tt.id_group = groups.id AND groups.id =
clients.id_group)
> LEFT JOIN production_price on (delivery_data.id_product = production_price.id_production AND groups.price_list_id =
production_price.price_list_idAND delivery_data.delivery_date BETWEEN production_price.date_from AND
production_price.date_to)
>
> WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30'
> AND delivery_data.delivery_group_id IN (...short list of values...)
> AND delivery_data.id_product IN ()) AS tmpsource
>
> WHERE TRUE
> GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())
>
> It runs well, took 1s and returns 4000 rows.
>
> You can see explain analyze verbose here:
> https://explain.depesz.com/s/AEWj
>
> The problem is, when i wrap it to
>
> A)
> INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
> SELECT .... SAME QUERY
>
> OR even
>
> B)
> WITH rows AS (
> ... SAME SELECT QUERY ...
> )
> INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
> SELECT * FROM rows
>
> The query time dramatically drops to 500+ seconds.
>
> You can see explain analyze verbose here
> https://explain.depesz.com/s/AEWj
>
> As you can see, 100% of time goes to same SELECT query, there is no issues with INSERT-part
>
> I have played a lot and it's reproducing all time.
>
> So my question is, why wrapping SELECT query with INSERT FROM SELECT dramatically change query plan and make it 500x
slower?

Which version of PostgreSQL do you have?
Which OS does it running on?

Thank you.



Re: Query plan: SELECT vs INSERT from same select

От
Tom Lane
Дата:
Alexander Voytsekhovskyy <young.inbox@gmail.com> writes:
> You can see explain analyze verbose here:
> https://explain.depesz.com/s/AEWj

> The problem is, when i wrap it to

> A)
> INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
> SELECT .... SAME QUERY

> OR even

> B)
> WITH rows AS (
> ... SAME SELECT QUERY ...
> )
> INSERT INTO norepl_1542_result (axis_x1, axis_y1, v_1)
> SELECT * FROM rows

> The query time dramatically drops to 500+ seconds.

> You can see explain analyze verbose here
> https://explain.depesz.com/s/AEWj

That's the same link.

> As you can see, 100% of time goes to same SELECT query, there is no issues
> with INSERT-part

We can't see any such thing from what you posted.

            regards, tom lane



Re: Query plan: SELECT vs INSERT from same select

От
Tom Lane
Дата:
[ please keep the list cc'd ]

Alexander Voytsekhovskyy <young.inbox@gmail.com> writes:
> Sorry again
> here is both links:
> https://explain.depesz.com/s/AEWj
> https://explain.depesz.com/s/CHwF

Don't think I believe that those are the same query --- there's a
CTE in the second one that doesn't appear in the first, and it is
eating a lot of time too.

I have a vague recollection that there are cases where optimizations
are possible in plain SELECT but not in data-modifying queries.
So maybe this isn't pilot error but something triggered by that.
We'd need to see a self-contained test case to verify that though.

            regards, tom lane



Re: Query plan: SELECT vs INSERT from same select

От
Alban Hertroys
Дата:
> On 23 Jul 2019, at 22:29, Alexander Voytsekhovskyy <young.inbox@gmail.com> wrote:
>
> I have quite complicated query:
>
> SELECT axis_x1, axis_y1, SUM(delivery_price)  as v_1 FROM (
> SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, clients.id_client as axis_y1, delivery_data.amount
*production_price.price * groups.discount as delivery_price 
>
> FROM delivery_data
> JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt)
> JOIN clients ON (client_tt.id_client = clients.id_client)
> JOIN production ON (production.id = delivery_data.id_product)
> JOIN groups ON (groups.id = delivery_data.delivery_group_id AND client_tt.id_group = groups.id AND groups.id =
clients.id_group) 

Are client_tt.id_group and clients.id_group ever different from each other? It looks like you might have redundant
informationthere, but... If they are guaranteed to be the same then you don’t need the JOIN to clients, which would
bothremove a JOIN and reduce the complexity of the JOIN condition on groups. 

Or (assuming the group id’s are indeed supposed to be equal), you could
 JOIN clients ON (client_tt.id_client = clients.id_client AND client_tt.id_group = clients.id_group)
instead of putting that condition within the JOIN condition on groups.

I don’t think either option will make a huge difference (the first probably more than the second, as it reduces an
entirejoin), but it could be enough to help the database figure out a better plan. 

> LEFT JOIN production_price on (delivery_data.id_product = production_price.id_production AND groups.price_list_id =
production_price.price_list_idAND delivery_data.delivery_date BETWEEN production_price.date_from AND
production_price.date_to) 
>
> WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30'
> AND delivery_data.delivery_group_id IN (...short list of values...)
> AND delivery_data.id_product IN ()) AS tmpsource

You don’t have a price if your goods weren’t produced in the delivery window you set? Or do you have goods that get
deliveredwithout having a price? 

You seem to be using this query for a report on nett sales by month, but I have my doubts whether that LEFT JOIN, and
especiallythe condition on the production date window, is really what you want: Your formula for delivery_price
includesthe price column from that LEFT JOIN, so you’re going to get 0 values when there is no production_price record
inyour delivery-window, resulting in a SUM that’s too low if the product was produced before (or after, but that seems
unlikely)the delivery window. 

> WHERE TRUE

This line is unnecessary.

> GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

Apparently (I’m new to these statements), CUBE (axis_x1, axis_y1) is a shorthand for the above. They seem to have been
introducedat the same time (in 9.6?). See:
https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-GROUPING-SETS

> It runs well, took 1s and returns 4000 rows.

I won’t go into the performance issue ash this point, other more knowledgeable people already did.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.