Обсуждение: Strange query planner behavior

Поиск

Список

Период

Сортировка

Strange query planner behavior

От

EffiSYS / Martin Querleu

Дата:

30 ноября 2019 г., 11:31:08

Hello

I have a strange problem with the query planner on Postgresql 11.5 on 
Debian stretch, the plan differs between the following 2 requests:

- SELECT * FROM LIVRAISON WHERE ID_MASTER = 10 which uses a btree index 
on ID_MASTER (the table has 1M rows). Everything is normal
- SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10) which uses a seq 
scan and is 3000 times slower

I don't understand how the planner cannot consider that a subselect with 
an = is equivalent to having = VALUE (the subselect either returning 1 
row or NULL)

I don't have the same behavior on other column with indexes of the same 
table, maybe it's because 99% or the table has ID_MASTER = 0? I can 
understand that if the value returned by the subquery is 0 the seqscan 
could be faster (in our case it is still slower than index scan but only 
by 2 times), but if the subquery does not return 0 in no case the 
seqscan could be faster. The question is why is the subquery not 
calculated before choosing wether to use the index or not since it will 
return a single value?

Thanks for your reply and sorry if the question is stupid

Best regards
Martin Querleu

Re: Strange query planner behavior

От

Pavel Stehule

Дата:

30 ноября 2019 г., 12:55:26

so 30. 11. 2019 v 10:31 odesílatel EffiSYS / Martin Querleu <martin.querleu@effisys.fr> napsal:

Hello

I have a strange problem with the query planner on Postgresql 11.5 on
Debian stretch, the plan differs between the following 2 requests:

- SELECT * FROM LIVRAISON WHERE ID_MASTER = 10 which uses a btree index
on ID_MASTER (the table has 1M rows). Everything is normal
- SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10) which uses a seq
scan and is 3000 times slower

I don't understand how the planner cannot consider that a subselect with
an = is equivalent to having = VALUE (the subselect either returning 1
row or NULL)

I don't have the same behavior on other column with indexes of the same
table, maybe it's because 99% or the table has ID_MASTER = 0? I can
understand that if the value returned by the subquery is 0 the seqscan
could be faster (in our case it is still slower than index scan but only
by 2 times), but if the subquery does not return 0 in no case the
seqscan could be faster. The question is why is the subquery not
calculated before choosing wether to use the index or not since it will
return a single value?

Thanks for your reply and sorry if the question is stupid

please try

1. run vacuum analyze on LIVRAISON

2. send result of EXPLAIN ANALYZE SELECT * FROM ... for both cases

here is a tool for sharing explains https://explain.depesz.com/

Regards

Pavel

Best regards
Martin Querleu

Re: Strange query planner behavior

От

Pavel Stehule

Дата:

30 ноября 2019 г., 12:59:05

so 30. 11. 2019 v 10:55 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

Hi

so 30. 11. 2019 v 10:31 odesílatel EffiSYS / Martin Querleu <martin.querleu@effisys.fr> napsal:
Hello

I have a strange problem with the query planner on Postgresql 11.5 on
Debian stretch, the plan differs between the following 2 requests:

- SELECT * FROM LIVRAISON WHERE ID_MASTER = 10 which uses a btree index
on ID_MASTER (the table has 1M rows). Everything is normal
- SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10) which uses a seq
scan and is 3000 times slower

I don't understand how the planner cannot consider that a subselect with
an = is equivalent to having = VALUE (the subselect either returning 1
row or NULL)

I don't have the same behavior on other column with indexes of the same
table, maybe it's because 99% or the table has ID_MASTER = 0? I can
understand that if the value returned by the subquery is 0 the seqscan
could be faster (in our case it is still slower than index scan but only
by 2 times), but if the subquery does not return 0 in no case the
seqscan could be faster. The question is why is the subquery not
calculated before choosing wether to use the index or not since it will
return a single value?

Thanks for your reply and sorry if the question is stupid

please try

1. run vacuum analyze on LIVRAISON
2. send result of EXPLAIN ANALYZE SELECT * FROM ... for both cases

here is a tool for sharing explains https://explain.depesz.com/

the reason probably will be in using sublans in second case. There should be ensured so sublan results only one row. Probably better for optimizer in this case is SELECT WHERE x IN (SELECT ..

Regards

Pavel

Best regards
Martin Querleu

Re: Strange query planner behavior

От

Pavel Stehule

Дата:

30 ноября 2019 г., 13:00:39

so 30. 11. 2019 v 10:55 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

Hi

so 30. 11. 2019 v 10:31 odesílatel EffiSYS / Martin Querleu <martin.querleu@effisys.fr> napsal:
Hello

I have a strange problem with the query planner on Postgresql 11.5 on
Debian stretch, the plan differs between the following 2 requests:

- SELECT * FROM LIVRAISON WHERE ID_MASTER = 10 which uses a btree index
on ID_MASTER (the table has 1M rows). Everything is normal
- SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10) which uses a seq
scan and is 3000 times slower

I don't understand how the planner cannot consider that a subselect with
an = is equivalent to having = VALUE (the subselect either returning 1
row or NULL)

I don't have the same behavior on other column with indexes of the same
table, maybe it's because 99% or the table has ID_MASTER = 0? I can
understand that if the value returned by the subquery is 0 the seqscan
could be faster (in our case it is still slower than index scan but only
by 2 times), but if the subquery does not return 0 in no case the
seqscan could be faster. The question is why is the subquery not
calculated before choosing wether to use the index or not since it will
return a single value?

Thanks for your reply and sorry if the question is stupid

please try

1. run vacuum analyze on LIVRAISON
2. send result of EXPLAIN ANALYZE SELECT * FROM ... for both cases

3. do you have some custom settings of planner configuration variables like random_page_cost, seq_page_cost?

here is a tool for sharing explains https://explain.depesz.com/

Regards

Pavel

Best regards
Martin Querleu

Re: Strange query planner behavior

От

EffiSYS / Martin Querleu

Дата:

30 ноября 2019 г., 13:29:53

Hi Pavel

Thanks for the fast reply
Our databases are VACUUMed everyday. I did it again but no difference

Here are the query plans:

EFT_MBON=# explain analyse select * from livraison where id_master = 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using pour_recherche_sous_livraison on livraison (cost=0.03..15.04 rows=1 width=697) (actual time=0.017..0.017 rows=0 loops=1)
Index Cond: (id_master = 10)
Planning Time: 0.124 ms
Execution Time: 0.036 ms
(4 lignes)

EFT_MBON=# explain analyse select * from livraison where id_master = (select 10);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on livraison (cost=0.01..2888156.69 rows=1917632 width=697) (actual time=1334.615..1334.615 rows=0 loops=1)
Filter: (id_master = $0)
Rows Removed by Filter: 1918196
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.001 rows=1 loops=1)
Planning Time: 0.138 ms
Execution Time: 1334.642 ms
(7 lignes)

Regarding the cost calculator the configuration is as follows:

random_page_cost and seq_page_cost are identical since the data is 100% in RAM (both at 15.0, 3 times default)
cpu_tuple_cost at 0.005 (half default)
cpu_index_tuple_cost at 0.00025 (half defaut)
cpu_operator_cost at 0.00025 (default, by the way I assume we should lower it at 0.0001)

I would expect the seq scan to be more costly than default since both page_cost are higher and cpu_index_tuple_cost lower

I think the main question is whether the query planner is able to pre calculate subqueries with = to use the value returned to get the good query plan

Best regards
Martin

On 30/11/2019 11:00, Pavel Stehule wrote:

so 30. 11. 2019 v 10:55 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

so 30. 11. 2019 v 10:31 odesílatel EffiSYS / Martin Querleu <martin.querleu@effisys.fr> napsal:
Hello

I have a strange problem with the query planner on Postgresql 11.5 on
Debian stretch, the plan differs between the following 2 requests:

- SELECT * FROM LIVRAISON WHERE ID_MASTER = 10 which uses a btree index
on ID_MASTER (the table has 1M rows). Everything is normal
- SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10) which uses a seq
scan and is 3000 times slower

I don't understand how the planner cannot consider that a subselect with
an = is equivalent to having = VALUE (the subselect either returning 1
row or NULL)

I don't have the same behavior on other column with indexes of the same
table, maybe it's because 99% or the table has ID_MASTER = 0? I can
understand that if the value returned by the subquery is 0 the seqscan
could be faster (in our case it is still slower than index scan but only
by 2 times), but if the subquery does not return 0 in no case the
seqscan could be faster. The question is why is the subquery not
calculated before choosing wether to use the index or not since it will
return a single value?

Thanks for your reply and sorry if the question is stupid

please try

1. run vacuum analyze on LIVRAISON
2. send result of EXPLAIN ANALYZE SELECT * FROM ... for both cases

3. do you have some custom settings of planner configuration variables like random_page_cost, seq_page_cost?

here is a tool for sharing explains https://explain.depesz.com/

Regards

Pavel

Best regards
Martin Querleu

-- 
Martin Querleu - Directeur Général
EffiSYS (www.effitrace.fr - www.logistique-e-commerce.fr)
martin.querleu@effisys.fr
3, rue Gustave Delory
59000 Lille
Tél: +33 9 54 28 38 76

Vous rencontrez un problème d'utilisation sur effitr@ce?
=====> écrivez à support@effisys.fr
Vous rencontrez un problème technique au niveau des échanges de données?
=====> écrivez à supervision@effisys.fr

Re: Strange query planner behavior

От

Pavel Stehule

Дата:

30 ноября 2019 г., 13:45:59

so 30. 11. 2019 v 11:29 odesílatel EffiSYS / Martin Querleu <martin.querleu@effisys.fr> napsal:

Hi Pavel

Thanks for the fast reply
Our databases are VACUUMed everyday. I did it again but no difference

Here are the query plans:

EFT_MBON=# explain analyse select * from livraison where id_master = 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using pour_recherche_sous_livraison on livraison (cost=0.03..15.04 rows=1 width=697) (actual time=0.017..0.017 rows=0 loops=1)
Index Cond: (id_master = 10)
Planning Time: 0.124 ms
Execution Time: 0.036 ms
(4 lignes)

EFT_MBON=# explain analyse select * from livraison where id_master = (select 10);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on livraison (cost=0.01..2888156.69 rows=1917632 width=697) (actual time=1334.615..1334.615 rows=0 loops=1)
Filter: (id_master = $0)
Rows Removed by Filter: 1918196
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.001 rows=1 loops=1)
Planning Time: 0.138 ms
Execution Time: 1334.642 ms
(7 lignes)

Regarding the cost calculator the configuration is as follows:

random_page_cost and seq_page_cost are identical since the data is 100% in RAM (both at 15.0, 3 times default)
cpu_tuple_cost at 0.005 (half default)
cpu_index_tuple_cost at 0.00025 (half defaut)
cpu_operator_cost at 0.00025 (default, by the way I assume we should lower it at 0.0001)

I would expect the seq scan to be more costly than default since both page_cost are higher and cpu_index_tuple_cost lower

I think the main question is whether the query planner is able to pre calculate subqueries with = to use the value returned to get the good query plan

The basic problem is in very bad estimation

Seq Scan on livraison (cost=0.01..2888156.69 rows=1917632 width=697) (actual time=1334.615..1334.615 rows=0 loops=1)

Looks like the estimation lost a const value, and try to estimate result against unknown variable. Probably the table livraison has in id_master some values that has massively higher number than other. Subplans are estimated separately.

There is not simply solution - you have to rewrite your queries - used syntax blocks flattening, and that is wrong.

SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10)

this query is optimized as two independent queries - SELECT * FROM LIVRAISON WHERE ID_MASTER and SELECT 10. Although "SELECT 10" has const result, first query desn't calculate it. Postgres planner doesn't expect so somebody will write these queries, and don't try to detect const table results. If you rewrite query to

SELECT * FROM LIVRAISON WHERE ID_MASTER IN (SELECT 10)

Then it will be optimized as one query and it should to work.

Pavel

Best regards
Martin

On 30/11/2019 11:00, Pavel Stehule wrote:

so 30. 11. 2019 v 10:55 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

so 30. 11. 2019 v 10:31 odesílatel EffiSYS / Martin Querleu <martin.querleu@effisys.fr> napsal:
Hello

I have a strange problem with the query planner on Postgresql 11.5 on
Debian stretch, the plan differs between the following 2 requests:

- SELECT * FROM LIVRAISON WHERE ID_MASTER = 10 which uses a btree index
on ID_MASTER (the table has 1M rows). Everything is normal
- SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10) which uses a seq
scan and is 3000 times slower

I don't understand how the planner cannot consider that a subselect with
an = is equivalent to having = VALUE (the subselect either returning 1
row or NULL)

I don't have the same behavior on other column with indexes of the same
table, maybe it's because 99% or the table has ID_MASTER = 0? I can
understand that if the value returned by the subquery is 0 the seqscan
could be faster (in our case it is still slower than index scan but only
by 2 times), but if the subquery does not return 0 in no case the
seqscan could be faster. The question is why is the subquery not
calculated before choosing wether to use the index or not since it will
return a single value?

Thanks for your reply and sorry if the question is stupid

please try

1. run vacuum analyze on LIVRAISON
2. send result of EXPLAIN ANALYZE SELECT * FROM ... for both cases

3. do you have some custom settings of planner configuration variables like random_page_cost, seq_page_cost?

here is a tool for sharing explains https://explain.depesz.com/

Regards

Pavel

Best regards
Martin Querleu
-- 
Martin Querleu - Directeur Général
EffiSYS (www.effitrace.fr - www.logistique-e-commerce.fr)
martin.querleu@effisys.fr
3, rue Gustave Delory
59000 Lille
Tél: +33 9 54 28 38 76

Vous rencontrez un problème d'utilisation sur effitr@ce?
=====> écrivez à support@effisys.fr
Vous rencontrez un problème technique au niveau des échanges de données?
=====> écrivez à supervision@effisys.fr

Re: Strange query planner behavior

От

Martin Querleu

Дата:

30 ноября 2019 г., 15:46:02

Indeed the distribution of data is very concentrated with value ID_MASTER = 0 for 99% of the table

I put the example of (SELECT 10) after some testing to show the strangeness of the situation but originally it was a subrequest on other tables. I understand the planner isn't conceived to expect a "select CONST" but a subrequest giving only one result (you know it will be only 1 result because there is a = operator before, if it could produce several results the request would fail) stays constant along the query so it'd be better in some cases if calculated before setting up the query plan of the rest

I rewrote my stored procedure to get the result of the subrequest then select * from livraison where id_master = the result and everything works fine using the index. Thanks

Maybe an idea for future optimisation in the query planner would be that every subrequest that has a = before or any other operator allowing only 0-1 result (<, >, IS NULL, etc.) could be calculated before setting up the query plan of the higher level (and could be in parallel if the subrequest is big work). Then the subrequest result would not be 'unknown' anymore and used for query plan setup

For example:

SELECT * FROM TABLE_A WHERE CONDITIONS AND COLUMN1 = (SELECT WHATEVER FROM TABLE_B INNER JOIN TABLE_C ON (...) WHERE ... etc) would set up a query plan like:

1) Calculate a plan and estimate the complexity of the subrequest(s). Even better: if very complex launch it(them) in another thread(s) / process(es)

2) Estimate if CONDITIONS will remove a lot of rows in TABLE_A. If yes (case A) subrequest result will be used only for recheck and we can set up and launch the main query plan at once. If not (case B) we have to wait for the subrequest to be finished

3) If case A and no rows to be rechecked (so no result) then finished. Else wait for the subrequest result (if parallel) or calculate it (if not parallel) as it will be needed in step 4

4) In case A recheck according to the subrequest result and in case B set up a query plan for the main request that takes into accounts the result of the subrequest and pray that it gives far better performance. Or maybe limit that to simple cases where the column compared to the subrequest is indexed and it will be undoubtfully be faster

I'm not expert enough to imagine all cases especially if stored procedures that modify data are implied, maybe it's not feasible. But it's an idea

Have a nice day and thanks

Martin

Le sam. 30 nov. 2019 à 11:46, Pavel Stehule <pavel.stehule@gmail.com> a écrit :

so 30. 11. 2019 v 11:29 odesílatel EffiSYS / Martin Querleu <martin.querleu@effisys.fr> napsal:
Hi Pavel

Thanks for the fast reply
Our databases are VACUUMed everyday. I did it again but no difference

Here are the query plans:

EFT_MBON=# explain analyse select * from livraison where id_master = 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using pour_recherche_sous_livraison on livraison (cost=0.03..15.04 rows=1 width=697) (actual time=0.017..0.017 rows=0 loops=1)
Index Cond: (id_master = 10)
Planning Time: 0.124 ms
Execution Time: 0.036 ms
(4 lignes)

EFT_MBON=# explain analyse select * from livraison where id_master = (select 10);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on livraison (cost=0.01..2888156.69 rows=1917632 width=697) (actual time=1334.615..1334.615 rows=0 loops=1)
Filter: (id_master = $0)
Rows Removed by Filter: 1918196
InitPlan 1 (returns $0)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.000..0.001 rows=1 loops=1)
Planning Time: 0.138 ms
Execution Time: 1334.642 ms
(7 lignes)

Regarding the cost calculator the configuration is as follows:

random_page_cost and seq_page_cost are identical since the data is 100% in RAM (both at 15.0, 3 times default)
cpu_tuple_cost at 0.005 (half default)
cpu_index_tuple_cost at 0.00025 (half defaut)
cpu_operator_cost at 0.00025 (default, by the way I assume we should lower it at 0.0001)

I would expect the seq scan to be more costly than default since both page_cost are higher and cpu_index_tuple_cost lower

I think the main question is whether the query planner is able to pre calculate subqueries with = to use the value returned to get the good query plan

The basic problem is in very bad estimation

Seq Scan on livraison (cost=0.01..2888156.69 rows=1917632 width=697) (actual time=1334.615..1334.615 rows=0 loops=1)

Looks like the estimation lost a const value, and try to estimate result against unknown variable. Probably the table livraison has in id_master some values that has massively higher number than other. Subplans are estimated separately.

There is not simply solution - you have to rewrite your queries - used syntax blocks flattening, and that is wrong.

SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10)

this query is optimized as two independent queries - SELECT * FROM LIVRAISON WHERE ID_MASTER and SELECT 10. Although "SELECT 10" has const result, first query desn't calculate it. Postgres planner doesn't expect so somebody will write these queries, and don't try to detect const table results. If you rewrite query to

SELECT * FROM LIVRAISON WHERE ID_MASTER IN (SELECT 10)

Then it will be optimized as one query and it should to work.

Pavel
Best regards
Martin

On 30/11/2019 11:00, Pavel Stehule wrote:

so 30. 11. 2019 v 10:55 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

so 30. 11. 2019 v 10:31 odesílatel EffiSYS / Martin Querleu <martin.querleu@effisys.fr> napsal:
Hello

I have a strange problem with the query planner on Postgresql 11.5 on
Debian stretch, the plan differs between the following 2 requests:

- SELECT * FROM LIVRAISON WHERE ID_MASTER = 10 which uses a btree index
on ID_MASTER (the table has 1M rows). Everything is normal
- SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10) which uses a seq
scan and is 3000 times slower

I don't understand how the planner cannot consider that a subselect with
an = is equivalent to having = VALUE (the subselect either returning 1
row or NULL)

I don't have the same behavior on other column with indexes of the same
table, maybe it's because 99% or the table has ID_MASTER = 0? I can
understand that if the value returned by the subquery is 0 the seqscan
could be faster (in our case it is still slower than index scan but only
by 2 times), but if the subquery does not return 0 in no case the
seqscan could be faster. The question is why is the subquery not
calculated before choosing wether to use the index or not since it will
return a single value?

Thanks for your reply and sorry if the question is stupid

please try

1. run vacuum analyze on LIVRAISON
2. send result of EXPLAIN ANALYZE SELECT * FROM ... for both cases

3. do you have some custom settings of planner configuration variables like random_page_cost, seq_page_cost?

here is a tool for sharing explains https://explain.depesz.com/

Regards

Pavel

Best regards
Martin Querleu
-- 
Martin Querleu - Directeur Général
EffiSYS (www.effitrace.fr - www.logistique-e-commerce.fr)
martin.querleu@effisys.fr
3, rue Gustave Delory
59000 Lille
Tél: +33 9 54 28 38 76

Vous rencontrez un problème d'utilisation sur effitr@ce?
=====> écrivez à support@effisys.fr
Vous rencontrez un problème technique au niveau des échanges de données?
=====> écrivez à supervision@effisys.fr

Re: Strange query planner behavior

От

Tomas Vondra

Дата:

30 ноября 2019 г., 17:53:52

On Sat, Nov 30, 2019 at 11:29:53AM +0100, EffiSYS / Martin Querleu wrote:
>Hi Pavel
>
>Thanks for the fast reply
>Our databases are VACUUMed everyday. I did it again but no difference
>
>Here are the query plans:
>
>EFT_MBON=# explain analyse select * from livraison where id_master = 10;
> QUERY PLAN

>--------------------------------------------------------------------------------------------------------------------------------------------
> Index Scan using pour_recherche_sous_livraison on livraison 
> (cost=0.03..15.04 rows=1 width=697) (actual time=0.017..0.017 rows=0 
>loops=1)
>   Index Cond: (id_master = 10)
> Planning Time: 0.124 ms
> Execution Time: 0.036 ms
>(4 lignes)
>
>EFT_MBON=# explain analyse select * from livraison where id_master = 
>(select 10);
>                                                      QUERY PLAN

>-----------------------------------------------------------------------------------------------------------------------
> Seq Scan on livraison  (cost=0.01..2888156.69 rows=1917632 width=697) 
>(actual time=1334.615..1334.615 rows=0 loops=1)
>   Filter: (id_master = $0)
>   Rows Removed by Filter: 1918196
>   InitPlan 1 (returns $0)
>     ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual 
>time=0.000..0.001 rows=1 loops=1)
> Planning Time: 0.138 ms
> Execution Time: 1334.642 ms
>(7 lignes)
>
>Regarding the cost calculator the configuration is as follows:
>
>random_page_cost and seq_page_cost are identical since the data is 
>100% in RAM (both at 15.0, 3 times default)

Ummm, what? Does this mean you have 

random_page_cost = 15
seq_page_cost = 15

Neither of that is 3 times the default value, though, so maybe I just
don't understand correctly.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Strange query planner behavior

От

Tom Lane

Дата:

30 ноября 2019 г., 19:00:32

Pavel Stehule <pavel.stehule@gmail.com> writes:
> so 30. 11. 2019 v 11:29 odesílatel EffiSYS / Martin Querleu <
> martin.querleu@effisys.fr> napsal:
>> I think the main question is whether the query planner is able to pre
>> calculate subqueries with = to use the value returned to get the good query
>> plan

> SELECT * FROM LIVRAISON WHERE ID_MASTER = (SELECT 10)
> this query is optimized as two independent queries - SELECT * FROM
> LIVRAISON WHERE ID_MASTER and SELECT 10. Although "SELECT 10" has const
> result, first query desn't calculate it. Postgres planner doesn't expect so
> somebody will write these queries, and don't try to detect const table
> results.

More to the point: usually, when somebody writes something that way,
it's because they *want* to hide the sub-select expression from the
upper-level query.  It's pretty common to use this syntax to prevent an
expensive or volatile function from being recalculated multiple times,
for instance.  It would certainly not be that hard to pull up the
expression out of a trivial scalar sub-select, but we'll reject any
patch to do that, because it would make many more users unhappy than
happy.  If you don't want this behavior, don't write it that way.

As Pavel suggests, "IN" is a reasonable alternative if you don't
want to skip the "(SELECT ...)" wrapper for some reason.

            regards, tom lane

Re: Strange query planner behavior

От

EffiSYS / Martin Querleu

Дата:

02 декабря 2019 г., 12:20:30

Hi Tomas

I tried the method of using IN instead of =, it's not really better in 
the present case:

EFT_MBON=# explain analyse select * from livraison where id_master in 
(select 10);
                                                           QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
  Nested Loop Semi Join  (cost=0.00..211841.54 rows=1919474 width=697) 
(actual time=1245.271..1245.271 rows=0 loops=1)
    Join Filter: (livraison.id_master = (10))
    Rows Removed by Join Filter: 1921796
    ->  Seq Scan on livraison  (cost=0.00..201476.37 rows=1919474 
width=697) (actual time=0.006..211.196 rows=1921796 loops=1)
    ->  Materialize  (cost=0.00..0.01 rows=1 width=4) (actual 
time=0.000..0.000 rows=1 loops=1921796)
          ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual 
time=0.001..0.001 rows=1 loops=1)
  Planning Time: 0.200 ms
  Execution Time: 1245.309 ms
(8 lignes)

I tried many different query planner cost variables (including defaults) 
but no impact, as expected (default for random_page_cost is 4 so 15 is 
about 3-4 times the default)
This was a very precise case of a special data distribution, so we'll go 
for a workaround on this one

Best regards and thanks
Martin

On 30/11/2019 15:53, Tomas Vondra wrote:
> On Sat, Nov 30, 2019 at 11:29:53AM +0100, EffiSYS / Martin Querleu wrote:
>> Hi Pavel
>>
>> Thanks for the fast reply
>> Our databases are VACUUMed everyday. I did it again but no difference
>>
>> Here are the query plans:
>>
>> EFT_MBON=# explain analyse select * from livraison where id_master = 10;
>>  QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------------

>>
>>  Index Scan using pour_recherche_sous_livraison on livraison 
>>  (cost=0.03..15.04 rows=1 width=697) (actual time=0.017..0.017 rows=0 
>> loops=1)
>>    Index Cond: (id_master = 10)
>>  Planning Time: 0.124 ms
>>  Execution Time: 0.036 ms
>> (4 lignes)
>>
>> EFT_MBON=# explain analyse select * from livraison where id_master = 
>> (select 10);
>>                                                       QUERY PLAN
>>
-----------------------------------------------------------------------------------------------------------------------

>>
>>  Seq Scan on livraison  (cost=0.01..2888156.69 rows=1917632 
>> width=697) (actual time=1334.615..1334.615 rows=0 loops=1)
>>    Filter: (id_master = $0)
>>    Rows Removed by Filter: 1918196
>>    InitPlan 1 (returns $0)
>>      ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual 
>> time=0.000..0.001 rows=1 loops=1)
>>  Planning Time: 0.138 ms
>>  Execution Time: 1334.642 ms
>> (7 lignes)
>>
>> Regarding the cost calculator the configuration is as follows:
>>
>> random_page_cost and seq_page_cost are identical since the data is 
>> 100% in RAM (both at 15.0, 3 times default)
>
> Ummm, what? Does this mean you have
> random_page_cost = 15
> seq_page_cost = 15
>
> Neither of that is 3 times the default value, though, so maybe I just
> don't understand correctly.
>
> regards
>


-- 
Martin Querleu - Directeur Général
EffiSYS (www.effitrace.fr - www.logistique-e-commerce.fr)
martin.querleu@effisys.fr
3, rue Gustave Delory
59000 Lille
Tél: +33 9 54 28 38 76

Vous rencontrez un problème d'utilisation sur effitr@ce?
=====> écrivez à support@effisys.fr
Vous rencontrez un problème technique au niveau des échanges de données?
=====> écrivez à supervision@effisys.fr

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Strange query planner behavior