Обсуждение: Strange (and good) side effect of partitioning ?

Поиск

Список

Период

Сортировка

Strange (and good) side effect of partitioning ?

От

Phil Florent

Дата:

14 января 2021 г., 23:48:34

Hi,

I read that on Jonathan Lewis' blog :

(I believe that there may be some RDBMS which will treat (e.g.) “X between 20 and 10” as being identical to “X between 10 and 20” )

I am puzzled. PostgreSQL seems NOT to treat X between 20 and 10” as being identical to “X between 10 and 20" but it's complicated.

Here is my test case:

select version();

version

----------------------------------------------------------------------------------------------------------------------------------

PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit

create table t1 (rn integer , object_name text) partition by range(rn);

create table t1a partition of t1 for values from (1) to (50001);

\d+ t1

Table partitionnée « public.t1 »

-------------+---------+-----------------+-----------+------------+----------+-----------------------+-------------

Clé de partition : RANGE (rn)

Partitions: t1a FOR VALUES FROM (1) TO (50001)

insert into t1 select

rownum rn,

upper(md5(random()::text)) object_name

from

(select generate_series(1,50000) rownum) serie

;

explain analyze select object_name

from t1

where

rn between 20 and 10

;

QUERY PLAN

------------------------------------------------------------------------------------

Result (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.002 rows=0 loops=1)

One-Time Filter: false

Planning Time: 0.116 ms

Execution Time: 0.020 ms

It's OK but:

explain analyze select object_name

from t1a

where

rn between 20 and 10

;

QUERY PLAN

-------------------------------------------------------------------------------------------------

Seq Scan on t1a (cost=0.00..1167.00 rows=1 width=33) (actual time=6.553..6.553 rows=0 loops=1)

Filter: ((rn >= 20) AND (rn <= 10))

Rows Removed by Filter: 50000

Planning Time: 0.092 ms

Execution Time: 6.573 ms

At first I thought it was related to partition pruning but:

set enable_partition_pruning = false;

explain analyze select object_name

from t1

where

rn between 20 and 10

;

QUERY PLAN

------------------------------------------------------------------------------------

Result (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.003 rows=0 loops=1)

One-Time Filter: false

Planning Time: 0.104 ms

Execution Time: 0.021 ms

Confirmation since I still obtain "One-Time Filter: false" if I don't filter on the partition key:

create table t2 (rn integer , rn2 integer, object_name text) partition by range(rn);

create table t2a partition of t2 for values from (1) to (50001);

d+ t2

Table partitionnée « public.t2 »

-------------+---------+-----------------+-----------+------------+----------+-----------------------+-------------

Clé de partition : RANGE (rn)

Partitions: t2a FOR VALUES FROM (1) TO (50001)

insert into t2 select

rownum rn, rownum rn2,

upper(md5(random()::text)) object_name

from

(select generate_series(1,50000) rownum) serie

;

explain analyze select object_name

from t2

where

rn2 between 20 and 10

;

QUERY PLAN

------------------------------------------------------------------------------------

Result (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.003 rows=0 loops=1)

One-Time Filter: false

Planning Time: 0.185 ms

Execution Time: 0.019 ms

I don't understand why I don't obtain " One-Time Filter: false" with a classic table or a partition ?

Best regards,

Phil

Re: Strange (and good) side effect of partitioning ?

От

rob stone

Дата:

15 января 2021 г., 03:51:41

Hi,

On Thu, 2021-01-14 at 20:48 +0000, Phil Florent wrote:
> Hi,
> 
> I read that on Jonathan Lewis' blog :
> 
> (I believe that there may be some RDBMS which will treat (e.g.) “X
> between 20 and 10” as being identical to“X between 10 and 20” )
> 
> I am puzzled. PostgreSQL seems NOT to treat X between 20 and 10” as
> being identical to“X between 10 and 20" but it's complicated.
> 
> Here is my test case:
> 
> select version();                                                   
>          version                                                    
>          
> ---------------------------------------------------------------------
> -------------------------------------------------------------
>  PostgreSQL 13.1 (Ubuntu 13.1-1.pgdg20.04+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
> 
> create table t1 (rn integer , object_name text) partition by
> range(rn);
> 
> create table t1a partition of t1 for values from (1) to (50001);
> 
> 
> \d+ t1                                         Table partitionnée «
> public.t1 »
>    Colonne   |  Type   | Collationnement | NULL-able | Par défaut |
> Stockage | Cible de statistiques | Description
> -------------+---------+-----------------+-----------+------------+--
> --------+-----------------------+-------------
>  rn          | integer |                 |           |            |
> plain    |                       |
>  object_name | text    |                 |           |            |
> extended |                       |
> Clé de partition : RANGE (rn)
> Partitions: t1a FOR VALUES FROM (1) TO (50001)
> 
> 
> 
> insert into t1 select                                    rownum  rn,
>         upper(md5(random()::text)) object_name
> from
>         (select generate_series(1,50000) rownum) serie
> ;
> 
> explain analyze select  object_namefrom    t1
> where
>         rn between 20 and 10
> ;
>                                      QUERY PLAN                      
>              
> ---------------------------------------------------------------------
> ---------------
>  Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.002
> rows=0 loops=1)
>    One-Time Filter: false
>  Planning Time: 0.116 ms
>  Execution Time: 0.020 ms
> 
> It's OK but:
> 
> explain analyze select  object_namefrom    t1a
> where
>         rn between 20 and 10
> ;
>                                            QUERY PLAN                
>                            
> ---------------------------------------------------------------------
> ----------------------------
>  Seq Scan on t1a  (cost=0.00..1167.00 rows=1 width=33) (actual
> time=6.553..6.553 rows=0 loops=1)
>    Filter: ((rn >= 20) AND (rn <= 10))
>    Rows Removed by Filter: 50000
>  Planning Time: 0.092 ms
>  Execution Time: 6.573 ms
> 
> At first I thought it was related to partition pruning but:
> 
> set enable_partition_pruning = false;
> 
> explain analyze select  object_namefrom    t1
> where
>         rn between 20 and 10
> ;
> 
>                                      QUERY PLAN                      
>              --------------------------------------------------------
> ----------------------------
>  Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.003
> rows=0 loops=1)
>    One-Time Filter: false
>  Planning Time: 0.104 ms
>  Execution Time: 0.021 ms
> 
> 
> Confirmation since I still obtain "One-Time Filter: false" if I don't
> filter on the partition key:
> 
> create table t2 (rn integer , rn2 integer, object_name text)
> partition by range(rn);
> 
> create table t2a partition of t2 for values from (1) to (50001);
> 
> d+ t2                                         Table partitionnée «
> public.t2 »
>    Colonne   |  Type   | Collationnement | NULL-able | Par défaut |
> Stockage | Cible de statistiques | Description
> -------------+---------+-----------------+-----------+------------+--
> --------+-----------------------+-------------
>  rn          | integer |                 |           |            |
> plain    |                       |
>  rn2         | integer |                 |           |            |
> plain    |                       |
>  object_name | text    |                 |           |            |
> extended |                       |
> Clé de partition : RANGE (rn)
> Partitions: t2a FOR VALUES FROM (1) TO (50001)
> 
> insert into t2 select                                               
>   rownum  rn, rownum rn2,
>         upper(md5(random()::text)) object_name
> from
>         (select generate_series(1,50000) rownum) serie
> ;
> 
> explain analyze select  object_namefrom    t2
> where
>         rn2 between 20 and 10
> ;
> 
> 
>                                      QUERY PLAN                      
>              --------------------------------------------------------
> ----------------------------
>  Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.003
> rows=0 loops=1)
>    One-Time Filter: false
>  Planning Time: 0.185 ms
>  Execution Time: 0.019 ms
> 
> I don't understand why I don't obtain " One-Time Filter: false" with
> a classic table or a partition ?
> 
> Best regards,
> 
> Phil


See table 9.2 in the documentation.

BETWEEN 10 AND 20 returns TRUE.
BETWEEN 20 AND 10 returns FALSE.
BETWEEN SYMMETRIC 20 AND 10 returns TRUE.

HTH,

Re: Strange (and good) side effect of partitioning ?

От

Tom Lane

Дата:

15 января 2021 г., 04:56:14

Phil Florent <philflorent@hotmail.com> writes:
> I read that on Jonathan Lewis' blog :
> (I believe that there may be some RDBMS which will treat (e.g.) "X between 20 and 10" as being identical to "X
between10 and 20" ) 

FWIW, I hope not, because the SQL spec is perfectly clear that it's
not supposed to work like that.  As rob stone noted nearby, you're
supposed to say BETWEEN SYMMETRIC if you want that behavior.
"X between 20 and 10" should always be false (well, unless it's
null because X is null).  But I think that's not really what
your question is.

> I am puzzled. PostgreSQL seems NOT to treat X between 20 and 10" as being identical to "X between 10 and 20" but it's
complicated.

There's no specific mechanism in Postgres that would cause "X between 20
and 10" to be reduced to constant-false (and I kind of think it would
be a waste of effort to add one).  So that's why in simple cases you
get a plan like

>  Seq Scan on t1a  (cost=0.00..1167.00 rows=1 width=33) (actual time=6.553..6.553 rows=0 loops=1)
>    Filter: ((rn >= 20) AND (rn <= 10))

I think that the other cases you show work as they do because the
code for excluding irrelevant range-based partitions is able to
conclude that no partition need be scanned.  That is, the
constant-false-one-time-filter plan arises when we have no
partitions remaining to scan, not because the plan for any one
partition would have looked different from what's above.

            regards, tom lane

Re: Strange (and good) side effect of partitioning ?

От

Tom Lane

Дата:

15 января 2021 г., 05:12:17

I wrote:
> There's no specific mechanism in Postgres that would cause "X between 20
> and 10" to be reduced to constant-false

Wait, I take that back.  There is a mechanism that can conclude that
"X >= 20" and "X <= 10" are contradictory, but it's not applied by
default.  Observe:

regression=# set constraint_exclusion = default;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.29..8.30 rows=1 width=244)
   Index Cond: ((unique1 >= 20) AND (unique1 <= 10))
(2 rows)

regression=# set constraint_exclusion = on;     
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
                QUERY PLAN                
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

The default value of constraint_exclusion is "partition", which means
(you guessed it) that it's applied only to potential partitioning
constraints.  This is a heuristic based on the typical payoff of
excluding whole partitions versus skipping an empty index scan.
But if you have a workload where it's really worth spending
planner cycles looking for self-contradictory queries, you can
turn it on.

            regards, tom lane

RE: Strange (and good) side effect of partitioning ?

От

Phil Florent

Дата:

15 января 2021 г., 06:52:39

Hi Tom, Hi Rob

Thanks for this clear and complete explanation. My question was unclear since I didn't even consider the results could be identical and it was about the plans. I had misunderstood what J.Lewis had written since he probably meant some RDBMS always do a BETWEEN SYMETRIC. Our application currently has double compatibility with Oracle and PostgreSQL, PostgreSQL only after 2021, and I hope we won't be compatible with a RDBMS that would not respect SQL standard on this aspect.

>But if you have a workload where it's really worth spending
>planner cycles looking for self-contradictory queries, you can
>turn it on.

It was theoretical but it is a DSS tool and some queries can be dynamically built by the end users. Perhaps it really happens on some cases since I don't know if we always check criterias not to obtain self-contradictory queries. Since it's not OLTP our execution times are always much more important than our planning times anyway.

There are other places it could be more interesting to spend time for better performance. Debian 10/PostgreSQL 11 is our initial PostgreSQL platform. It's very efficient but I have to prepare Debian 11/PostgresQL 13. My first goal was to avoid performance regressions but I now want to always fully use planning capabilites of PostgreSQL. We currently have to completely deactive merge joins for some workload, nested loops for some other workload. It's OK but it's not optimal.

My current goal is to always activate (almost) everything with Debian 11/PostgreSQL 13 and everything with Debian 12/PostgreSQL 14+.

I will try to increase default_statistics_target it could be worth the price. I will also try to activate enable_partitionwise_aggregate and enable_partitionwise_join since we use partitioning by list of hospitals and subpartitioning by range of times. Replacing our slow Oracle "union all" views by PostgreSQL partitioned tables to deal with group of hospitals has still to be completed.

Best regards,

Phil

De : Tom Lane <tgl@sss.pgh.pa.us>
Envoyé : vendredi 15 janvier 2021 03:12
À : Phil Florent <philflorent@hotmail.com>
Cc : pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Objet : Re: Strange (and good) side effect of partitioning ?

I wrote:
> There's no specific mechanism in Postgres that would cause "X between 20
> and 10" to be reduced to constant-false

Wait, I take that back. There is a mechanism that can conclude that
"X >= 20" and "X <= 10" are contradictory, but it's not applied by
default. Observe:

regression=# set constraint_exclusion = default;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
                                 QUERY PLAN
-----------------------------------------------------------------------------
Index Scan using tenk1_unique1 on tenk1 (cost=0.29..8.30 rows=1 width=244)
   Index Cond: ((unique1 >= 20) AND (unique1 <= 10))
(2 rows)

regression=# set constraint_exclusion = on;
SET
regression=# explain select * from tenk1 where unique1 between 20 and 10;
                QUERY PLAN
------------------------------------------
Result (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

The default value of constraint_exclusion is "partition", which means
(you guessed it) that it's applied only to potential partitioning
constraints. This is a heuristic based on the typical payoff of
excluding whole partitions versus skipping an empty index scan.
But if you have a workload where it's really worth spending
planner cycles looking for self-contradictory queries, you can
turn it on.

                        regards, tom lane

RE: Strange (and good) side effect of partitioning ?

От

Phil Florent

Дата:

15 января 2021 г., 07:45:27

Hi Tom,

>The default value of constraint_exclusion is "partition", which means
>(you guessed it) that it's applied only to potential partitioning
>constraints. This is a heuristic based on the typical payoff of
>excluding whole partitions versus skipping an empty index scan.
>But if you have a workload where it's really worth spending
>planner cycles looking for self-contradictory queries, you can
>turn it on.

Interesting. Test case was not real but planning times have to be considered from a more general point of view. They are not a problem with our DSS app but we will also migrate our OLTP applications.

Partitioning is something new for me since we currently don't use it for our OLTP apps. It was not a technical choice, partitioning is not included in standard license of our current RDBMS. I will globally check the gain/loss with real workloads anyway.

Best regards,

Phil

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Strange (and good) side effect of partitioning ?

Strange (and good) side effect of partitioning ?

Re: Strange (and good) side effect of partitioning ?

Re: Strange (and good) side effect of partitioning ?

Re: Strange (and good) side effect of partitioning ?

RE: Strange (and good) side effect of partitioning ?

RE: Strange (and good) side effect of partitioning ?