Обсуждение: Enable partitionwise join for partition keys wrapped by RelabelType
Hi,
The function exprs_known_equal() is used by the planner to determine if
two expressions are semantically equivalent, often by checking if they
belong to the same Equivalence Class (EC).
When a partitioned table uses a varchar(N) type as a partition key, the
partition key expression stored in the equivalence class member
(em->em_expr) is often wrapped in a RelabelType node.
However, when checking a join condition or a predicate against the EC
member, the input expression (item1 or item2) may not contain this
RelabelType wrapper, leading to an incorrect equal() comparison and
failing to detect a known equivalence. This prevents the planner from
recognizing that a join condition matches the partition keys, thereby
disabling optimizations like partitionwise joins.
On match_expr_to_partition_keys() we already strip away any RelabelType
that is present on OpExpr, so I think that can also modify
exprs_known_equal() to do the same and enable partitionwise joins for
more cases. Please see the attached patch.
On master for queries like the following we have this plan:
EXPLAIN(VERBOSE, COSTS OFF) SELECT fprt3.a, fprt4.a FROM fprt3 JOIN
fprt4 ON fprt3.c = fprt4.c WHERE fprt3.c = '0002';
QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop
Output: fprt3.a, fprt4.a
-> Foreign Scan on public.fprt3_p2 fprt3
Output: fprt3.a, fprt3.c
Remote SQL: SELECT a, c FROM public.fprt3_ft WHERE ((c = '0002'))
-> Materialize
Output: fprt4.a, fprt4.c
-> Foreign Scan on public.fprt4_p2 fprt4
Output: fprt4.a, fprt4.c
Remote SQL: SELECT a, c FROM public.fprt4_ft WHERE ((c = '0002'))
(10 rows)
With the patch and "enable_partitionwise_join" set to on we get this query plan
QUERY PLAN
---------------------------------------------------------------------------------
Foreign Scan
Output: fprt3.a, fprt4.a
Relations: (public.fprt3_p2 fprt3) INNER JOIN (public.fprt4_p2 fprt4)
Remote SQL: SELECT r4.a, r5.a FROM (public.fprt3_ft r4 INNER JOIN public.fprt4_ft r5 ON (((r5.c = '0002')) AND
((r4.c= '0002'))))
(4 rows)
--
Matheus Alcantara
EDB: http://www.enterprisedb.com
Вложения
On Mon, Dec 15, 2025 at 10:46 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Hi,
>
> The function exprs_known_equal() is used by the planner to determine if
> two expressions are semantically equivalent, often by checking if they
> belong to the same Equivalence Class (EC).
>
hi.
src/include/nodes/primnodes.h CollateExpr comments:
/*----------
* CollateExpr - COLLATE
*
* The planner replaces CollateExpr with RelabelType during expression
* preprocessing, so execution never sees a CollateExpr.
*----------
*/
examine_variable handling RelabelType (transformed from CollateExpr) is wrong, i
think. Roughly speaking it will reduce "t2.c collate case_insensitive" to
"t2.c". see [1].
If examine_variable does not strip the RelabelType(CollateExpr) node, then
estimate_num_groups->add_unique_group_var may need to deal with RelabelType.
The estimate_num_groups function should account for collation settings. "GROUP
BY a" and "GROUP BY a COLLATE case_insensitive" may result in different row
estimates and should be handled distinctly. Therefore exprs_known_equal within
add_unique_group_var must ensure collation is compared before assuming equality.
In this context, while strippping RelabelType, we should ensure
exprCollation(RelabelType->arg) is the same as the RelabelType->resultcollid.
However, it does not affect partitionwise join, because commit [2] already fixed
the collation issue. so we don't have to worry about
have_partkey_equi_join->exprs_known_equal code path for the RelabelType node.
so i change exprs_known_equal to:
+ /* Remove any relabel decorations if collation match */
+ if (IsA(expr, RelabelType))
+ {
+ RelabelType *relabel = castNode(RelabelType, expr);
+ Expr *rexpr = (Expr *) relabel;
+
+ while (rexpr && IsA(rexpr, RelabelType))
+ rexpr = ((RelabelType *) rexpr)->arg;
+
+ if (exprCollation((Node *) rexpr) == relabel->resultcollid)
+ expr = rexpr;
+ }
[1] https://postgr.es/m/CACJufxGLCiyhM+P0gxesg2x--PTrMY3PszqSqOq_H4QS_oq3Jg@mail.gmail.com
[2] https://git.postgresql.org/cgit/postgresql.git/commit/?id=075acdd93388c080c0fb0aca5723144ad7a56dac
--
jian
https://www.enterprisedb.com
Вложения
Re: Enable partitionwise join for partition keys wrapped by RelabelType
От
"Matheus Alcantara"
Дата:
On Thu Jan 15, 2026 at 12:30 AM -03, jian he wrote:
> On Mon, Dec 15, 2025 at 10:46 PM Matheus Alcantara
> <matheusssilv97@gmail.com> wrote:
>> The function exprs_known_equal() is used by the planner to determine if
>> two expressions are semantically equivalent, often by checking if they
>> belong to the same Equivalence Class (EC).
>>
> src/include/nodes/primnodes.h CollateExpr comments:
> /*----------
> * CollateExpr - COLLATE
> *
> * The planner replaces CollateExpr with RelabelType during expression
> * preprocessing, so execution never sees a CollateExpr.
> *----------
> */
>
> examine_variable handling RelabelType (transformed from CollateExpr) is wrong, i
> think. Roughly speaking it will reduce "t2.c collate case_insensitive" to
> "t2.c". see [1].
>
> If examine_variable does not strip the RelabelType(CollateExpr) node, then
> estimate_num_groups->add_unique_group_var may need to deal with RelabelType.
>
> The estimate_num_groups function should account for collation settings. "GROUP
> BY a" and "GROUP BY a COLLATE case_insensitive" may result in different row
> estimates and should be handled distinctly. Therefore exprs_known_equal within
> add_unique_group_var must ensure collation is compared before assuming equality.
>
> In this context, while strippping RelabelType, we should ensure
> exprCollation(RelabelType->arg) is the same as the RelabelType->resultcollid.
>
> However, it does not affect partitionwise join, because commit [2] already fixed
> the collation issue. so we don't have to worry about
> have_partkey_equi_join->exprs_known_equal code path for the RelabelType node.
>
> so i change exprs_known_equal to:
>
> + /* Remove any relabel decorations if collation match */
> + if (IsA(expr, RelabelType))
> + {
> + RelabelType *relabel = castNode(RelabelType, expr);
> + Expr *rexpr = (Expr *) relabel;
> +
> + while (rexpr && IsA(rexpr, RelabelType))
> + rexpr = ((RelabelType *) rexpr)->arg;
> +
> + if (exprCollation((Node *) rexpr) == relabel->resultcollid)
> + expr = rexpr;
> + }
>
> [1] https://postgr.es/m/CACJufxGLCiyhM+P0gxesg2x--PTrMY3PszqSqOq_H4QS_oq3Jg@mail.gmail.com
> [2] https://git.postgresql.org/cgit/postgresql.git/commit/?id=075acdd93388c080c0fb0aca5723144ad7a56dac
>
Thanks for taking a look at this.
Although this make sense to me I see difference in row estimation using
your v2 patch for the following example:
CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
CREATE DOMAIN d_txt1 AS text collate case_insensitive;
CREATE TABLE t1 (a int, b int, c text) PARTITION BY LIST(c);
CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE t1_p3 PARTITION OF t1 FOR VALUES IN ('0006', '0007', '0008', '0011');
INSERT INTO t1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
ANALYZE t1;
CREATE TABLE t2 (a int, b int, c d_txt1) PARTITION BY LIST(c);
CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES IN ('0000', '0003', '0004', '0010');
CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES IN ('0001', '0005', '0002', '0009');
CREATE TABLE t2_p3 PARTITION OF t2 FOR VALUES IN ('0006', '0007', '0008', '0011');
INSERT INTO t2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 3) i;
ANALYZE t2;
SET enable_partitionwise_join TO true;
V1 patch:
postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
estimated | actual
-----------+--------
12 | 12
V2 patch:
postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
estimated | actual
-----------+--------
144 | 12
I've also tried to make the partitions of t1 and t2 as foreign tables
and I got the same row estimation difference.
I'm just wondering if we are missing something?
About the v2 patch:
+ if (IsA(expr, RelabelType))
+ {
+ RelabelType *relabel = castNode(RelabelType, expr);
+ Expr *rexpr = (Expr *) relabel;
+
+ while (rexpr && IsA(rexpr, RelabelType))
+ rexpr = ((RelabelType *) rexpr)->arg;
+
+ if (exprCollation((Node *) rexpr) == relabel->resultcollid)
+ expr = rexpr;
I think that unwrapping all Relabel types may ignore intermediate
states. For example, consider Relabel A -> Relabel B -> Base expression:
This code will unwrap Relabel A and Relabel B and check the expression
collation directly on Base expression, shouldn't we check every layer,
e.g Relabel A, Relabel B and Base Expression? Please see the attached v3
version with a simplified version of v2 that also check every layer of a
RelabelType node.
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
Вложения
On Tue, Jan 27, 2026 at 11:42 PM Matheus Alcantara
<matheusssilv97@gmail.com> wrote:
>
> Although this make sense to me I see difference in row estimation using
> your v2 patch for the following example:
>
> CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
> CREATE DOMAIN d_txt1 AS text collate case_insensitive;
> CREATE TABLE t1 (a int, b int, c text) PARTITION BY LIST(c);
> CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES IN ('0000', '0003', '0004', '0010');
> CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES IN ('0001', '0005', '0002', '0009');
> CREATE TABLE t1_p3 PARTITION OF t1 FOR VALUES IN ('0006', '0007', '0008', '0011');
> INSERT INTO t1 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 2) i;
> ANALYZE t1;
>
> CREATE TABLE t2 (a int, b int, c d_txt1) PARTITION BY LIST(c);
> CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES IN ('0000', '0003', '0004', '0010');
> CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES IN ('0001', '0005', '0002', '0009');
> CREATE TABLE t2_p3 PARTITION OF t2 FOR VALUES IN ('0006', '0007', '0008', '0011');
> INSERT INTO t2 SELECT i, i, to_char(i/50, 'FM0000') FROM generate_series(0, 599, 3) i;
> ANALYZE t2;
>
> SET enable_partitionwise_join TO true;
>
> V1 patch:
>
> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
> estimated | actual
> -----------+--------
> 12 | 12
>
> V2 patch:
>
> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$);
> estimated | actual
> -----------+--------
> 144 | 12
>
> I've also tried to make the partitions of t1 and t2 as foreign tables
> and I got the same row estimation difference.
>
> I'm just wondering if we are missing something?
>
Hi.
in function process_equivalence, we have:
/*
* Ensure both input expressions expose the desired collation (their types
* should be OK already); see comments for canonicalize_ec_expression.
*/
item1 = canonicalize_ec_expression(item1,
exprType((Node *) item1),
collation);
item2 = canonicalize_ec_expression(item2,
exprType((Node *) item2),
collation);
Let's simplify the test case.
CREATE COLLATION case_insensitive (provider = icu, locale =
'@colStrength=secondary', deterministic = false);
CREATE DOMAIN d_txt1 AS text collate case_insensitive;
CREATE TABLE t3 (a int, b int, c text);
INSERT INTO t3 SELECT i % 12, i, to_char(i/50, 'FM0000') FROM
generate_series(0, 599, 2) i;
ANALYZE t3;
CREATE TABLE t4 (a int, b int, c d_txt1);
INSERT INTO t4 SELECT i % 10, i, to_char(i/50, 'FM0000') FROM
generate_series(0, 599, 3) i;
ANALYZE t4;
EXPLAIN SELECT FROM t3, t4 WHERE t3.c = t4.c GROUP BY t3.c, t4.c;
The ``WHERE t3.c = t4.c `` after the function process_equivalence, it will
produce 2 RELABELTYPE node in EquivalenceClass->ec_members->em_expr and your v1
uncondition strip these two RELABELTYPE, exprs_known_equal will retrun true,
therefore for numdistinct it will think "GROUP BY t3.c, t4.c" is the same as
""GROUP BY t3.c".
However if we not unconditionly strip RELABELTYPE, exprs_known_equal will return
false, therefore "GROUP BY t3.c, t4.c", "t3.c", "t4.c" is not identical, so it
multiply these two distinct numbers. IMHO, That's the reason for
estimate number 144 versus 12.
Please also see the comments in canonicalize_ec_expression.
Actually, I think the comments in canonicalize_ec_expression discourage strip
RelabelType nodes when RelabelType->resultcollid differs from the collation of
RelabelType->arg.
--
jian
https://www.enterprisedb.com/
On 25/02/26 09:46, jian he wrote: > On Tue, Jan 27, 2026 at 11:42 PM Matheus Alcantara > <matheusssilv97@gmail.com> wrote: >> >> Although this make sense to me I see difference in row estimation using >> your v2 patch for the following example: >> >> ... >> >> >> V1 patch: >> >> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$); >> estimated | actual >> -----------+-------- >> 12 | 12 >> >> V2 patch: >> >> postgres=# select * from check_estimated_rows($$ SELECT FROM t1, t2 WHERE t1.c = t2.c GROUP BY t1.c, t2.c $$); >> estimated | actual >> -----------+-------- >> 144 | 12 >> >> I've also tried to make the partitions of t1 and t2 as foreign tables >> and I got the same row estimation difference. >> >> I'm just wondering if we are missing something? >> > Hi. > > in function process_equivalence, we have: > /* > * Ensure both input expressions expose the desired collation (their types > * should be OK already); see comments for canonicalize_ec_expression. > */ > item1 = canonicalize_ec_expression(item1, > exprType((Node *) item1), > collation); > item2 = canonicalize_ec_expression(item2, > exprType((Node *) item2), > collation); > > > Let's simplify the test case. > CREATE COLLATION case_insensitive (provider = icu, locale = > '@colStrength=secondary', deterministic = false); > CREATE DOMAIN d_txt1 AS text collate case_insensitive; > CREATE TABLE t3 (a int, b int, c text); > INSERT INTO t3 SELECT i % 12, i, to_char(i/50, 'FM0000') FROM > generate_series(0, 599, 2) i; > ANALYZE t3; > CREATE TABLE t4 (a int, b int, c d_txt1); > INSERT INTO t4 SELECT i % 10, i, to_char(i/50, 'FM0000') FROM > generate_series(0, 599, 3) i; > ANALYZE t4; > EXPLAIN SELECT FROM t3, t4 WHERE t3.c = t4.c GROUP BY t3.c, t4.c; > > The ``WHERE t3.c = t4.c `` after the function process_equivalence, it will > produce 2 RELABELTYPE node in EquivalenceClass->ec_members->em_expr and your v1 > uncondition strip these two RELABELTYPE, exprs_known_equal will retrun true, > therefore for numdistinct it will think "GROUP BY t3.c, t4.c" is the same as > ""GROUP BY t3.c". > > However if we not unconditionly strip RELABELTYPE, exprs_known_equal will return > false, therefore "GROUP BY t3.c, t4.c", "t3.c", "t4.c" is not identical, so it > multiply these two distinct numbers. IMHO, That's the reason for > estimate number 144 versus 12. > Ok, that make sense to me, thanks for pointing this out. So do you think that v3 attached on [1] correctly address this? > Please also see the comments in canonicalize_ec_expression. > Actually, I think the comments in canonicalize_ec_expression discourage strip > RelabelType nodes when RelabelType->resultcollid differs from the collation of > RelabelType->arg. > Agree, thanks. [1] https://www.postgresql.org/message-id/DFZHIGROJHVS.25OYGENTHBLSM%40gmail.com -- Matheus Alcantara EDB: https://www.enterprisedb.com