Обсуждение: Partition pruning on parameters grouped into an array does not prune properly
Partition pruning on parameters grouped into an array does not prune properly
От
Andrei Lepikhov
Дата:
Hi,
Working on improving partition pruning [1] I found a case that I may
treat like a bug. I mean the case with expression on a partitioning
column like:
id = ARRAY[$1,$2]
Basically, pruning on ARRAY[$1,$2] works in the case of single column
(see correct-pruning-example.sql in attachment):
PREPARE test (int, int) AS
SELECT * FROM array_prune WHERE id = ANY(ARRAY[$1,$2]);
EXPLAIN (COSTS OFF) EXECUTE test(1,2);
Append
Subplans Removed: 1
-> Seq Scan on array_prune_t0 array_prune_1
Filter: (id = ANY (ARRAY[$1, $2]))
But if we partition on HASH(x,y) it is not working (see
incorrect-pruning-example.sql):
PREPARE test2 (int,int) AS
SELECT 1 FROM array_prune
WHERE id1 = ANY(ARRAY[$1]) AND id2 = ANY(ARRAY[$2]);
EXPLAIN (COSTS OFF) EXECUTE test2(1,-1);
Append
-> Seq Scan on array_prune_t0 array_prune_1
Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
-> Seq Scan on array_prune_t1 array_prune_2
Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
Although its analogue works nice:
PREPARE test3 (int,int) AS
SELECT 1 FROM array_prune
WHERE id1 = $1 AND id2 = $2;
EXPLAIN (COSTS OFF) EXECUTE test3(1,-1);
Append
Subplans Removed: 1
-> Seq Scan on array_prune_t0 array_prune_1
Filter: ((id1 = $1) AND (id2 = $2))
So, before diving into the partitioning depths, someone may quickly say
it is not a bug, but I am missing something. Some hidden semantics?
[1] Prune partitions by ScalarArrayOpExpr with an array parameter
(partkey = ANY($1))
https://www.postgresql.org/message-id/b8cdd20f-b34b-42b9-8c7c-dae864b7b3b2@gmail.com
--
regards, Andrei Lepikhov
Вложения
Re: Partition pruning on parameters grouped into an array does not prune properly
От
David Rowley
Дата:
On Thu, 27 Mar 2025 at 04:19, Andrei Lepikhov <lepihov@gmail.com> wrote:
> But if we partition on HASH(x,y) it is not working (see
> incorrect-pruning-example.sql):
>
> PREPARE test2 (int,int) AS
> SELECT 1 FROM array_prune
> WHERE id1 = ANY(ARRAY[$1]) AND id2 = ANY(ARRAY[$2]);
> EXPLAIN (COSTS OFF) EXECUTE test2(1,-1);
>
> Append
> -> Seq Scan on array_prune_t0 array_prune_1
> Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
> -> Seq Scan on array_prune_t1 array_prune_2
> Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2])))
It is a bug. This is down to how match_clause_to_partition_key()
handles ScalarArrayOpExpr. To save some complexity in the handling of
ScalarArrayOpExpr, these get transformed into OpExprs, one for each
item in the ScalarArrayOpExpr. Look for the call to make_opclause()
in match_clause_to_partition_key(). Just a few lines down, you see
that we recursively call gen_partprune_steps_internal() to pass down
the OpExprs that we just generated. The problem is that the recursive
call only contains the OpExprs generated for one of the
ScalarArrayOpExpr, gen_prune_steps_from_opexps() requires equality
quals (or at least an key IS NULL qual) for all partitioned keys for
hash partitioning, otherwise it'll bail out on the following:
if (part_scheme->strategy == PARTITION_STRATEGY_HASH &&
clauselist == NIL && !bms_is_member(i, nullkeys))
return NIL;
I wonder if we need to redesign this to not do that recursive
processing and instead have it so match_clause_to_partition_key() can
generate multiple PartClauseInfos. If we've matched to the
ScalarArrayOpExpr then I think each generated PartClauseInfo should
have the same PartClauseMatchStatus. That would also get rid of the
(kinda silly) overhead we have of having to match the
ScalarArrayOpExpr to the partition key, then generating OpExprs and
having to match those again, even though we know they will match.
I suspect the fix for this might be a bit invasive to backpatch. Maybe
it's something we can give a bit more clear thought to after the
freeze is over.
David
Re: Partition pruning on parameters grouped into an array does not prune properly
От
Andrei Lepikhov
Дата:
On 3/27/25 01:58, David Rowley wrote:
> I wonder if we need to redesign this to not do that recursive
> processing and instead have it so match_clause_to_partition_key() can
> generate multiple PartClauseInfos. If we've matched to the
> ScalarArrayOpExpr then I think each generated PartClauseInfo should
> have the same PartClauseMatchStatus. That would also get rid of the
> (kinda silly) overhead we have of having to match the
> ScalarArrayOpExpr to the partition key, then generating OpExprs and
> having to match those again, even though we know they will match.
>
> I suspect the fix for this might be a bit invasive to backpatch. Maybe
> it's something we can give a bit more clear thought to after the
> freeze is over.
Thank you for the explanation!
Why does the pruning machinery only include the OpExpr pruning
operation? Often, when preparing for pruning steps, we don’t know the
exact number of values we will encounter during the initial or execution
pruning stages. I believe it would be beneficial to have an iterator -
something similar to the predicate_implied_by function - that can
iteratively extract values from an array. This would allow pruning in
practical scenarios, such as the following:
CREATE OR REPLACE FUNCTION some_business_logic(val integer)
RETURNS integer[] AS $$
BEGIN
IF txid_current() % 2 = 0 THEN
RETURN ARRAY[val];
ELSE
RETURN ARRAY[val + 1];
END IF;
END;
$$ LANGUAGE plpgsql STRICT STABLE;
PREPARE test (int) AS
SELECT * FROM array_prune
WHERE id = ANY (some_business_logic($1));
EXPLAIN (ANALYZE, COSTS OFF) EXECUTE test(1);
Also in that case we wouldn't need to decompose a ScalarArrayOpExpr to
the list of OpExpr clauses to prune partitions.
--
regards, Andrei Lepikhov
Re: Partition pruning on parameters grouped into an array does not prune properly
От
Andrei Lepikhov
Дата:
On 3/27/25 01:58, David Rowley wrote:
> I suspect the fix for this might be a bit invasive to backpatch. Maybe
> it's something we can give a bit more clear thought to after the
> freeze is over.
One more issue I think may be addressed (or just considered) here is the
following:
CREATE TABLE parted (a int, b int) PARTITION BY RANGE (a, b);
CREATE TABLE part1 PARTITION OF parted
FOR VALUES FROM (1, 1) TO (1, 10);
CREATE TABLE part2 PARTITION OF parted
FOR VALUES FROM (2, 1) TO (2, 10);
INSERT INTO parted (VALUES (1, 2));
INSERT INTO parted VALUES (2, 2);
EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 1 AND b < 1;
EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 1 AND b > 10;
/*
Seq Scan on part2 parted
Filter: ((a > 1) AND (b < 1))
Seq Scan on part2 parted
Filter: ((a > 1) AND (b > 10))
*/
I think partition part2 could be pruned like in the following example:
EXPLAIN (COSTS OFF)
SELECT * FROM parted WHERE a > 2 AND b > 10;
/*
Result
One-Time Filter: false
*/
--
regards, Andrei Lepikhov
Re: Partition pruning on parameters grouped into an array does not prune properly
От
Amit Langote
Дата:
On Thu, Mar 27, 2025 at 9:59 AM David Rowley <dgrowleyml@gmail.com> wrote: > On Thu, 27 Mar 2025 at 04:19, Andrei Lepikhov <lepihov@gmail.com> wrote: > > But if we partition on HASH(x,y) it is not working (see > > incorrect-pruning-example.sql): > > > > PREPARE test2 (int,int) AS > > SELECT 1 FROM array_prune > > WHERE id1 = ANY(ARRAY[$1]) AND id2 = ANY(ARRAY[$2]); > > EXPLAIN (COSTS OFF) EXECUTE test2(1,-1); > > > > Append > > -> Seq Scan on array_prune_t0 array_prune_1 > > Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2]))) > > -> Seq Scan on array_prune_t1 array_prune_2 > > Filter: ((id1 = ANY (ARRAY[$1])) AND (id2 = ANY (ARRAY[$2]))) > > It is a bug. This is down to how match_clause_to_partition_key() > handles ScalarArrayOpExpr. To save some complexity in the handling of > ScalarArrayOpExpr, these get transformed into OpExprs, one for each > item in the ScalarArrayOpExpr. Look for the call to make_opclause() > in match_clause_to_partition_key(). Just a few lines down, you see > that we recursively call gen_partprune_steps_internal() to pass down > the OpExprs that we just generated. The problem is that the recursive > call only contains the OpExprs generated for one of the > ScalarArrayOpExpr, gen_prune_steps_from_opexps() requires equality > quals (or at least an key IS NULL qual) for all partitioned keys for > hash partitioning, otherwise it'll bail out on the following: > > if (part_scheme->strategy == PARTITION_STRATEGY_HASH && > clauselist == NIL && !bms_is_member(i, nullkeys)) > return NIL; > > I wonder if we need to redesign this to not do that recursive > processing and instead have it so match_clause_to_partition_key() can > generate multiple PartClauseInfos. If we've matched to the > ScalarArrayOpExpr then I think each generated PartClauseInfo should > have the same PartClauseMatchStatus. That would also get rid of the > (kinda silly) overhead we have of having to match the > ScalarArrayOpExpr to the partition key, then generating OpExprs and > having to match those again, even though we know they will match. Hmm, how would the multiple PartClauseInfos that would be generated in this case -- presumably one per element in the ScalarArrayOpExpr -- be interpreted by gen_partprune_steps_internal()? As things stand, that code assumes multiple clauses for a given key are to be combined using logical AND, which doesn’t line up with how we handle ScalarArrayOpExpr with ANY. In that case (useOr = true), we expand the clause into individual OpExprs for each element, wrap them in an OR_EXPR, and pass that down recursively to gen_partprune_steps_internal(), which we know works correctly for single partition key case. So unless we also rethink how pruning step generation in the caller (gen_partprune_steps_internal()) interprets multiple PartClauseInfos for the same key, just returning more of them from match_clause_to_partition_key() wouldn’t be enough. We’d need to restructure the caller to combine them in a way that reflects the intended cross-product of values across partition keys. For example, consider: key1 = ANY(ARRAY[1, 2]) AND key2 = ANY(ARRAY[3, 4]) which expands to: (key1 = 1 OR key1 = 2) AND (key2 = 3 OR key2 = 4) and then to DNF: (key1 = 1 AND key2 = 3) OR (key1 = 1 AND key2 = 4) OR (key1 = 2 AND key2 = 3) OR (key1 = 2 AND key2 = 4) To support pruning in this case, we’d need to extend gen_partprune_steps_internal() to construct value combinations across independently matched keys. The way that could work is for each (key1, key2) pair in the cross-product, we’d generate a PartitionPruneStepOp, and combine their results using an OR-mode PartitionPruneStepCombine step. As for whether this qualifies as a bug, I’d lean toward “no” -- at least in the sense that it doesn’t produce incorrect results. We’ve historically treated unsupported pruning cases like this as missed optimization opportunities rather than planner correctness issues. That would argue against backpatching a fix, even if we do decide to improve the pruning logic in this area and for other cases going forward. -- Thanks, Amit Langote