Обсуждение: BUG #19370: PG18 returns incorrect array slice results when slice bounds depend on another array expression
BUG #19370: PG18 returns incorrect array slice results when slice bounds depend on another array expression
От
PG Bug reporting form
Дата:
The following bug has been logged on the website:
Bug reference: 19370
Logged by: Zepeng Zhang
Email address: redraiment@gmail.com
PostgreSQL version: 18.1
Operating system: macOS 15.6.1 & Debian Linux 12
Description:
I have encountered a behavioral regression in PostgreSQL 18 related to array
slicing when the slice start index is computed from another array expression
in the same SELECT list.
The attached SQL query (included below in full for reproducibility)
implements a recursive Sudoku constraint-propagation algorithm purely in
SQL. The final SELECT computes several derived arrays from a base
`candidates bigint[]` array:
* `unsorted_candidates`: the original candidate array, with no defined
ordering
* `sorted_candidates`: the same candidates sorted in ascending numeric order
* `counts bigint[]`: for each Sudoku cell (in row-major order), the number
of candidates belonging to that cell
From these, two array slices are compared:
1. `sorted_candidates[2 + 1:]`
This is a constant-expression slice that removes the first two elements
from `sorted_candidates`.
The result is correct and identical in PostgreSQL 17 and PostgreSQL 18.
2. `sorted_candidates[counts[1] + 1:]`
Since `counts[1] = 2` in this query, this slice is semantically
equivalent to the expression above and **should produce the same result**.
### Observed behavior
* **PostgreSQL 17**:
Both expressions return identical results, as expected.
* **PostgreSQL 18**:
The result of `sorted_candidates[counts[1] + 1:]` is incorrect.
Instead of slicing `sorted_candidates`, PostgreSQL 18 appears to slice
`unsorted_candidates`, preserving the original (unsorted) order and merely
removing the first two elements of that unsorted array.
In other words, PostgreSQL 18 returns a slice that corresponds to:
```
unsorted_candidates[counts[1] + 1:]
```
even though the slice expression explicitly references `sorted_candidates`.
### Why this appears to be a bug
* The array being sliced (`sorted_candidates`) is explicitly named and
independently computed in the SELECT list.
* `counts[1]` is a scalar expression that evaluates deterministically to
`2`.
* Replacing `counts[1]` with the literal constant `2` restores the correct
behavior in PostgreSQL 18.
* The query relies only on documented SQL semantics; no undefined ordering
assumptions are made for `sorted_candidates`, as it is explicitly ordered
using `ORDER BY` inside `array(...)`.
This suggests that PostgreSQL 18 may be incorrectly reusing or misbinding
array expressions when evaluating array slices whose bounds depend on other
output expressions in the same SELECT list.
### Reproducibility
* Reproduced on:
* macOS (Apple Silicon)
* Debian Linux
* PostgreSQL versions tested:
* PostgreSQL 17: correct behavior
* PostgreSQL 18: incorrect behavior
The full SQL query used to reproduce the issue is included verbatim below.
```sql
with recursive coordinates as (
select
row * 10 + col + 1 as id,
row + 1 as row,
col + 1 as col,
row / 3 * 3 + col / 3 + 1 as box
from
generate_series(0, 8) as rs(row),
generate_series(0, 8) as cs(col)
), propagations as (
select
sources.id as source_id,
targets.id as target_id
from
coordinates as sources
inner join
coordinates as targets
on
sources.row = targets.row
or sources.col = targets.col
or sources.box = targets.box
), puzzles as (
select
'{26,59,63,75,84,91,111,129,137,155,164,234,246,261,279,328,345,416,441,467,472,569,613,649,651,674,692,793,836,888}'::bigint[]
as pendings,
'{11,12,13,14,15,16,17,18,19,31,32,33,34,35,36,37,38,39,41,42,43,44,45,46,47,48,49,141,142,143,144,145,146,147,148,149,171,172,173,174,175,176,177,178,179,181,182,183,184,185,186,187,188,189,191,192,193,194,195,196,197,198,199,211,212,213,214,215,216,217,218,219,221,222,223,224,225,226,227,228,229,251,252,253,254,255,256,257,258,259,281,282,283,284,285,286,287,288,289,291,292,293,294,295,296,297,298,299,311,312,313,314,315,316,317,318,319,331,332,333,334,335,336,337,338,339,351,352,353,354,355,356,357,358,359,361,362,363,364,365,366,367,368,369,371,372,373,374,375,376,377,378,379,381,382,383,384,385,386,387,388,389,391,392,393,394,395,396,397,398,399,421,422,423,424,425,426,427,428,429,431,432,433,434,435,436,437,438,439,451,452,453,454,455,456,457,458,459,481,482,483,484,485,486,487,488,489,491,492,493,494,495,496,497,498,499,511,512,513,514,515,516,517,518,519,521,522,523,524,525,526,527,528,529,531,532,533,534,535,536,537,538,539,541,542,543,544,545,546,547,548,549,551,552,553,554,555,556,557,558,559,571,572,573,574,575,576,577,578,579,581,582,583,584,585,586,587,588,589,591,592,593,594,595,596,597,598,599,621,622,623,624,625,626,627,628,629,631,632,633,634,635,636,637,638,639,661,662,663,664,665,666,667,668,669,681,682,683,684,685,686,687,688,689,711,712,713,714,715,716,717,718,719,721,722,723,724,725,726,727,728,729,731,732,733,734,735,736,737,738,739,741,742,743,744,745,746,747,748,749,751,752,753,754,755,756,757,758,759,761,762,763,764,765,766,767,768,769,771,772,773,774,775,776,777,778,779,781,782,783,784,785,786,787,788,789,811,812,813,814,815,816,817,818,819,821,822,823,824,825,826,827,828,829,841,842,843,844,845,846,847,848,849,851,852,853,854,855,856,857,858,859,861,862,863,864,865,866,867,868,869,871,872,873,874,875,876,877,878,879,891,892,893,894,895,896,897,898,899}'::bigint[]
as candidates
union all
select
array(
select
min(candidate)
from
unnest(candidates) as _(candidate)
group by
candidate / 10
having
count(*) = 1
) as pendings,
candidates
from (
select
array(
select unnest(candidates)
except all
select
propagations.target_id * 10 + digit
from (
select
pending / 10,
pending % 10
from
unnest(pendings) as _(pending)
) as pendings(coordinate_id, digit)
inner join
propagations
on
pendings.coordinate_id = propagations.source_id
except all
select
pending / 10 * 10 + id
from
unnest(pendings) as _0(pending),
generate_series(1, 9) as _1(id)
) as candidates
from
puzzles
where
cardinality(pendings) > 0
) as _
), solutions as (
select
array(
select
count(*)
from
unnest(candidates) as _(candidate)
group by
candidate / 10
order by
candidate / 10
) as counts,
array(
select unnest(candidates) order by 1
) as sorted_candidates,
candidates as unsorted_candidates
from
puzzles
where
cardinality(pendings) = 0
and cardinality(candidates) > 0
)
select
counts[1] as numbers,
sorted_candidates,
unsorted_candidates,
sorted_candidates[2 + 1:] as expect_candidates,
sorted_candidates[counts[1] + 1:] as actual_candidates
from
solutions
```
On Tue, 6 Jan 2026 at 21:22, PG Bug reporting form <noreply@postgresql.org> wrote: > I have encountered a behavioral regression in PostgreSQL 18 related to array > slicing when the slice start index is computed from another array expression > in the same SELECT list. > > The attached SQL query (included below in full for reproducibility) Many thanks for the report and reproducer. I've not looked as to why, but this seems to be caused by a7f107df2. I've included Andres. David
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 6 Jan 2026 at 21:22, PG Bug reporting form
> <noreply@postgresql.org> wrote:
>> I have encountered a behavioral regression in PostgreSQL 18 related to array
>> slicing when the slice start index is computed from another array expression
>> in the same SELECT list.
> Many thanks for the report and reproducer. I've not looked as to why,
> but this seems to be caused by a7f107df2. I've included Andres.
After re-reading that patch, I suspect an aliasing problem,
specifically from this bit in ExecInitSubPlanExpr:
* ... No
* danger of conflicts with other uses of resvalue/resnull as storing and
* using the value always is in subsequent steps.
ExecInitExprRec(arg, state,
&state->resvalue, &state->resnull);
The comment seems perhaps too airy :-(. I experimented with replacing
- &state->resvalue, &state->resnull);
+ resv, resnull);
and indeed that makes the submitted query work --- but it breaks other
queries in our regression tests, so it's not a usable solution.
I don't see a good reason why ExecInitSubPlanExpr shouldn't be allowed
to use state->resvalue/resnull this way. Given that the problem seems
specific to array slicing, I'm suspicious that some step in array
slicing is failing to cope with input and output datum storage being
the same, or something close to that.
regards, tom lane
I wrote:
> After re-reading that patch, I suspect an aliasing problem,
> specifically from this bit in ExecInitSubPlanExpr:
> * ... No
> * danger of conflicts with other uses of resvalue/resnull as storing and
> * using the value always is in subsequent steps.
> ExecInitExprRec(arg, state,
> &state->resvalue, &state->resnull);
Yup, that's it. The problem occurs when a SubPlan is in the
subscripts of an array reference. In that case,
ExecInitSubscriptingRef has already emitted code to load the source
array into its target resv/resnull, which might well be the
ExprState's resvalue/resnull. So it's not okay for the array
subscript calculation steps to overwrite the ExprState's
resvalue/resnull, but ExecInitSubPlanExpr thinks it can.
We *could* safely use ExecInitSubPlanExpr's target resv/resnull,
but that doesn't line up with EEOP_PARAM_SET's definition:
ExecEvalParamSet is hard-wired to store from state->resvalue/resnull.
I thought all along that that was probably too simplistic.
We could either generalize EEOP_PARAM_SET to include an explicit
specification of the source value's address, or insert some kind
of LOAD operation to copy the computed value into
state->resvalue/resnull. I don't see anything that looks like
that today, though.
regards, tom lane
Hi,
I'm also looking into this.
On 2026-01-06 10:59:41 -0500, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Tue, 6 Jan 2026 at 21:22, PG Bug reporting form
> > <noreply@postgresql.org> wrote:
> >> I have encountered a behavioral regression in PostgreSQL 18 related to array
> >> slicing when the slice start index is computed from another array expression
> >> in the same SELECT list.
>
> > Many thanks for the report and reproducer. I've not looked as to why,
> > but this seems to be caused by a7f107df2. I've included Andres.
>
> After re-reading that patch, I suspect an aliasing problem,
> specifically from this bit in ExecInitSubPlanExpr:
>
> * ... No
> * danger of conflicts with other uses of resvalue/resnull as storing and
> * using the value always is in subsequent steps.
>
> ExecInitExprRec(arg, state,
> &state->resvalue, &state->resnull);
>
> The comment seems perhaps too airy :-(. I experimented with replacing
>
> - &state->resvalue, &state->resnull);
> + resv, resnull);
>
> and indeed that makes the submitted query work --- but it breaks other
> queries in our regression tests, so it's not a usable solution.
>
> I don't see a good reason why ExecInitSubPlanExpr shouldn't be allowed
> to use state->resvalue/resnull this way.
I don't either.
> Given that the problem seems specific to array slicing, I'm suspicious that
> some step in array slicing is failing to cope with input and output datum
> storage being the same, or something close to that.
I think the problem may indeed be an interaction with subscription (I think it
may not require slicing though, I see a similar problem with plain array
indexing).
I think the problem is that ExecBuildProjectionInfo() evaluates, quite
reasonably, each column into state->resvalue (thereby calling
ExecInitExprRec() with resv pointing to
state->resvalue). ExecInitSubscriptingRef() then does:
/*
* Evaluate array input. It's safe to do so into resv/resnull, because we
* won't use that as target for any of the other subexpressions, and it'll
* be overwritten by the final EEOP_SBSREF_FETCH/ASSIGN step, which is
* pushed last.
*/
ExecInitExprRec(sbsref->refexpr, state, resv, resnull);
which puts the input array into state->resvalue. However, I don't think that
can work reliably, because it also evaluates arbitrary expressions afterwards
for the subscripts - if those subscript evaluations also use state->resvalue,
we're in trouble.
I don't quite know yet which step is to blame here.
Greetings,
Andres Freund
Hi, On 2026-01-06 11:40:01 -0500, Tom Lane wrote: > I wrote: > > After re-reading that patch, I suspect an aliasing problem, > > specifically from this bit in ExecInitSubPlanExpr: > > > * ... No > > * danger of conflicts with other uses of resvalue/resnull as storing and > > * using the value always is in subsequent steps. > > > ExecInitExprRec(arg, state, > > &state->resvalue, &state->resnull); > > Yup, that's it. The problem occurs when a SubPlan is in the > subscripts of an array reference. In that case, > ExecInitSubscriptingRef has already emitted code to load the source > array into its target resv/resnull, which might well be the > ExprState's resvalue/resnull. I see we pinpointed the same thing... > So it's not okay for the array subscript calculation steps to overwrite the > ExprState's resvalue/resnull, but ExecInitSubPlanExpr thinks it can. But I'm less sure that the problem is ExecInitSubPlanExpr()'s use of state->resvalue, rather than ExecInitSubscriptingRef() belief that the *resv value won't change. If I could travel through time, I'd tell younger Andres to introduce explicit variables to the expression interpretation thingymagick... > We *could* safely use ExecInitSubPlanExpr's target resv/resnull, > but that doesn't line up with EEOP_PARAM_SET's definition: > ExecEvalParamSet is hard-wired to store from state->resvalue/resnull. > I thought all along that that was probably too simplistic. > > We could either generalize EEOP_PARAM_SET to include an explicit > specification of the source value's address That's pretty trivial, see attached. I don't quite understand why I didn't go that way immediately... At the very least we need to create a simplified testcase for the bug at hand. > or insert some kind of LOAD operation to copy the computed value into > state->resvalue/resnull. I don't see anything that looks like that today, > though. Hm, wouldn't that have exactly the same issues as we have today anyway? Greetings, Andres Freund
Вложения
Andres Freund <andres@anarazel.de> writes:
> On 2026-01-06 11:40:01 -0500, Tom Lane wrote:
>> We could either generalize EEOP_PARAM_SET to include an explicit
>> specification of the source value's address
> That's pretty trivial, see attached. I don't quite understand why I didn't go
> that way immediately...
Yeah, that looks about right.
> At the very least we need to create a simplified testcase for the bug at hand.
Yup, the sudoku example is fun but it seems inappropriate as a test
case.
>> or insert some kind of LOAD operation to copy the computed value into
>> state->resvalue/resnull. I don't see anything that looks like that today,
>> though.
> Hm, wouldn't that have exactly the same issues as we have today anyway?
Indeed ... -ENOCAFFEINE
regards, tom lane
I wrote:
> Andres Freund <andres@anarazel.de> writes:
>> At the very least we need to create a simplified testcase for the bug at hand.
> Yup, the sudoku example is fun but it seems inappropriate as a test
> case.
This seems simple enough:
regression=# with cte as
(select array[g,2] as a from generate_series(1,3) g)
select a[2], a[(select a[2])] from cte;
server closed the connection unexpectedly
regards, tom lane
Hi, On 2026-01-06 13:35:23 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2026-01-06 11:40:01 -0500, Tom Lane wrote: > >> We could either generalize EEOP_PARAM_SET to include an explicit > >> specification of the source value's address > > > That's pretty trivial, see attached. I don't quite understand why I didn't go > > that way immediately... > > Yeah, that looks about right. Pushed, together with a narrowed down testcase. Thanks for the report! Without a reproducer this would have been much much harder (*). Greetings, Andres Freund * although I do wish we had something like cvise for sql...