Обсуждение: BUG #18465: Wrong results from SELECT DISTINCT MIN in scalar subquery using HashAggregate
BUG #18465: Wrong results from SELECT DISTINCT MIN in scalar subquery using HashAggregate
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 18465 Logged by: Hal Takahara Email address: mtakahar@gmail.com PostgreSQL version: 15.7 Operating system: macOS 13.6 Description: * The query in the example below returns wrong results when HashAggregate is used for eliminating the duplicates for DISTINCT. postgres=# select version(); version ------------------------------------------------------------------------------------------------------------------------------ PostgreSQL 15.7 (Homebrew) on x86_64-apple-darwin22.6.0, compiled by Apple clang version 15.0.0 (clang-1500.1.0.2.5), 64-bit (1 row) postgres=# CREATE TABLE b (col_int int); CREATE TABLE cc (col_int int); INSERT INTO b values (1); INSERT INTO cc values (null), (1), (2); CREATE TABLE postgres=# CREATE TABLE postgres=# postgres=# INSERT 0 1 postgres=# INSERT 0 3 postgres=# postgres=# \pset null '<null>' Null display is "<null>". postgres=# SET enable_hashagg = ON; SET enable_sort = OFF; SET SET postgres=# SELECT ( SELECT DISTINCT MIN( col_int ) FROM B AS SUBQUERY1_t1 WHERE SUBQUERY1_t1 .col_int = table1 .col_int ), table1 .col_int FROM CC AS table1 ; min | col_int --------+--------- <null> | <null> <null> | 1 <------- *** wrong *** <null> | 2 (3 rows) postgres=# EXPLAIN SELECT ( SELECT DISTINCT MIN( col_int ) FROM B AS SUBQUERY1_t1 WHERE SUBQUERY1_t1 .col_int = table1 .col_int ), table1 .col_int FROM CC AS table1 ; QUERY PLAN ---------------------------------------------------------------------------------------- Seq Scan on cc table1 (cost=0.00..8306.82 rows=2550 width=8) SubPlan 2 -> HashAggregate (cost=3.23..3.24 rows=1 width=4) Group Key: $1 InitPlan 1 (returns $1) -> Limit (cost=0.00..3.22 rows=1 width=4) -> Seq Scan on b subquery1_t1 (cost=0.00..41.88 rows=13 width=4) Filter: ((col_int IS NOT NULL) AND (col_int = table1.col_int)) -> Result (cost=3.22..3.23 rows=1 width=4) (9 rows) postgres=# SET enable_hashagg = OFF; SET enable_sort = ON; SET SET postgres=# SELECT ( SELECT DISTINCT MIN( col_int ) FROM B AS SUBQUERY1_t1 WHERE SUBQUERY1_t1 .col_int = table1 .col_int ), table1 .col_int FROM CC AS table1 ; min | col_int --------+--------- <null> | <null> 1 | 1 <------- *** correct *** <null> | 2 (3 rows) postgres=# EXPLAIN SELECT ( SELECT DISTINCT MIN( col_int ) FROM B AS SUBQUERY1_t1 WHERE SUBQUERY1_t1 .col_int = table1 .col_int ), table1 .col_int FROM CC AS table1 ; QUERY PLAN ---------------------------------------------------------------------------------------- Seq Scan on cc table1 (cost=0.00..8319.57 rows=2550 width=8) SubPlan 2 -> Unique (cost=3.24..3.25 rows=1 width=4) InitPlan 1 (returns $1) -> Limit (cost=0.00..3.22 rows=1 width=4) -> Seq Scan on b subquery1_t1 (cost=0.00..41.88 rows=13 width=4) Filter: ((col_int IS NOT NULL) AND (col_int = table1.col_int)) -> Sort (cost=3.24..3.25 rows=1 width=4) Sort Key: ($1) -> Result (cost=3.22..3.23 rows=1 width=4) (10 rows) * The subquery is returning the first result for all the subsequent tuples: postgres=# SET enable_hashagg = ON; SET enable_sort = OFF; SET SET postgres=# TRUNCATE TABLE cc; TRUNCATE TABLE postgres=# INSERT INTO cc values (1), (2), (null); INSERT 0 3 postgres=# SET enable_hashagg = ON; SET enable_sort = OFF; SET SET postgres=# SELECT ( SELECT DISTINCT MIN( col_int ) FROM B AS SUBQUERY1_t1 WHERE SUBQUERY1_t1 .col_int = table1 .col_int ), table1 .col_int FROM CC AS table1 ; min | col_int -----+--------- 1 | 1 <------- *** correct *** 1 | 2 <------- *** wrong *** 1 | <null> <------- *** wrong *** (3 rows)
> On Tue, May 14, 2024 at 09:14:34PM +0000, PG Bug reporting form wrote: > > postgres=# SELECT ( SELECT DISTINCT MIN( col_int ) FROM B AS > SUBQUERY1_t1 WHERE SUBQUERY1_t1 .col_int = table1 .col_int ), table1 > .col_int FROM CC AS table1 ; > min | col_int > --------+--------- > <null> | <null> > <null> | 1 <------- *** wrong *** > <null> | 2 > (3 rows) > > postgres=# SELECT ( SELECT DISTINCT MIN( col_int ) FROM B AS > SUBQUERY1_t1 WHERE SUBQUERY1_t1 .col_int = table1 .col_int ), table1 > .col_int FROM CC AS table1 ; > min | col_int > -----+--------- > 1 | 1 <------- *** correct *** > 1 | 2 <------- *** wrong *** > 1 | <null> <------- *** wrong *** > (3 rows) Thanks, I could reproduce this in v15 and v16. Interestingly enough it was fixed in d0d44049d1, but looks like due to lack of error reports the fix was applied only on the main branch. I assume this one makes it necessary to back-patch the fix.
Dmitry Dolgov <9erthalion6@gmail.com> writes: > Thanks, I could reproduce this in v15 and v16. Interestingly enough it > was fixed in d0d44049d1, but looks like due to lack of error reports the > fix was applied only on the main branch. I assume this one makes it > necessary to back-patch the fix. Hah. Nothing like fixing a bug that you know is there but you can't demonstrate it ;-). I confirm the problem appears in d0d44049d1^ (and at least as far back as v12), but not in d0d44049d1 or HEAD. So yeah, add test case and back-patch. Many thanks for the test case! regards, tom lane