Обсуждение: BUG #17084: Wrong results of distinct node.
The following bug has been logged on the website: Bug reference: 17084 Logged by: Maxim Boguk Email address: maxim.boguk@gmail.com PostgreSQL version: 13.3 Operating system: Linux Description: During investigation of weird app behaviour I found very strange results with sorting/dising of some dataset. Issue was reproduced on 12.7 server and 13.3 my local laptop. Dataset table dump (25MB file) available per request. QUERY: WITH t1 AS ( SELECT ad_id, advertiser_id, campaign_id, campaign_type_id, ad_group_id, target_id, device_type_id, country_code, block_format_type_id, category_group_id, array_agg(geo_id) AS geo_ids FROM test GROUP BY ad_id, advertiser_id, campaign_id, campaign_type_id, ad_group_id, target_id, device_type_id, country_code, block_format_type_id, category_group_id ), t2 AS ( SELECT DISTINCT ad_group_id, country_code, device_type_id, target_id, block_format_type_id, category_group_id, geo_ids FROM t1 ) SELECT COUNT(*) FROM t2; Test: set jit to off; set max_parallel_workers_per_gather to 0; set work_mem to 4MB; count ------- 83921 set work_mem to '64MB'; count ------- 55634 Plan with 4MB work_mem: Aggregate (cost=3258730.52..3258730.53 rows=1 width=8) (actual time=16607.679..16607.680 rows=1 loops=1) -> HashAggregate (cost=3232649.49..3257461.37 rows=101532 width=67) (actual time=16126.315..16604.394 rows=83700 loops=1) Group Key: test.ad_group_id, test.country_code, test.device_type_id, test.target_id, test.block_format_type_id, test.category_group_id, array_agg(test.geo_id) Planned Partitions: 8 Batches: 77 Memory Usage: 4313kB Disk Usage: 139784kB -> GroupAggregate (cost=2771948.04..3089235.54 rows=1015320 width=99) (actual time=12097.536..15675.807 rows=705836 loops=1) Group Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id -> Sort (cost=2771948.04..2797331.04 rows=10153200 width=75) (actual time=12097.524..13354.731 rows=10153197 loops=1) Sort Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id Sort Method: external merge Disk: 894184kB -> Seq Scan on test (cost=0.00..236908.00 rows=10153200 width=75) (actual time=0.015..779.722 rows=10153197 loops=1) Planning Time: 0.344 ms Execution Time: 16730.304 ms QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=2051062.74..2051062.75 rows=1 width=8) (actual time=8404.880..8404.882 rows=1 loops=1) -> HashAggregate (cost=2048778.27..2049793.59 rows=101532 width=67) (actual time=8390.952..8402.777 rows=55634 loops=1) Group Key: test.ad_group_id, test.country_code, test.device_type_id, test.target_id, test.block_format_type_id, test.category_group_id, array_agg(test.geo_id) Batches: 1 Memory Usage: 19473kB -> HashAggregate (cost=1750369.38..2020856.97 rows=1015320 width=99) (actual time=4478.394..7879.103 rows=705836 loops=1) Group Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id Planned Partitions: 256 Batches: 257 Memory Usage: 6553kB Disk Usage: 1102856kB -> Seq Scan on test (cost=0.00..236908.00 rows=10153200 width=75) (actual time=0.020..788.698 rows=10153197 loops=1) Planning Time: 0.301 ms Execution Time: 8538.236 ms Correnct answers provided with larger work_mem (55634) What make situatio even more curious that disable hash_agg doesn't make issue gone: set enable_hashagg to 0; set work_mem to '4MB'; ... count ------- 83700 Aggregate (cost=3305553.24..3305553.25 rows=1 width=8) (actual time=18227.164..18227.167 rows=1 loops=1) -> Unique (cost=3283977.69..3304284.09 rows=101532 width=67) (actual time=17595.189..18223.935 rows=83700 loops=1) -> Sort (cost=3283977.69..3286515.99 rows=1015320 width=67) (actual time=17595.188..18042.457 rows=705836 loops=1) Sort Key: t1.ad_group_id, t1.country_code, t1.device_type_id, t1.target_id, t1.block_format_type_id, t1.category_group_id, t1.geo_ids Sort Method: external merge Disk: 128680kB -> Subquery Scan on t1 (cost=2771948.04..3099388.74 rows=1015320 width=67) (actual time=11965.394..15750.759 rows=705836 loops=1) -> GroupAggregate (cost=2771948.04..3089235.54 rows=1015320 width=99) (actual time=11965.392..15695.102 rows=705836 loops=1) Group Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id -> Sort (cost=2771948.04..2797331.04 rows=10153200 width=75) (actual time=11965.378..13283.143 rows=10153197 loops=1) Sort Key: test.ad_id, test.advertiser_id, test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id, test.device_type_id, test.country_code, test.block_format_type_id, test.category_group_id Sort Method: external merge Disk: 894184kB -> Seq Scan on test (cost=0.00..236908.00 rows=10153200 width=75) (actual time=0.014..770.040 rows=10153197 loops=1) Planning Time: 0.315 ms Execution Time: 18503.307 ms --Maxim Boguk
On Tue, Jul 6, 2021 at 4:09 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 17084
Logged by: Maxim Boguk
Email address: maxim.boguk@gmail.com
PostgreSQL version: 13.3
Operating system: Linux
Description:
During investigation of weird app behaviour I found very strange results
with sorting/dising of some dataset.
Issue was reproduced on 12.7 server and 13.3 my local laptop.
Dataset table dump (25MB file) available per request.
QUERY:
WITH t1 AS (
SELECT
ad_id,
advertiser_id,
campaign_id,
campaign_type_id,
ad_group_id,
target_id,
device_type_id,
country_code,
block_format_type_id,
category_group_id,
array_agg(geo_id) AS geo_ids
FROM test
GROUP BY
ad_id,
advertiser_id,
campaign_id,
campaign_type_id,
ad_group_id,
target_id,
device_type_id,
country_code,
block_format_type_id,
category_group_id
),
t2 AS (
SELECT
DISTINCT
ad_group_id,
country_code,
device_type_id,
target_id,
block_format_type_id,
category_group_id,
geo_ids
FROM t1
)
SELECT COUNT(*) FROM t2;
Test:
set jit to off;
set max_parallel_workers_per_gather to 0;
set work_mem to 4MB;
count
-------
83921
set work_mem to '64MB';
count
-------
55634
...
Correnct answers provided with larger work_mem (55634)
What make situatio even more curious that disable hash_agg doesn't make
issue gone:
set enable_hashagg to 0;
set work_mem to '4MB';
...
count
-------
83700
...
Do you get different results if you provide an ORDER BY in the aggregate array_agg() function?
Without an ORDER BY there, I don't think there is any guarantee for deterministic results.
Best regards,
Pantelis Theodosiou
PG Bug reporting form <noreply@postgresql.org> writes: > During investigation of weird app behaviour I found very strange results > with sorting/dising of some dataset. I am not sure there is a PG bug here. Your query is doing DISTINCT on (among other things) > array_agg(geo_id) AS geo_ids which is going to have results that vary depending on the unspecified order in which the first-level aggregation is done. So you might by luck have more or fewer distinct array values with different plans. If you still get unstable results after changing that to, say, array_agg(geo_id order by geo_id) AS geo_ids then I'd agree we need to investigate. regards, tom lane