Обсуждение: BUG #17084: Wrong results of distinct node.

Поиск
Список
Период
Сортировка

BUG #17084: Wrong results of distinct node.

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17084
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 13.3
Operating system:   Linux
Description:

During investigation of weird app behaviour I found very strange results
with sorting/dising of some dataset.
Issue was reproduced on 12.7 server and 13.3 my local laptop.
Dataset table dump (25MB file) available per request.

QUERY:
WITH t1 AS (
         SELECT
            ad_id, 
            advertiser_id, 
            campaign_id, 
            campaign_type_id, 
            ad_group_id, 
            target_id, 
            device_type_id, 
            country_code, 
            block_format_type_id, 
            category_group_id,
            array_agg(geo_id) AS geo_ids
          FROM test 
          GROUP BY 
            ad_id, 
            advertiser_id, 
            campaign_id, 
            campaign_type_id, 
            ad_group_id, 
            target_id, 
            device_type_id, 
            country_code, 
            block_format_type_id, 
            category_group_id
),
t2 AS (
SELECT 
DISTINCT 
  ad_group_id, 
  country_code, 
  device_type_id, 
  target_id, 
  block_format_type_id,
  category_group_id,
  geo_ids
FROM  t1
)

SELECT COUNT(*) FROM t2;


Test:
set jit to off;
set max_parallel_workers_per_gather to 0;
set work_mem to 4MB;
 count 
-------
 83921

set work_mem to '64MB';
 count 
-------
 55634

Plan with 4MB work_mem:
 Aggregate  (cost=3258730.52..3258730.53 rows=1 width=8) (actual
time=16607.679..16607.680 rows=1 loops=1)
   ->  HashAggregate  (cost=3232649.49..3257461.37 rows=101532 width=67)
(actual time=16126.315..16604.394 rows=83700 loops=1)
         Group Key: test.ad_group_id, test.country_code,
test.device_type_id, test.target_id, test.block_format_type_id,
test.category_group_id, array_agg(test.geo_id)
         Planned Partitions: 8  Batches: 77  Memory Usage: 4313kB  Disk
Usage: 139784kB
         ->  GroupAggregate  (cost=2771948.04..3089235.54 rows=1015320
width=99) (actual time=12097.536..15675.807 rows=705836 loops=1)
               Group Key: test.ad_id, test.advertiser_id, test.campaign_id,
test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
               ->  Sort  (cost=2771948.04..2797331.04 rows=10153200
width=75) (actual time=12097.524..13354.731 rows=10153197 loops=1)
                     Sort Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
                     Sort Method: external merge  Disk: 894184kB
                     ->  Seq Scan on test  (cost=0.00..236908.00
rows=10153200 width=75) (actual time=0.015..779.722 rows=10153197 loops=1)
 Planning Time: 0.344 ms
 Execution Time: 16730.304 ms


                             QUERY PLAN
                                                                     

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2051062.74..2051062.75 rows=1 width=8) (actual
time=8404.880..8404.882 rows=1 loops=1)
   ->  HashAggregate  (cost=2048778.27..2049793.59 rows=101532 width=67)
(actual time=8390.952..8402.777 rows=55634 loops=1)
         Group Key: test.ad_group_id, test.country_code,
test.device_type_id, test.target_id, test.block_format_type_id,
test.category_group_id, array_agg(test.geo_id)
         Batches: 1  Memory Usage: 19473kB
         ->  HashAggregate  (cost=1750369.38..2020856.97 rows=1015320
width=99) (actual time=4478.394..7879.103 rows=705836 loops=1)
               Group Key: test.ad_id, test.advertiser_id, test.campaign_id,
test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
               Planned Partitions: 256  Batches: 257  Memory Usage: 6553kB
Disk Usage: 1102856kB
               ->  Seq Scan on test  (cost=0.00..236908.00 rows=10153200
width=75) (actual time=0.020..788.698 rows=10153197 loops=1)
 Planning Time: 0.301 ms
 Execution Time: 8538.236 ms


Correnct answers provided with larger work_mem (55634)

What make situatio even more curious that disable hash_agg doesn't make
issue gone:
set enable_hashagg to 0;
set work_mem to '4MB';
...
 count 
-------
 83700

 Aggregate  (cost=3305553.24..3305553.25 rows=1 width=8) (actual
time=18227.164..18227.167 rows=1 loops=1)
   ->  Unique  (cost=3283977.69..3304284.09 rows=101532 width=67) (actual
time=17595.189..18223.935 rows=83700 loops=1)
         ->  Sort  (cost=3283977.69..3286515.99 rows=1015320 width=67)
(actual time=17595.188..18042.457 rows=705836 loops=1)
               Sort Key: t1.ad_group_id, t1.country_code, t1.device_type_id,
t1.target_id, t1.block_format_type_id, t1.category_group_id, t1.geo_ids
               Sort Method: external merge  Disk: 128680kB
               ->  Subquery Scan on t1  (cost=2771948.04..3099388.74
rows=1015320 width=67) (actual time=11965.394..15750.759 rows=705836
loops=1)
                     ->  GroupAggregate  (cost=2771948.04..3089235.54
rows=1015320 width=99) (actual time=11965.392..15695.102 rows=705836
loops=1)
                           Group Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
                           ->  Sort  (cost=2771948.04..2797331.04
rows=10153200 width=75) (actual time=11965.378..13283.143 rows=10153197
loops=1)
                                 Sort Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
                                 Sort Method: external merge  Disk:
894184kB
                                 ->  Seq Scan on test  (cost=0.00..236908.00
rows=10153200 width=75) (actual time=0.014..770.040 rows=10153197 loops=1)
 Planning Time: 0.315 ms
 Execution Time: 18503.307 ms


--Maxim Boguk


Re: BUG #17084: Wrong results of distinct node.

От
Pantelis Theodosiou
Дата:


On Tue, Jul 6, 2021 at 4:09 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      17084
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 13.3
Operating system:   Linux
Description:       

During investigation of weird app behaviour I found very strange results
with sorting/dising of some dataset.
Issue was reproduced on 12.7 server and 13.3 my local laptop.
Dataset table dump (25MB file) available per request.

QUERY:
WITH t1 AS (
         SELECT
            ad_id,
            advertiser_id,
            campaign_id,
            campaign_type_id,
            ad_group_id,
            target_id,
            device_type_id,
            country_code,
            block_format_type_id,
            category_group_id,
            array_agg(geo_id) AS geo_ids
          FROM test
          GROUP BY
            ad_id,
            advertiser_id,
            campaign_id,
            campaign_type_id,
            ad_group_id,
            target_id,
            device_type_id,
            country_code,
            block_format_type_id,
            category_group_id
),
t2 AS (
SELECT
DISTINCT
  ad_group_id,
  country_code,
  device_type_id,
  target_id,
  block_format_type_id,
  category_group_id,
  geo_ids
FROM  t1
)

SELECT COUNT(*) FROM t2;


Test:
set jit to off;
set max_parallel_workers_per_gather to 0;
set work_mem to 4MB;
 count
-------
 83921

set work_mem to '64MB';
 count
-------
 55634

...


Correnct answers provided with larger work_mem (55634)

What make situatio even more curious that disable hash_agg doesn't make
issue gone:
set enable_hashagg to 0;
set work_mem to '4MB';
...
 count
-------
 83700

...

Do you get different results if you provide an ORDER BY in the aggregate array_agg()  function?
Without an ORDER BY there, I don't think there is any guarantee for deterministic results.

Best regards,

Pantelis Theodosiou

Re: BUG #17084: Wrong results of distinct node.

От
Tom Lane
Дата:
PG Bug reporting form <noreply@postgresql.org> writes:
> During investigation of weird app behaviour I found very strange results
> with sorting/dising of some dataset.

I am not sure there is a PG bug here.  Your query is doing DISTINCT
on (among other things)

>             array_agg(geo_id) AS geo_ids

which is going to have results that vary depending on the unspecified
order in which the first-level aggregation is done.  So you might
by luck have more or fewer distinct array values with different
plans.

If you still get unstable results after changing that to, say,

              array_agg(geo_id order by geo_id) AS geo_ids

then I'd agree we need to investigate.

            regards, tom lane