BUG #17084: Wrong results of distinct node.

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #17084: Wrong results of distinct node.
Дата
Msg-id 17084-9f83b67ab464cc69@postgresql.org
обсуждение исходный текст
Ответы Re: BUG #17084: Wrong results of distinct node.  (Pantelis Theodosiou <ypercube@gmail.com>)
Re: BUG #17084: Wrong results of distinct node.  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17084
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 13.3
Operating system:   Linux
Description:

During investigation of weird app behaviour I found very strange results
with sorting/dising of some dataset.
Issue was reproduced on 12.7 server and 13.3 my local laptop.
Dataset table dump (25MB file) available per request.

QUERY:
WITH t1 AS (
         SELECT
            ad_id, 
            advertiser_id, 
            campaign_id, 
            campaign_type_id, 
            ad_group_id, 
            target_id, 
            device_type_id, 
            country_code, 
            block_format_type_id, 
            category_group_id,
            array_agg(geo_id) AS geo_ids
          FROM test 
          GROUP BY 
            ad_id, 
            advertiser_id, 
            campaign_id, 
            campaign_type_id, 
            ad_group_id, 
            target_id, 
            device_type_id, 
            country_code, 
            block_format_type_id, 
            category_group_id
),
t2 AS (
SELECT 
DISTINCT 
  ad_group_id, 
  country_code, 
  device_type_id, 
  target_id, 
  block_format_type_id,
  category_group_id,
  geo_ids
FROM  t1
)

SELECT COUNT(*) FROM t2;


Test:
set jit to off;
set max_parallel_workers_per_gather to 0;
set work_mem to 4MB;
 count 
-------
 83921

set work_mem to '64MB';
 count 
-------
 55634

Plan with 4MB work_mem:
 Aggregate  (cost=3258730.52..3258730.53 rows=1 width=8) (actual
time=16607.679..16607.680 rows=1 loops=1)
   ->  HashAggregate  (cost=3232649.49..3257461.37 rows=101532 width=67)
(actual time=16126.315..16604.394 rows=83700 loops=1)
         Group Key: test.ad_group_id, test.country_code,
test.device_type_id, test.target_id, test.block_format_type_id,
test.category_group_id, array_agg(test.geo_id)
         Planned Partitions: 8  Batches: 77  Memory Usage: 4313kB  Disk
Usage: 139784kB
         ->  GroupAggregate  (cost=2771948.04..3089235.54 rows=1015320
width=99) (actual time=12097.536..15675.807 rows=705836 loops=1)
               Group Key: test.ad_id, test.advertiser_id, test.campaign_id,
test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
               ->  Sort  (cost=2771948.04..2797331.04 rows=10153200
width=75) (actual time=12097.524..13354.731 rows=10153197 loops=1)
                     Sort Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
                     Sort Method: external merge  Disk: 894184kB
                     ->  Seq Scan on test  (cost=0.00..236908.00
rows=10153200 width=75) (actual time=0.015..779.722 rows=10153197 loops=1)
 Planning Time: 0.344 ms
 Execution Time: 16730.304 ms


                             QUERY PLAN
                                                                     

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2051062.74..2051062.75 rows=1 width=8) (actual
time=8404.880..8404.882 rows=1 loops=1)
   ->  HashAggregate  (cost=2048778.27..2049793.59 rows=101532 width=67)
(actual time=8390.952..8402.777 rows=55634 loops=1)
         Group Key: test.ad_group_id, test.country_code,
test.device_type_id, test.target_id, test.block_format_type_id,
test.category_group_id, array_agg(test.geo_id)
         Batches: 1  Memory Usage: 19473kB
         ->  HashAggregate  (cost=1750369.38..2020856.97 rows=1015320
width=99) (actual time=4478.394..7879.103 rows=705836 loops=1)
               Group Key: test.ad_id, test.advertiser_id, test.campaign_id,
test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
               Planned Partitions: 256  Batches: 257  Memory Usage: 6553kB
Disk Usage: 1102856kB
               ->  Seq Scan on test  (cost=0.00..236908.00 rows=10153200
width=75) (actual time=0.020..788.698 rows=10153197 loops=1)
 Planning Time: 0.301 ms
 Execution Time: 8538.236 ms


Correnct answers provided with larger work_mem (55634)

What make situatio even more curious that disable hash_agg doesn't make
issue gone:
set enable_hashagg to 0;
set work_mem to '4MB';
...
 count 
-------
 83700

 Aggregate  (cost=3305553.24..3305553.25 rows=1 width=8) (actual
time=18227.164..18227.167 rows=1 loops=1)
   ->  Unique  (cost=3283977.69..3304284.09 rows=101532 width=67) (actual
time=17595.189..18223.935 rows=83700 loops=1)
         ->  Sort  (cost=3283977.69..3286515.99 rows=1015320 width=67)
(actual time=17595.188..18042.457 rows=705836 loops=1)
               Sort Key: t1.ad_group_id, t1.country_code, t1.device_type_id,
t1.target_id, t1.block_format_type_id, t1.category_group_id, t1.geo_ids
               Sort Method: external merge  Disk: 128680kB
               ->  Subquery Scan on t1  (cost=2771948.04..3099388.74
rows=1015320 width=67) (actual time=11965.394..15750.759 rows=705836
loops=1)
                     ->  GroupAggregate  (cost=2771948.04..3089235.54
rows=1015320 width=99) (actual time=11965.392..15695.102 rows=705836
loops=1)
                           Group Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
                           ->  Sort  (cost=2771948.04..2797331.04
rows=10153200 width=75) (actual time=11965.378..13283.143 rows=10153197
loops=1)
                                 Sort Key: test.ad_id, test.advertiser_id,
test.campaign_id, test.campaign_type_id, test.ad_group_id, test.target_id,
test.device_type_id, test.country_code, test.block_format_type_id,
test.category_group_id
                                 Sort Method: external merge  Disk:
894184kB
                                 ->  Seq Scan on test  (cost=0.00..236908.00
rows=10153200 width=75) (actual time=0.014..770.040 rows=10153197 loops=1)
 Planning Time: 0.315 ms
 Execution Time: 18503.307 ms


--Maxim Boguk


В списке pgsql-bugs по дате отправления:

Предыдущее
От: Adrian Ho
Дата:
Сообщение: Re: BUG #17083: [PATCH] PostgreSQL fails to build with OpenLDAP 2.5.x
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #17083: [PATCH] PostgreSQL fails to build with OpenLDAP 2.5.x