Thread: BUG #16627: union all with partioned table yields random aggregate results

BUG #16627: union all with partioned table yields random aggregate results

From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16627
Logged by:          Brian Kanaga
Email address:      kanaga@consumeracquisition.com
PostgreSQL version: 11.4
Operating system:   Linux
Description:

I'm hoping this is something already discovered, as the amount of data
needed to recreate it is cumbersome.

Two tables are identical in structure, but one is a partitioned table.  For
example, an "active ledger" and a "historical ledger", where the historical
one is partitioned by year.

When parallel processing is disabled, the issue goes away.  The issue also
tends to (always?) occur when nestloop is disabled.  So my workaround is
to disable parallelism (max_parallel_workers_per_gather=0) when this type of
query is run.
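
For illustration, a minimal sketch of that workaround scoped to a single
transaction (the SET LOCAL scoping and the placeholder comment are additions
for this sketch, not part of the original query):

begin;
set local max_parallel_workers_per_gather = 0;  -- only affects this transaction
-- run the union-all aggregate query here
commit;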

From my experimentation it seems to be related to having multiple workers
gathering data, with their gathering overlapping to some extent; again, this
only happens with a partitioned table.  If I change the subquery to union all
with the partition itself (e.g. fb_ad_activity_daily_archive_2019), then the
problem does not occur.

As an aside, I have had to disable nestloop to get around what seems to be
planner inaccuracy, and I have not found adjusting statistics targets helpful.
However, with nestloop disabled (and possibly without), the union all with the
partitioned table produces random data.

Example of query structure and results:

select

    obj.none, obj.mobileos   ,

    sum(spend) as spend,

    sum(conversions) as conversions

                        

from (

    select fa.id as adset , case when fa.targeting_ios is not null and
fa.targeting_android is not null then 'iOS and Android' when
fa.targeting_ios is not null and fa.targeting_android is null then 'iOS'
when fa.targeting_ios is null and fa.targeting_android is not null then
'Android' when fa.targeting_ios is not null and fa.targeting_android is not
null then 'Any' end as mobileos , 1 as none
    
    from fb_campaigns fc               
    left join fb_adsets fa on fc.id=fa.campaign and fa.account in
(128091690677840,205971526867688,183647115790295)

    where 
    -- see if using account in join is ok
    (fc.account is null or fc.account in
(128091690677840,205971526867688,183647115790295)) and fa.account in
(128091690677840,205971526867688,183647115790295)

    

    -- master_rel_wheres_end

    group by fa.id , case when fa.targeting_ios is not null and
fa.targeting_android is not null then 'iOS and Android' when
fa.targeting_ios is not null and fa.targeting_android is null then 'iOS'
when fa.targeting_ios is null and fa.targeting_android is not null then
'Android' when fa.targeting_ios is not null and fa.targeting_android is not
null then 'Any' end , 1

) obj full outer join (

            select a.campaign_id as adset   ,
                
            -- these columns will be missing in some tbls
            avg(cast(relevance_score as float4)) as relevance_score,
            avg(cast(positive_feedback as float4)) as positive_feedback,
            avg(cast(negative_feedback as float4)) as negative_feedback,

            sum(spend) as spend,
            sum(case when 1=0 then 0 else
coalesce(unique_actions_1d_view_app_custom_event_fb_mobile_purchase,0)
end+case when 28=0 then 0 else
coalesce(unique_actions_28d_click_app_custom_event_fb_mobile_purchase,0)
end) as mobile_purchases
            ,
            sum(case when 1=0 then 0 else
coalesce(unique_actions_1d_view_mobile_app_install,0) end+case when 28=0
then 0 else coalesce(unique_actions_28d_click_mobile_app_install,0) end) as
mobile_app_install
            ,
            sum(case when 1=0 then 0 else
coalesce(action_values_1d_view_app_custom_event_fb_mobile_purchase,0)
end+case when 28=0 then 0 else
coalesce(action_values_28d_click_app_custom_event_fb_mobile_purchase,0) end)
as action_value_app_custom_event_fb_mobile_purchase
            ,
            sum(case when 1=0 then 0 else
coalesce(unique_actions_1d_view_link_click,0) end+case when 28=0 then 0 else
coalesce(unique_actions_28d_click_link_click,0) end) as link_clicks
            ,sum(event_1) as event_1,sum(event_2) as event_2,sum(event_3) as
event_3,sum(event_4) as event_4,sum(impressions) as impressions,
            sum(case when 1=0 then 0 else
coalesce(actions_1d_view_app_custom_event_fb_mobile_purchase,0) end+case
when 28=0 then 0 else
coalesce(actions_28d_click_app_custom_event_fb_mobile_purchase,0) end) as
nu_mobile_purchases
            ,sum(event_6) as event_6,sum(event_8) as event_8,
            sum(case when 1=0 then 0 else
coalesce(actions_1d_view_mobile_app_install,0) end+case when 28=0 then 0
else coalesce(actions_28d_click_mobile_app_install,0) end) as
nu_mobile_app_install
            ,
            sum(case when c.buying_type = 10 then actions_1d_view_mobile_app_install
end) as conversions --< - this
            ,
            sum(case when 1=0 then 0 else coalesce(actions_1d_view_link_click,0)
end+case when 28=0 then 0 else coalesce(actions_28d_click_link_click,0) end)
as nu_link_clicks


            from 
            (
                    select * from fb_ad_activity_daily where logdate between '11/01/2019'
and '11/17/2019' and account_id in
(128091690677840,205971526867688,183647115790295)
                    union all 
                    select * from fb_ad_activity_daily_archive where logdate between
'11/01/2019' and '11/17/2019' and account_id in
(128091690677840,205971526867688,183647115790295) --< plus this
            )
             a
            inner join fb_campaigns c on a.campaign_group_id=c.id
            
            
            left join ca_ud_conversions ud on a.logdate=ud.logdate and
a.account_id=ud.account and a.adgroup_id=ud.adgroup

            where a.logdate between '11/01/2019' and '11/17/2019' and a.account_id in
(128091690677840,205971526867688,183647115790295) 

            group by a.campaign_id


) stats on obj.adset = stats.adset

group by obj.none, obj.mobileos  


The outputs of the sums are random; results vary from run to run.  Sometimes
they are correct.

correct results:
1    Android    66050.68    
1    iOS    27624.41    
1            

Examples of incorrect (random) results:
1    Android    111098.92    
1    iOS    34314.69    
1            
1    Android    116473.33    
1    iOS    37640.63    
1            
1    Android    68350.25    
1    iOS    27624.41    
1            





-- issue does occur
set enable_nestloop TO 0;
GroupAggregate  (cost=356113.02..356115.54 rows=1 width=76) (actual
time=1193.334..1194.411 rows=3 loops=1)
  Group Key: (1), (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN
'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS
NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
  ->  Sort  (cost=356113.02..356113.52 rows=200 width=52) (actual
time=1192.245..1193.084 rows=4280 loops=1)
        Sort Key: (1), (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN
'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS
NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
        Sort Method: quicksort  Memory: 395kB
        ->  Merge Full Join  (cost=356063.98..356105.38 rows=200 width=52)
(actual time=1183.640..1190.639 rows=4280 loops=1)
              Merge Cond: (fa.id = "*SELECT* 1".campaign_id)
              ->  Group  (cost=179905.47..179905.84 rows=1 width=44) (actual
time=85.440..90.013 rows=4280 loops=1)
                    Group Key: fa.id, (CASE WHEN ((fa.targeting_ios IS NOT
NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text
WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL))
THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android
IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
                    ->  Gather Merge  (cost=179905.47..179905.83 rows=3
width=40) (actual time=85.438..87.984 rows=4280 loops=1)
                          Workers Planned: 3
                          Workers Launched: 3
                          ->  Group  (cost=178905.43..178905.44 rows=1
width=40) (actual time=79.332..80.143 rows=1070 loops=4)
                                Group Key: fa.id, (CASE WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN
'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS
NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN
'Any'::text ELSE NULL::text END)
                                ->  Sort  (cost=178905.43..178905.43 rows=1
width=40) (actual time=79.329..79.567 rows=1070 loops=4)
                                      Sort Key: fa.id, (CASE WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN
'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS
NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN
'Any'::text ELSE NULL::text END)
                                      Sort Method: quicksort  Memory: 97kB
                                      Worker 0:  Sort Method: quicksort
Memory: 72kB
                                      Worker 1:  Sort Method: quicksort
Memory: 99kB
                                      Worker 2:  Sort Method: quicksort
Memory: 103kB
                                      ->  Parallel Hash Join
(cost=167790.90..178905.42 rows=1 width=40) (actual time=70.443..78.798
rows=1070 loops=4)
                                            Hash Cond: (fc.id =
fa.campaign)
                                            ->  Parallel Seq Scan on
fb_campaigns fc  (cost=0.00..11113.68 rows=74 width=8) (actual
time=0.090..7.951 rows=174 loops=4)
                                                  Filter: ((account IS NULL)
OR (account = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                                  Rows Removed by Filter:
49408
                                            ->  Parallel Hash
(cost=167790.81..167790.81 rows=7 width=64) (actual time=70.253..70.253
rows=1070 loops=4)
                                                  Buckets: 4096 (originally
1024)  Batches: 1 (originally 1)  Memory Usage: 440kB
                                                  ->  Parallel Seq Scan on
fb_adsets fa  (cost=0.00..167790.81 rows=7 width=64) (actual
time=0.988..55.405 rows=1070 loops=4)
                                                        Filter: ((account =
ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND
(account = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                                        Rows Removed by
Filter: 198077
              ->  Finalize GroupAggregate  (cost=176158.52..176197.01
rows=200 width=160) (actual time=1098.196..1098.694 rows=64 loops=1)
                    Group Key: "*SELECT* 1".campaign_id
                    ->  Gather Merge  (cost=176158.52..176192.97 rows=272
width=24) (actual time=1098.160..1141.270 rows=181 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Partial GroupAggregate
(cost=175158.49..175161.55 rows=136 width=24) (actual
time=1088.396..1088.791 rows=60 loops=3)
                                Group Key: "*SELECT* 1".campaign_id
                                ->  Sort  (cost=175158.49..175158.83
rows=136 width=22) (actual time=1088.377..1088.547 rows=675 loops=3)
                                      Sort Key: "*SELECT* 1".campaign_id
                                      Sort Method: quicksort  Memory: 73kB
                                      Worker 0:  Sort Method: quicksort
Memory: 80kB
                                      Worker 1:  Sort Method: quicksort
Memory: 79kB
                                      ->  Parallel Hash Left Join
(cost=90647.99..175153.67 rows=136 width=22) (actual time=963.744..1088.079
rows=675 loops=3)
                                            Hash Cond: (("*SELECT*
1".logdate = ud.logdate) AND ("*SELECT* 1".account_id = ud.account) AND
("*SELECT* 1".adgroup_id = ud.adgroup))
                                            ->  Parallel Hash Join
(cost=11673.89..91194.54 rows=136 width=42) (actual time=64.276..76.694
rows=675 loops=3)
                                                  Hash Cond: ("*SELECT*
1".campaign_group_id = c.id)
                                                  ->  Parallel Append
(cost=0.42..79520.71 rows=140 width=48) (actual time=6.548..18.083 rows=675
loops=3)
                                                        ->  Subquery Scan on
"*SELECT* 1"  (cost=0.42..2.67 rows=1 width=48) (actual time=0.016..0.016
rows=0 loops=1)
                                                              ->  Index Scan
using fb_ad_activity_daily2_pkey on fb_ad_activity_daily  (cost=0.42..2.66
rows=1 width=1618) (actual time=0.015..0.015 rows=0 loops=1)
                                                                    Index
Cond: ((logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date)
AND (logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date))
                                                                    Filter:
((account_id = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND
(account_id = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                                        ->  Subquery Scan on
"*SELECT* 2"  (cost=0.56..79516.88 rows=326 width=48) (actual
time=6.541..17.792 rows=675 loops=3)
                                                              ->  Parallel
Append  (cost=0.56..79513.62 rows=192 width=1618) (actual time=6.539..17.471
rows=675 loops=3)
                                                                    ->
Parallel Index Scan using fb_ad_activity_daily_archive_2019_pkey on
fb_ad_activity_daily_archive_2019  (cost=0.56..79512.66 rows=192 width=1618)
(actual time=6.538..17.169 rows=675 loops=3)

Index Cond: ((logdate >= '2019-11-01'::date) AND (logdate <=
'2019-11-17'::date) AND (logdate >= '2019-11-01'::date) AND (logdate <=
'2019-11-17'::date) AND (account_id = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND
(account_id = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                                  ->  Parallel Hash
(cost=10873.76..10873.76 rows=63976 width=10) (actual time=57.291..57.292
rows=66109 loops=3)
                                                        Buckets: 262144
Batches: 1  Memory Usage: 11424kB
                                                        ->  Parallel Seq
Scan on fb_campaigns c  (cost=0.00..10873.76 rows=63976 width=10) (actual
time=0.008..28.978 rows=66109 loops=3)
                                            ->  Parallel Hash
(cost=60953.22..60953.22 rows=771422 width=20) (actual time=875.479..875.479
rows=1031137 loops=3)
                                                  Buckets: 131072  Batches:
32  Memory Usage: 6432kB
                                                  ->  Parallel Seq Scan on
ca_ud_conversions ud  (cost=0.00..60953.22 rows=771422 width=20) (actual
time=0.004..405.186 rows=1031137 loops=3)
Planning Time: 3.630 ms
Execution Time: 1238.495 ms


-- issue does not occur
SET max_parallel_workers_per_gather TO 0;
set enable_nestloop TO 0;
GroupAggregate  (cost=466729.81..466732.32 rows=1 width=76) (actual
time=2866.611..2867.687 rows=3 loops=1)
  Group Key: (1), (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN
'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS
NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
  ->  Sort  (cost=466729.81..466730.31 rows=200 width=52) (actual
time=2865.528..2866.368 rows=4280 loops=1)
        Sort Key: (1), (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN
'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS
NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
        Sort Method: quicksort  Memory: 395kB
        ->  Merge Full Join  (cost=466713.55..466722.17 rows=200 width=52)
(actual time=2857.653..2863.873 rows=4280 loops=1)
              Merge Cond: (fa.id = "*SELECT* 1".campaign_id)
              ->  Group  (cost=191854.00..191854.01 rows=1 width=44) (actual
time=246.292..249.519 rows=4280 loops=1)
                    Group Key: fa.id, (CASE WHEN ((fa.targeting_ios IS NOT
NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text
WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL))
THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android
IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
                    ->  Sort  (cost=191854.00..191854.00 rows=1 width=40)
(actual time=246.288..247.277 rows=4280 loops=1)
                          Sort Key: fa.id, (CASE WHEN ((fa.targeting_ios IS
NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and
Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND
(fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS
NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN
((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN
'Any'::text ELSE NULL::text END)
                          Sort Method: quicksort  Memory: 393kB
                          ->  Hash Join  (cost=12963.85..191853.99 rows=1
width=40) (actual time=31.233..244.131 rows=4280 loops=1)
                                Hash Cond: (fa.campaign = fc.id)
                                ->  Seq Scan on fb_adsets fa
(cost=0.00..178890.05 rows=34 width=64) (actual time=0.109..210.360
rows=4280 loops=1)
                                      Filter: ((account = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND
(account = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                      Rows Removed by Filter: 792308
                                ->  Hash  (cost=12961.00..12961.00 rows=228
width=8) (actual time=31.110..31.111 rows=694 loops=1)
                                      Buckets: 1024  Batches: 1  Memory
Usage: 36kB
                                      ->  Seq Scan on fb_campaigns fc
(cost=0.00..12961.00 rows=228 width=8) (actual time=0.013..30.894 rows=694
loops=1)
                                            Filter: ((account IS NULL) OR
(account = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                            Rows Removed by Filter: 197633
              ->  GroupAggregate  (cost=274859.55..274865.64 rows=200
width=160) (actual time=2611.356..2612.484 rows=64 loops=1)
                    Group Key: "*SELECT* 1".campaign_id
                    ->  Sort  (cost=274859.55..274860.37 rows=327 width=22)
(actual time=2611.327..2611.831 rows=2026 loops=1)
                          Sort Key: "*SELECT* 1".campaign_id
                          Sort Method: quicksort  Memory: 207kB
                          ->  Hash Left Join  (cost=171842.20..274845.89
rows=327 width=22) (actual time=2334.961..2610.504 rows=2026 loops=1)
                                Hash Cond: (("*SELECT* 1".logdate =
ud.logdate) AND ("*SELECT* 1".account_id = ud.account) AND ("*SELECT*
1".adgroup_id = ud.adgroup))
                                ->  Hash Join  (cost=15665.78..96165.51
rows=327 width=42) (actual time=156.656..198.283 rows=2026 loops=1)
                                      Hash Cond: ("*SELECT*
1".campaign_group_id = c.id)
                                      ->  Append  (cost=0.42..79524.29
rows=327 width=48) (actual time=13.125..45.025 rows=2026 loops=1)
                                            ->  Subquery Scan on "*SELECT*
1"  (cost=0.42..2.67 rows=1 width=48) (actual time=0.012..0.012 rows=0
loops=1)
                                                  ->  Index Scan using
fb_ad_activity_daily2_pkey on fb_ad_activity_daily  (cost=0.42..2.66 rows=1
width=1618) (actual time=0.011..0.011 rows=0 loops=1)
                                                        Index Cond:
((logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND
(logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date))
                                                        Filter: ((account_id
= ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND
(account_id = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                            ->  Subquery Scan on "*SELECT*
2"  (cost=0.56..79519.99 rows=326 width=48) (actual time=13.111..44.245
rows=2026 loops=1)
                                                  ->  Append
(cost=0.56..79516.73 rows=326 width=1618) (actual time=13.110..43.372
rows=2026 loops=1)
                                                        ->  Index Scan using
fb_ad_activity_daily_archive_2019_pkey on fb_ad_activity_daily_archive_2019
(cost=0.56..79515.10 rows=326 width=1618) (actual time=13.108..42.569
rows=2026 loops=1)
                                                              Index Cond:
((logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND
(logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND
(account_id = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND
(account_id = ANY
('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                      ->  Hash  (cost=12217.27..12217.27
rows=198327 width=10) (actual time=143.122..143.122 rows=198327 loops=1)
                                            Buckets: 262144  Batches: 2
Memory Usage: 6693kB
                                            ->  Seq Scan on fb_campaigns c
(cost=0.00..12217.27 rows=198327 width=10) (actual time=0.007..74.000
rows=198327 loops=1)
                                ->  Hash  (cost=84095.88..84095.88
rows=3085688 width=20) (actual time=2177.119..2177.119 rows=3093410
loops=1)
                                      Buckets: 131072  Batches: 32  Memory
Usage: 6348kB
                                      ->  Seq Scan on ca_ud_conversions ud
(cost=0.00..84095.88 rows=3085688 width=20) (actual time=0.005..1005.261
rows=3093410 loops=1)
Planning Time: 3.733 ms
Execution Time: 2868.068 ms


Re: BUG #16627: union all with partioned table yields random aggregate results

From:
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> The following bug has been logged on the website:
> Bug reference:      16627
> Logged by:          Brian Kanaga
> Email address:      kanaga@consumeracquisition.com
> PostgreSQL version: 11.4
> Operating system:   Linux
> Description:        

> I'm hoping this is something already discovered, as the amount of data
> needed to recreate it is cumbersome.

Well, 11.4 is more than a year out of date.  Can you still reproduce
this on the current minor release (11.9)?

            regards, tom lane



RE: BUG #16627: union all with partioned table yields random aggregate results

From:
Brian Kanaga
Date:
Hi, confirming it happens in 11.9 and 12.4.




Re: BUG #16627: union all with partioned table yields random aggregate results

From:
Tom Lane
Date:
Brian Kanaga <kanaga@consumeracquisition.com> writes:
> Hi, confirming it happens in 11.9 and 12.4.

Ugh.  I don't think it's any known problem then.  We'll have to
ask you to make a reproducer.  Whatever it is probably isn't very
data-dependent, so hopefully you can just create a script that
fills some tables with generated/random data.

            regards, tom lane



Re: BUG #16627: union all with partioned table yields random aggregate results

From:
David Rowley
Date:
On Wed, 23 Sep 2020 at 03:30, PG Bug reporting form
<noreply@postgresql.org> wrote:
>
>                         -- these columns will be missing in some tbls
>                         avg(cast(relevance_score as float4)) as relevance_score,
>                         avg(cast(positive_feedback as float4)) as positive_feedback,
>                         avg(cast(negative_feedback as float4)) as negative_feedback,
>

In addition to Tom's request, it would also be good to understand more
about these "random" results.

I see you have a number of aggregate functions run on the results. It
would be good to understand whether all of them are incorrect in the
parallel version of the query or just some of them, e.g. just the
float4 ones I've quoted above.  Essentially, there is always some
randomness in the results of floating-point aggregation due to the
lossy precision of floating-point values. This is much more likely to
be seen during parallel aggregation, as the order in which tuples are
aggregated is highly likely to differ for each run of the query. That
order is much more likely to be stable in a serial query.

If you are just seeing the floating-point aggregates vary then it
might be worth casting floating-point values to NUMERIC to see if that
gets rid of the randomness.
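
As a self-contained sketch of that check (synthetic values rather than the
reported tables; numeric accumulation is exact, so it cannot vary with
aggregation order):

select sum(x)                  as float4_sum,   -- float4 accumulation; result can depend on input order
       sum(cast(x as numeric)) as numeric_sum   -- exact accumulation; order-independent
from (values (1.1::float4), (2.2::float4), (3.3::float4)) as t(x);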

If that's not the issue, then you may also like to help narrow the
scope of the problem. Does the problem still occur when you disable
enable_parallel_hash? How about enable_parallel_append?
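
If it helps, a sketch of how those two checks might be run in a session
(the RESETs are just housekeeping for this sketch):

set enable_parallel_hash = off;
-- re-run the query and compare results
reset enable_parallel_hash;

set enable_parallel_append = off;
-- re-run the query and compare results
reset enable_parallel_append;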

There was also some mangling of the query plans you posted. I took the
time to format those again and attached the result.

Looking over the row counts between the two versions of the plan, I
didn't find anything that would indicate additional rows were seen in
the parallel version. However, there is some precision loss I couldn't
account for, because the actual row counts are divided across the
parallel workers and displayed with integer division, so that check is
not 100% accurate.
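
(For example, in the plans posted above the Parallel Append reports rows=675
with loops=3, which reconstructs to 675 * 3 = 2025 rows against rows=2026 on
the Append in the serial plan; a one-row gap like that is within what the
per-loop division can produce, so it cannot be distinguished from a genuine
difference.)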

David

Attachments

RE: BUG #16627: union all with partioned table yields random aggregate results

From:
Brian Kanaga
Date:
Thank you David for your input here!

Attached is a much-simplified version of the problem query, along with
screenshots of the plans and of which tweaks to the query change the plan.

Both of your hints fix it.  Also, if I force the partitioned table subquery to
be evaluated differently by placing an "order by" on it, that fixes it too.
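
For reference, a sketch of that variant (it is a guess whether the "order by"
belongs on the whole derived table or only on the archive branch; the count(*)
wrapper is only there to make the fragment a complete statement):

select count(*)
from (
    select * from fb_ad_activity_daily
    where logdate between '11/01/2019' and '11/17/2019'
      and account_id in (128091690677840,205971526867688,183647115790295)
    union all
    select * from fb_ad_activity_daily_archive
    where logdate between '11/01/2019' and '11/17/2019'
      and account_id in (128091690677840,205971526867688,183647115790295)
    order by logdate  -- added only to force a different plan shape
) a;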

I have tried to recreate this for you in a dump file, but I could not get it
to happen without including gobs of data.  Even when tweaking the plan, I
could not get the filter evaluation to match the offending plan.

Hopefully these files shed some light.

Oh, and float vs. higher precision wasn't it.  The variance in the results is
enormous:
correct results
----------------------------
27839.83

example runs with the issue (each row here is a different run)
----------------------------
27839.83
52267.54
63620.24
30139.4
27839.83 <- randomly the right answer will be returned
30139.4
46044.1
30139.4
34350.93
39673.62
32239.73
27839.83 <- again
43351.18


Attachments

RE: BUG #16627: union all with partioned table yields random aggregate results

From:
Brian Kanaga
Date:
If I may add (referencing the files I just sent): the parallel append and
parallel index scan are unique to the issue query.




Re: BUG #16627: union all with partioned table yields random aggregate results

From:
David Rowley
Date:
Hi Brian,

On Thu, 24 Sep 2020 at 01:50, Brian Kanaga
<kanaga@consumeracquisition.com> wrote:
> Attached is a much-simplified version of the problem query, along with
> screenshots of the plans and of which tweaks to the query change the plan.

Thanks for sending those.

(For the future, attaching a text file with the queries and explain
output would be much easier to work with. I personally tend to compare
these sorts of things in a text compare tool. Playing spot the
difference with images is trickier.)

Comparing the images you attached, it does look like the index scan on
fb_ad_activity_daily_archive_2019_pkey found more rows in the parallel
version of the scan: 262 * 4 = 1048, but only 826 on the "noissue.png"
plan. In the cases you've shown that were incorrect, the aggregated
value is larger, so assuming you're always aggregating positive values,
the incorrect result does hint that something is reading more rows than
it should.  The row counts I see indicate that's the case, with ~1048
in the error case and only 826 in the correct-result case.  It would be
good to get the full text of the EXPLAIN ANALYZE to confirm those
predicates match properly; that part was chopped off the screenshot.

I noticed that the "issue.png" plan has a nested Parallel Append, the
outer of which has a mix of parallel and parallel safe paths. I'm not
sure how relevant that is, but having nested parallel appends is
probably not that common.

I played around with the following, trying to produce a similar plan
with a nested parallel append containing a mix of parallel and
parallel-safe paths. Trying this on 11.4, I didn't see any executions
with an incorrect tuple count.

drop table t;
drop table pt;
create table pt (a int) partition by range(a);
create table pt1 partition of pt for values from (0) to (3000000);
create index on pt1 (a);
create table t (a int primary key);
insert into t select x from generate_Series(1,2000000)x;
insert into pt select x from generate_series(1,2000000)x;
alter table t set (parallel_workers=0);
set enable_bitmapscan=0;
set enable_indexonlyscan=0;
set work_mem = '200MB';
select count(*) from (select * from t where a between 100000 and
200000 union all select * from t where a between 200000 and 300000
union all select * from pt where a between 900000 and 999999) t;

> I have tried to recreate this for you in a dump file, but I could not get it
> to happen without including gobs of data.  Even when tweaking the plan, I
> could not get the filter evaluation to match the offending plan.

How large is the dataset?  And if the data was properly anonymised and
the size wasn't too insane, would you be allowed to share it?  Sharing
it privately would be an option.

David



Re: BUG #16627: union all with partioned table yields random aggregate results

From:
Brian Kanaga
Date:
Hi David,

Here's a dump file that is able to reproduce the issue.  I would ask that this message and the dump file URL be kept non-public.
Please let me know when you've retrieved it and I will remove it.  Thank you!

-- dump file at https://ca-s3-1.s3-us-west-2.amazonaws.com/pgdumpadshfjas/dump.zip
-- database name to create: capgpod
-- users referenced in dump file: capgpod, devreader (might be able to ignore as I think these are ref'd after creates/copies)
-- 4x16 server (m4.xlarge in AWS), v11.5


set enable_nestloop = 0;

select sum(spend), sum(conversions) from (
select a.campaign_id,

sum(cast(spend as decimal(12,2))) as spend,

sum(case when c.buying_type = 10 then actions_1d_view_mobile_app_install end) as conversions

from
(
select * from fb_ad_activity_daily where logdate between '11/01/2019' and '11/5/2019' and account_id in (128091690677840,205971526867688,183647115790295)
union all
select * from fb_ad_activity_daily_archive where logdate between '11/01/2019' and '11/5/2019' and account_id in (128091690677840,205971526867688,183647115790295)
)
a
inner join fb_campaigns c on a.campaign_group_id=c.id

where a.logdate between '11/01/2019' and '11/5/2019' and a.account_id in (128091690677840,205971526867688,183647115790295)

group by a.campaign_id
) x



--

Brian Kanaga

CTO, ConsumerAcquisition.com

kanaga@consumeracquisition.com

Re: BUG #16627: union all with partioned table yields random aggregate results

From:
"David G. Johnston"
Date:
On Fri, Sep 25, 2020 at 11:15 AM Brian Kanaga <kanaga@consumeracquisition.com> wrote:
> Hi David,
>
> Here's a dump file that is able to reproduce the issue.  I would ask that this message and the dump file URL be kept non-public.
> Please let me know when you've retrieved it and I will remove it.  Thank you!


You just sent the email directly to a public mailing list, so keeping the email private is a non-starter.  You should probably remove the file from AWS and re-send a private message to David with a new link, to minimize the file's exposure.

David J.

Re: BUG #16627: union all with partioned table yields random aggregate results

From:
David Rowley
Date:
Hi Brian,

On Sat, 26 Sep 2020 at 05:58, Brian Kanaga
<kanaga@consumeracquisition.com> wrote:
> Here's a dump file that is able to reproduce the issue.  I would ask that this message and the dump file URL be kept non-public.
> Please let me know when you've retrieved it and I will remove it.  Thank you!

Thanks for sending me the updated link.

I can confirm that I can recreate this issue on 11.4. The two attached
explain files show a variance in actual rows for the parallel index
scan:

->  Parallel Index Scan using fb_ad_activity_daily_archive_2019_pkey
on fb_ad_activity_daily_archive_2019  (cost=0.56..37964.90 rows=49
width=1618) (actual time=1.909..8.301 rows=430 loops=3)

from one execution, and:

->  Parallel Index Scan using fb_ad_activity_daily_archive_2019_pkey
on fb_ad_activity_daily_archive_2019  (cost=0.56..37964.90 rows=49
width=1618) (actual time=2.039..9.043 rows=472 loops=3)

from another.

The minimal case I could find to reproduce it is:

create table t (a int not null);
insert into t select x from generate_Series(1,100) x,
generate_Series(1,4000000);
create index on t (a);
analyze t;
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
alter table t set (parallel_workers=8);
explain select count(*) from t where a in(10,20,30,40,50,60) and a in
(10,20,30,40,50,60);

The problem only seems to occur with the redundant IN clause added.
Your query ends up with that because the outer IN clause was pushed
down into the inner query, where it already existed.

Although, I was testing on 11.3. Trying the latest, yet-to-be-released
v12 code on REL_12_STABLE, I can't reproduce it.

On looking a bit further into why, I found a fix has already been
pushed [1], but the commit message there does not really mention the
wrong-results issue. There's some discussion in [2], and there's
another case that reproduces it on that thread too.

If you have the ability to build from source away from production,
feel free to try on the REL_11_STABLE branch and confirm that it's now
working ok.

I don't know the exact dates, but what will become 11.10 already has
that fix backpatched.  I expect that will be released around
mid-November.

In the meantime, you could remove the inner WHERE clause items which
are duplicated in the outer query. These should get pushed down into
the inner scans anyway. However, I'm not sure exactly how reliable
that will be as a fix.
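
For illustration, the simplified query posted earlier in the thread with that
change applied: the duplicated predicates are removed from the union-all
branches and only the outer WHERE clause is kept (a sketch only, for the
reason above):

select sum(spend), sum(conversions) from (
    select a.campaign_id,
           sum(cast(spend as decimal(12,2))) as spend,
           sum(case when c.buying_type = 10 then actions_1d_view_mobile_app_install end) as conversions
    from (
        select * from fb_ad_activity_daily
        union all
        select * from fb_ad_activity_daily_archive
    ) a
    inner join fb_campaigns c on a.campaign_group_id = c.id
    where a.logdate between '11/01/2019' and '11/5/2019'
      and a.account_id in (128091690677840,205971526867688,183647115790295)
    group by a.campaign_id
) x;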

(Copying in Amit, to let him know that someone did stumble upon this
in the wild.)

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4bc63462d9d8dd12a5564c3d4ca06c99449d2d07
[2] https://www.postgresql.org/message-id/flat/4248CABC-25E3-4809-B4D0-128E1BAABC3C%40amazon.com

Attachments