Discussion: BUG #16627: union all with partitioned table yields random aggregate results
BUG #16627: union all with partitioned table yields random aggregate results
From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 16627
Logged by: Brian Kanaga
Email address: kanaga@consumeracquisition.com
PostgreSQL version: 11.4
Operating system: Linux

Description:

Hoping this is something already discovered, as the amount of data needed to recreate it is cumbersome.

Two tables are identical in structure; one is a partitioned table. For example, an "active ledger" and a "historical ledger", where the historical one is partitioned by year. When parallel processing is disabled, the issue goes away. The issue also prefers to (always?) occur when nestloop is disabled. So my workaround is to disable parallelism (max_parallel_workers_per_gather=0) when this type of query is run.

From my experimentation it seems to be related to having multiple workers gathering data, with their gathering overlapping to some extent, and again only with a partitioned table. If I change the subquery to UNION ALL with the partition itself (e.g. fb_ad_activity_daily_archive_2019), the problem does not occur.

As an aside, I have had to disable nestloop to get around what seems to be planner inaccuracy, and have not found adjusting statistics targets helpful. However, with nestloop disabled (and possibly without), the UNION ALL with the partitioned table produces random data.
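The session-level workaround described above can be sketched as follows (a minimal sketch: max_parallel_workers_per_gather is the standard PostgreSQL setting named in the report; the reporting query itself is elided):

```sql
-- Disable parallel query for this session only, run the report, then restore.
SET max_parallel_workers_per_gather = 0;  -- forces a serial plan

-- ... run the UNION ALL / aggregate reporting query here ...

RESET max_parallel_workers_per_gather;    -- back to the server default
```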
Example of query structure and results:

select obj.none, obj.mobileos, sum(spend) as spend, sum(conversions) as conversions
from (
    select fa.id as adset,
        case when fa.targeting_ios is not null and fa.targeting_android is not null then 'iOS and Android'
             when fa.targeting_ios is not null and fa.targeting_android is null then 'iOS'
             when fa.targeting_ios is null and fa.targeting_android is not null then 'Android'
             when fa.targeting_ios is not null and fa.targeting_android is not null then 'Any'
        end as mobileos,
        1 as none
    from fb_campaigns fc
    left join fb_adsets fa on fc.id = fa.campaign
        and fa.account in (128091690677840,205971526867688,183647115790295)
    where -- see if using account in join is ok
        (fc.account is null or fc.account in (128091690677840,205971526867688,183647115790295))
        and fa.account in (128091690677840,205971526867688,183647115790295)
        -- master_rel_wheres_end
    group by fa.id,
        case when fa.targeting_ios is not null and fa.targeting_android is not null then 'iOS and Android'
             when fa.targeting_ios is not null and fa.targeting_android is null then 'iOS'
             when fa.targeting_ios is null and fa.targeting_android is not null then 'Android'
             when fa.targeting_ios is not null and fa.targeting_android is not null then 'Any'
        end,
        1
) obj
full outer join (
    select a.campaign_id as adset,
        -- these columns will be missing in some tbls
        avg(cast(relevance_score as float4)) as relevance_score,
        avg(cast(positive_feedback as float4)) as positive_feedback,
        avg(cast(negative_feedback as float4)) as negative_feedback,
        sum(spend) as spend,
        sum(case when 1=0 then 0 else coalesce(unique_actions_1d_view_app_custom_event_fb_mobile_purchase,0) end
            + case when 28=0 then 0 else coalesce(unique_actions_28d_click_app_custom_event_fb_mobile_purchase,0) end) as mobile_purchases,
        sum(case when 1=0 then 0 else coalesce(unique_actions_1d_view_mobile_app_install,0) end
            + case when 28=0 then 0 else coalesce(unique_actions_28d_click_mobile_app_install,0) end) as mobile_app_install,
        sum(case when 1=0 then 0 else coalesce(action_values_1d_view_app_custom_event_fb_mobile_purchase,0) end
            + case when 28=0 then 0 else coalesce(action_values_28d_click_app_custom_event_fb_mobile_purchase,0) end) as action_value_app_custom_event_fb_mobile_purchase,
        sum(case when 1=0 then 0 else coalesce(unique_actions_1d_view_link_click,0) end
            + case when 28=0 then 0 else coalesce(unique_actions_28d_click_link_click,0) end) as link_clicks,
        sum(event_1) as event_1, sum(event_2) as event_2, sum(event_3) as event_3, sum(event_4) as event_4,
        sum(impressions) as impressions,
        sum(case when 1=0 then 0 else coalesce(actions_1d_view_app_custom_event_fb_mobile_purchase,0) end
            + case when 28=0 then 0 else coalesce(actions_28d_click_app_custom_event_fb_mobile_purchase,0) end) as nu_mobile_purchases,
        sum(event_6) as event_6, sum(event_8) as event_8,
        sum(case when 1=0 then 0 else coalesce(actions_1d_view_mobile_app_install,0) end
            + case when 28=0 then 0 else coalesce(actions_28d_click_mobile_app_install,0) end) as nu_mobile_app_install,
        sum(case when c.buying_type = 10 then actions_1d_view_mobile_app_install end) as conversions, --<- this
        sum(case when 1=0 then 0 else coalesce(actions_1d_view_link_click,0) end
            + case when 28=0 then 0 else coalesce(actions_28d_click_link_click,0) end) as nu_link_clicks
    from (
        select * from fb_ad_activity_daily
        where logdate between '11/01/2019' and '11/17/2019'
          and account_id in (128091690677840,205971526867688,183647115790295)
        union all
        select * from fb_ad_activity_daily_archive
        where logdate between '11/01/2019' and '11/17/2019'
          and account_id in (128091690677840,205971526867688,183647115790295) --< plus this
    ) a
    inner join fb_campaigns c on a.campaign_group_id = c.id
    left join ca_ud_conversions ud on a.logdate = ud.logdate and a.account_id = ud.account and a.adgroup_id = ud.adgroup
    where a.logdate between '11/01/2019' and '11/17/2019'
      and a.account_id in (128091690677840,205971526867688,183647115790295)
    group by a.campaign_id
) stats on obj.adset = stats.adset
group by obj.none, obj.mobileos

The outputs of the sums will be random; results will vary. Sometimes they will be correct.

correct results:
1  Android  66050.68
1  iOS      27624.41
1

examples of incorrect random results:
1  Android  111098.92
1  iOS      34314.69
1

1  Android  116473.33
1  iOS      37640.63
1

1  Android  68350.25
1  iOS      27624.41
1

-- issue does occur
set enable_nestloop TO 0;

GroupAggregate (cost=356113.02..356115.54 rows=1 width=76) (actual time=1193.334..1194.411 rows=3 loops=1)
  Group Key: (1), (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
  ->  Sort (cost=356113.02..356113.52 rows=200 width=52) (actual time=1192.245..1193.084 rows=4280 loops=1)
        Sort Key: (1), (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
        Sort Method: quicksort  Memory: 395kB
        ->  Merge Full Join (cost=356063.98..356105.38 rows=200 width=52) (actual time=1183.640..1190.639 rows=4280 loops=1)
              Merge Cond: (fa.id = "*SELECT* 1".campaign_id)
              ->  Group (cost=179905.47..179905.84 rows=1 width=44) (actual time=85.440..90.013 rows=4280 loops=1)
                    Group Key: fa.id, (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
                    ->  Gather Merge (cost=179905.47..179905.83 rows=3 width=40) (actual time=85.438..87.984 rows=4280 loops=1)
                          Workers Planned: 3
                          Workers Launched: 3
                          ->  Group (cost=178905.43..178905.44 rows=1 width=40) (actual time=79.332..80.143 rows=1070 loops=4)
                                Group Key: fa.id, (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
                                ->  Sort (cost=178905.43..178905.43 rows=1 width=40) (actual time=79.329..79.567 rows=1070 loops=4)
                                      Sort Key: fa.id, (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
                                      Sort Method: quicksort  Memory: 97kB
                                      Worker 0:  Sort Method: quicksort  Memory: 72kB
                                      Worker 1:  Sort Method: quicksort  Memory: 99kB
                                      Worker 2:  Sort Method: quicksort  Memory: 103kB
                                      ->  Parallel Hash Join (cost=167790.90..178905.42 rows=1 width=40) (actual time=70.443..78.798 rows=1070 loops=4)
                                            Hash Cond: (fc.id = fa.campaign)
                                            ->  Parallel Seq Scan on fb_campaigns fc (cost=0.00..11113.68 rows=74 width=8) (actual time=0.090..7.951 rows=174 loops=4)
                                                  Filter: ((account IS NULL) OR (account = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                                  Rows Removed by Filter: 49408
                                            ->  Parallel Hash (cost=167790.81..167790.81 rows=7 width=64) (actual time=70.253..70.253 rows=1070 loops=4)
                                                  Buckets: 4096 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 440kB
                                                  ->  Parallel Seq Scan on fb_adsets fa (cost=0.00..167790.81 rows=7 width=64) (actual time=0.988..55.405 rows=1070 loops=4)
                                                        Filter: ((account = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND (account = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                                        Rows Removed by Filter: 198077
              ->  Finalize GroupAggregate (cost=176158.52..176197.01 rows=200 width=160) (actual time=1098.196..1098.694 rows=64 loops=1)
                    Group Key: "*SELECT* 1".campaign_id
                    ->  Gather Merge (cost=176158.52..176192.97 rows=272 width=24) (actual time=1098.160..1141.270 rows=181 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Partial GroupAggregate (cost=175158.49..175161.55 rows=136 width=24) (actual time=1088.396..1088.791 rows=60 loops=3)
                                Group Key: "*SELECT* 1".campaign_id
                                ->  Sort (cost=175158.49..175158.83 rows=136 width=22) (actual time=1088.377..1088.547 rows=675 loops=3)
                                      Sort Key: "*SELECT* 1".campaign_id
                                      Sort Method: quicksort  Memory: 73kB
                                      Worker 0:  Sort Method: quicksort  Memory: 80kB
                                      Worker 1:  Sort Method: quicksort  Memory: 79kB
                                      ->  Parallel Hash Left Join (cost=90647.99..175153.67 rows=136 width=22) (actual time=963.744..1088.079 rows=675 loops=3)
                                            Hash Cond: (("*SELECT* 1".logdate = ud.logdate) AND ("*SELECT* 1".account_id = ud.account) AND ("*SELECT* 1".adgroup_id = ud.adgroup))
                                            ->  Parallel Hash Join (cost=11673.89..91194.54 rows=136 width=42) (actual time=64.276..76.694 rows=675 loops=3)
                                                  Hash Cond: ("*SELECT* 1".campaign_group_id = c.id)
                                                  ->  Parallel Append (cost=0.42..79520.71 rows=140 width=48) (actual time=6.548..18.083 rows=675 loops=3)
                                                        ->  Subquery Scan on "*SELECT* 1" (cost=0.42..2.67 rows=1 width=48) (actual time=0.016..0.016 rows=0 loops=1)
                                                              ->  Index Scan using fb_ad_activity_daily2_pkey on fb_ad_activity_daily (cost=0.42..2.66 rows=1 width=1618) (actual time=0.015..0.015 rows=0 loops=1)
                                                                    Index Cond: ((logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND (logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date))
                                                                    Filter: ((account_id = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND (account_id = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                                        ->  Subquery Scan on "*SELECT* 2" (cost=0.56..79516.88 rows=326 width=48) (actual time=6.541..17.792 rows=675 loops=3)
                                                              ->  Parallel Append (cost=0.56..79513.62 rows=192 width=1618) (actual time=6.539..17.471 rows=675 loops=3)
                                                                    ->  Parallel Index Scan using fb_ad_activity_daily_archive_2019_pkey on fb_ad_activity_daily_archive_2019 (cost=0.56..79512.66 rows=192 width=1618) (actual time=6.538..17.169 rows=675 loops=3)
                                                                          Index Cond: ((logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND (logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND (account_id = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND (account_id = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                                  ->  Parallel Hash (cost=10873.76..10873.76 rows=63976 width=10) (actual time=57.291..57.292 rows=66109 loops=3)
                                                        Buckets: 262144  Batches: 1  Memory Usage: 11424kB
                                                        ->  Parallel Seq Scan on fb_campaigns c (cost=0.00..10873.76 rows=63976 width=10) (actual time=0.008..28.978 rows=66109 loops=3)
                                            ->  Parallel Hash (cost=60953.22..60953.22 rows=771422 width=20) (actual time=875.479..875.479 rows=1031137 loops=3)
                                                  Buckets: 131072  Batches: 32  Memory Usage: 6432kB
                                                  ->  Parallel Seq Scan on ca_ud_conversions ud (cost=0.00..60953.22 rows=771422 width=20) (actual time=0.004..405.186 rows=1031137 loops=3)
Planning Time: 3.630 ms
Execution Time: 1238.495 ms

-- issue does not occur
SET max_parallel_workers_per_gather TO 0;
set enable_nestloop TO 0;

GroupAggregate (cost=466729.81..466732.32 rows=1 width=76) (actual time=2866.611..2867.687 rows=3 loops=1)
  Group Key: (1), (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
  ->  Sort (cost=466729.81..466730.31 rows=200 width=52) (actual time=2865.528..2866.368 rows=4280 loops=1)
        Sort Key: (1), (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
        Sort Method: quicksort  Memory: 395kB
        ->  Merge Full Join (cost=466713.55..466722.17 rows=200 width=52) (actual time=2857.653..2863.873 rows=4280 loops=1)
              Merge Cond: (fa.id = "*SELECT* 1".campaign_id)
              ->  Group (cost=191854.00..191854.01 rows=1 width=44) (actual time=246.292..249.519 rows=4280 loops=1)
                    Group Key: fa.id, (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
                    ->  Sort (cost=191854.00..191854.00 rows=1 width=40) (actual time=246.288..247.277 rows=4280 loops=1)
                          Sort Key: fa.id, (CASE WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'iOS and Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NULL)) THEN 'iOS'::text WHEN ((fa.targeting_ios IS NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Android'::text WHEN ((fa.targeting_ios IS NOT NULL) AND (fa.targeting_android IS NOT NULL)) THEN 'Any'::text ELSE NULL::text END)
                          Sort Method: quicksort  Memory: 393kB
                          ->  Hash Join (cost=12963.85..191853.99 rows=1 width=40) (actual time=31.233..244.131 rows=4280 loops=1)
                                Hash Cond: (fa.campaign = fc.id)
                                ->  Seq Scan on fb_adsets fa (cost=0.00..178890.05 rows=34 width=64) (actual time=0.109..210.360 rows=4280 loops=1)
                                      Filter: ((account = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND (account = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                      Rows Removed by Filter: 792308
                                ->  Hash (cost=12961.00..12961.00 rows=228 width=8) (actual time=31.110..31.111 rows=694 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 36kB
                                      ->  Seq Scan on fb_campaigns fc (cost=0.00..12961.00 rows=228 width=8) (actual time=0.013..30.894 rows=694 loops=1)
                                            Filter: ((account IS NULL) OR (account = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                            Rows Removed by Filter: 197633
              ->  GroupAggregate (cost=274859.55..274865.64 rows=200 width=160) (actual time=2611.356..2612.484 rows=64 loops=1)
                    Group Key: "*SELECT* 1".campaign_id
                    ->  Sort (cost=274859.55..274860.37 rows=327 width=22) (actual time=2611.327..2611.831 rows=2026 loops=1)
                          Sort Key: "*SELECT* 1".campaign_id
                          Sort Method: quicksort  Memory: 207kB
                          ->  Hash Left Join (cost=171842.20..274845.89 rows=327 width=22) (actual time=2334.961..2610.504 rows=2026 loops=1)
                                Hash Cond: (("*SELECT* 1".logdate = ud.logdate) AND ("*SELECT* 1".account_id = ud.account) AND ("*SELECT* 1".adgroup_id = ud.adgroup))
                                ->  Hash Join (cost=15665.78..96165.51 rows=327 width=42) (actual time=156.656..198.283 rows=2026 loops=1)
                                      Hash Cond: ("*SELECT* 1".campaign_group_id = c.id)
                                      ->  Append (cost=0.42..79524.29 rows=327 width=48) (actual time=13.125..45.025 rows=2026 loops=1)
                                            ->  Subquery Scan on "*SELECT* 1" (cost=0.42..2.67 rows=1 width=48) (actual time=0.012..0.012 rows=0 loops=1)
                                                  ->  Index Scan using fb_ad_activity_daily2_pkey on fb_ad_activity_daily (cost=0.42..2.66 rows=1 width=1618) (actual time=0.011..0.011 rows=0 loops=1)
                                                        Index Cond: ((logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND (logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date))
                                                        Filter: ((account_id = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND (account_id = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                            ->  Subquery Scan on "*SELECT* 2" (cost=0.56..79519.99 rows=326 width=48) (actual time=13.111..44.245 rows=2026 loops=1)
                                                  ->  Append (cost=0.56..79516.73 rows=326 width=1618) (actual time=13.110..43.372 rows=2026 loops=1)
                                                        ->  Index Scan using fb_ad_activity_daily_archive_2019_pkey on fb_ad_activity_daily_archive_2019 (cost=0.56..79515.10 rows=326 width=1618) (actual time=13.108..42.569 rows=2026 loops=1)
                                                              Index Cond: ((logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND (logdate >= '2019-11-01'::date) AND (logdate <= '2019-11-17'::date) AND (account_id = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])) AND (account_id = ANY ('{128091690677840,205971526867688,183647115790295}'::bigint[])))
                                      ->  Hash (cost=12217.27..12217.27 rows=198327 width=10) (actual time=143.122..143.122 rows=198327 loops=1)
                                            Buckets: 262144  Batches: 2  Memory Usage: 6693kB
                                            ->  Seq Scan on fb_campaigns c (cost=0.00..12217.27 rows=198327 width=10) (actual time=0.007..74.000 rows=198327 loops=1)
                                ->  Hash (cost=84095.88..84095.88 rows=3085688 width=20) (actual time=2177.119..2177.119 rows=3093410 loops=1)
                                      Buckets: 131072  Batches: 32  Memory Usage: 6348kB
                                      ->  Seq Scan on ca_ud_conversions ud (cost=0.00..84095.88 rows=3085688 width=20) (actual time=0.005..1005.261 rows=3093410 loops=1)
Planning Time: 3.733 ms
Execution Time: 2868.068 ms
PG Bug reporting form <noreply@postgresql.org> writes:
> The following bug has been logged on the website:
> Bug reference: 16627
> Logged by: Brian Kanaga
> Email address: kanaga@consumeracquisition.com
> PostgreSQL version: 11.4
> Operating system: Linux
> Description:
> Hoping this is something already discovered as the amount of data to
> recreate is cumbersome.

Well, 11.4 is more than a year out of date. Can you still reproduce this on the current minor release (11.9)?

regards, tom lane
RE: BUG #16627: union all with partitioned table yields random aggregate results
From:
Brian Kanaga
Date:
Hi, confirming it happens in 11.9 and 12.4.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, September 22, 2020 10:54 AM
To: kanaga@consumeracquisition.com
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #16627: union all with partitioned table yields random aggregate results

PG Bug reporting form <noreply@postgresql.org> writes:
> The following bug has been logged on the website:
> Bug reference: 16627
> Logged by: Brian Kanaga
> Email address: kanaga@consumeracquisition.com
> PostgreSQL version: 11.4
> Operating system: Linux
> Description:
> Hoping this is something already discovered as the amount of data to
> recreate is cumbersome.

Well, 11.4 is more than a year out of date. Can you still reproduce this on the current minor release (11.9)?

regards, tom lane
Brian Kanaga <kanaga@consumeracquisition.com> writes:
> Hi, confirming it happens in 11.9 and 12.4.

Ugh. I don't think it's any known problem then. We'll have to ask you to make a reproducer. Whatever it is probably isn't very data-dependent, so hopefully you can just create a script that fills some tables with generated/random data.

regards, tom lane
Re: BUG #16627: union all with partitioned table yields random aggregate results
From:
David Rowley
Date:
On Wed, 23 Sep 2020 at 03:30, PG Bug reporting form <noreply@postgresql.org> wrote:
>
> -- these columns will be missing in some tbls
> avg(cast(relevance_score as float4)) as relevance_score,
> avg(cast(positive_feedback as float4)) as positive_feedback,
> avg(cast(negative_feedback as float4)) as negative_feedback,
>

In addition to Tom's request, it would also be good to understand more about these "random" results. I see you have a number of aggregate functions run on the results. It would be good to understand if all of them are incorrect in the parallel version of the query or if it's just some of them, e.g. just the float4 ones I've quoted above.

Essentially there is some sense of randomness to the results of floating-point aggregation in all cases due to the lossy precision of floating points. This is much more likely to be seen during parallel aggregation, as the order in which each tuple is aggregated is highly likely to differ for each run of the query. That order is much more likely to be stable in the serial query. If you are just seeing the floating-point aggregates vary then it might be worth casting floating-point values to NUMERIC to see if that gets rid of the randomness.

If that's not the issue then you may also like to help try to narrow the scope of the problem. Does the problem still occur when you disable enable_parallel_hash? How about enable_parallel_append?

There was also some mangling of the query plans you posted. I took the time to format those again and attached the result. Looking over the row counts between each version of the plan I didn't find anything that would indicate additional rows were seen in the parallel version of the plan. However, there's some precision loss which I couldn't account for around the actual rows being divided by the parallel workers with integer division, so that process is not 100% accurate.

David
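The narrowing steps David suggests can be tried per-session; a sketch (enable_parallel_hash and enable_parallel_append are standard planner settings in PostgreSQL 11+, and the NUMERIC cast mirrors his float4 suggestion against a column name from the report):

```sql
-- Narrow the scope: re-run the problem query after each of these.
SET enable_parallel_hash = off;    -- rules out Parallel Hash join/aggregation nodes
SET enable_parallel_append = off;  -- rules out Parallel Append over the partitions

-- To rule out order-dependent float4 rounding, aggregate in numeric instead, e.g.:
-- avg(cast(relevance_score as numeric)) as relevance_score
```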
Attachments
RE: BUG #16627: union all with partitioned table yields random aggregate results
From:
Brian Kanaga
Date:
Thank you David for your input here!

Attached is a much-simplified version of the problem query along with screen shots of plans and what tweaks to the query produce changes to the plan. Both your hints fix it. Also, if I force the partitioned table subquery to be evaluated differently by placing an "order by" on it, that fixes it.

I have tried to recreate this for you in a dump file but I could not get it to happen without including gobs of data. Even tweaking the plan I could not get the filter part evaluating to match the offending plan. Hopefully these files shed some light.

Oh - and float vs more precision wasn't it. The results variance is enormous:

correct results
----------------------------
27839.83

example runs with the issue (each row here is a different run)
----------------------------
27839.83
52267.54
63620.24
30139.4
27839.83 <- randomly the right answer will be returned
30139.4
46044.1
30139.4
34350.93
39673.62
32239.73
27839.83 <- again
43351.18

-----Original Message-----
From: David Rowley [mailto:dgrowleyml@gmail.com]
Sent: Tuesday, September 22, 2020 10:10 PM
To: kanaga@consumeracquisition.com; PostgreSQL mailing lists
Subject: Re: BUG #16627: union all with partitioned table yields random aggregate results

On Wed, 23 Sep 2020 at 03:30, PG Bug reporting form <noreply@postgresql.org> wrote:
>
> -- these columns will be missing in some tbls
> avg(cast(relevance_score as float4)) as relevance_score,
> avg(cast(positive_feedback as float4)) as positive_feedback,
> avg(cast(negative_feedback as float4)) as negative_feedback,
>

In addition to Tom's request, it would also be good to understand more about these "random" results. I see you have a number of aggregate functions run on the results. It would be good to understand if all of them are incorrect in the parallel version of the query or if it's just some of them, e.g. just the float4 ones I've quoted above.

Essentially there is some sense of randomness to the results of floating-point aggregation in all cases due to the lossy precision of floating points. This is much more likely to be seen during parallel aggregation, as the order in which each tuple is aggregated is highly likely to differ for each run of the query. That order is much more likely to be stable in the serial query. If you are just seeing the floating-point aggregates vary then it might be worth casting floating-point values to NUMERIC to see if that gets rid of the randomness.

If that's not the issue then you may also like to help try to narrow the scope of the problem. Does the problem still occur when you disable enable_parallel_hash? How about enable_parallel_append?

There was also some mangling of the query plans you posted. I took the time to format those again and attached the result. Looking over the row counts between each version of the plan I didn't find anything that would indicate additional rows were seen in the parallel version of the plan. However, there's some precision loss which I couldn't account for around the actual rows being divided by the parallel workers with integer division, so that process is not 100% accurate.

David
Attachments
RE: BUG #16627: union all with partitioned table yields random aggregate results
From:
Brian Kanaga
Date:
If I may add (referencing the files I just sent): the Parallel Append and Parallel Index Scan are unique to the issue query.

-----Original Message-----
From: Brian Kanaga [mailto:kanaga@consumeracquisition.com]
Sent: Wednesday, September 23, 2020 8:48 AM
To: 'David Rowley'; 'PostgreSQL mailing lists'
Subject: RE: BUG #16627: union all with partitioned table yields random aggregate results

Thank you David for your input here!

Attached is a much-simplified version of the problem query along with screen shots of plans and what tweaks to the query produce changes to the plan. Both your hints fix it. Also, if I force the partitioned table subquery to be evaluated differently by placing an "order by" on it, that fixes it.

I have tried to recreate this for you in a dump file but I could not get it to happen without including gobs of data. Even tweaking the plan I could not get the filter part evaluating to match the offending plan. Hopefully these files shed some light.

Oh - and float vs more precision wasn't it. The results variance is enormous:

correct results
----------------------------
27839.83

example runs with the issue (each row here is a different run)
----------------------------
27839.83
52267.54
63620.24
30139.4
27839.83 <- randomly the right answer will be returned
30139.4
46044.1
30139.4
34350.93
39673.62
32239.73
27839.83 <- again
43351.18

-----Original Message-----
From: David Rowley [mailto:dgrowleyml@gmail.com]
Sent: Tuesday, September 22, 2020 10:10 PM
To: kanaga@consumeracquisition.com; PostgreSQL mailing lists
Subject: Re: BUG #16627: union all with partitioned table yields random aggregate results

On Wed, 23 Sep 2020 at 03:30, PG Bug reporting form <noreply@postgresql.org> wrote:
>
> -- these columns will be missing in some tbls
> avg(cast(relevance_score as float4)) as relevance_score,
> avg(cast(positive_feedback as float4)) as positive_feedback,
> avg(cast(negative_feedback as float4)) as negative_feedback,
>

In addition to Tom's request, it would also be good to understand more about these "random" results. I see you have a number of aggregate functions run on the results. It would be good to understand if all of them are incorrect in the parallel version of the query or if it's just some of them, e.g. just the float4 ones I've quoted above.

Essentially there is some sense of randomness to the results of floating-point aggregation in all cases due to the lossy precision of floating points. This is much more likely to be seen during parallel aggregation, as the order in which each tuple is aggregated is highly likely to differ for each run of the query. That order is much more likely to be stable in the serial query. If you are just seeing the floating-point aggregates vary then it might be worth casting floating-point values to NUMERIC to see if that gets rid of the randomness.

If that's not the issue then you may also like to help try to narrow the scope of the problem. Does the problem still occur when you disable enable_parallel_hash? How about enable_parallel_append?

There was also some mangling of the query plans you posted. I took the time to format those again and attached the result. Looking over the row counts between each version of the plan I didn't find anything that would indicate additional rows were seen in the parallel version of the plan. However, there's some precision loss which I couldn't account for around the actual rows being divided by the parallel workers with integer division, so that process is not 100% accurate.

David
Re: BUG #16627: union all with partitioned table yields random aggregate results
From:
David Rowley
Date:
Hi Brian,

On Thu, 24 Sep 2020 at 01:50, Brian Kanaga <kanaga@consumeracquisition.com> wrote:
> Attached is a much-simplified version of the problem query along with screen
> shots of plans and what tweaks to the query produce changes to the plan.

Thanks for sending those. (For the future, attaching a text file with the queries and explain output would be much easier to work with. I personally tend to compare these sorts of things in a text compare tool. Playing spot the difference with images is more tricky.)

Comparing the images you attached, it does look like the index scan on fb_add_daily_archive_2019_pkey found more rows in the parallel version of the scan: 262 * 4 = 1048, but only 826 on the "noissue.png" plan.

In the cases you've shown that were incorrect, the aggregated value is larger. So assuming you're always aggregating positive values, the incorrect result does hint that something is getting more rows than it should. The row counts I see indicate that's the case, with ~1048 in the error case and only 826 in the correct result case. It would be good to get the full text of the EXPLAIN ANALYZE to confirm those predicates match properly. That part was chopped off the screenshot.

I noticed that the "issue.png" plan has a nested Parallel Append, the outer of which has a mix of parallel and parallel-safe paths. I'm not sure how relevant that is, but having nested parallel appends is probably not that common.

I played around with the following, trying to produce a similar plan with a nested parallel append with a mix of parallel and parallel-safe paths. Trying this on 11.4 I didn't see any executions with the incorrect tuple count.

drop table t;
drop table pt;
create table pt (a int) partition by range(a);
create table pt1 partition of pt for values from (0) to (3000000);
create index on pt1 (a);
create table t (a int primary key);
insert into t select x from generate_Series(1,2000000)x;
insert into pt select x from generate_series(1,2000000)x;
alter table t set (parallel_workers=0);
set enable_bitmapscan=0;
set enable_indexonlyscan=0;
set work_mem = '200MB';

select count(*) from (
    select * from t where a between 100000 and 200000
    union all
    select * from t where a between 200000 and 300000
    union all
    select * from pt where a between 900000 and 999999
) t;

> I have tried to recreate this for you in a dump file but I could not get it
> to happen without including gobs of data. Even tweaking the plan I could
> not get the filter part evaluating to match the offending plan.

How large is the dataset? And if the data was properly anonymised, and the size wasn't too insane, would you be allowed to share it? Privately would be an option.

David
Re: BUG #16627: union all with partioned table yields random aggregate results
From
Brian Kanaga
Date:
Hi David,
Here's a dump file that is able to reproduce the issue. I would ask that this message and dump file url be kept non-public.
Please let me know when you've retrieved it and I will remove. Thank you!
-- dump file at https://ca-s3-1.s3-us-west-2.amazonaws.com/pgdumpadshfjas/dump.zip
-- database name to create: capgpod
-- users referenced in dump file: capgpod, devreader (might be able to ignore as I think these are ref'd after creates/copies)
-- 4x16 server (m4.xlarge in aws), v11.5
set enable_nestloop = 0;
select sum(spend), sum(conversions) from (
select a.campaign_id,
sum(cast(spend as decimal(12,2))) as spend,
sum(case when c.buying_type = 10 then actions_1d_view_mobile_app_install end) as conversions
from
(
select * from fb_ad_activity_daily where logdate between '11/01/2019' and '11/5/2019' and account_id in (128091690677840,205971526867688,183647115790295)
union all
select * from fb_ad_activity_daily_archive where logdate between '11/01/2019' and '11/5/2019' and account_id in (128091690677840,205971526867688,183647115790295)
)
a
inner join fb_campaigns c on a.campaign_group_id=c.id
where a.logdate between '11/01/2019' and '11/5/2019' and a.account_id in (128091690677840,205971526867688,183647115790295)
group by a.campaign_id
) x
On Thu, Sep 24, 2020 at 4:44 AM David Rowley <dgrowleyml@gmail.com> wrote:
Hi Brian,
On Thu, 24 Sep 2020 at 01:50, Brian Kanaga
<kanaga@consumeracquisition.com> wrote:
> Attached is a much-simplified version of the problem query along with screen
> shots of plans and what tweaks to the query produce changes to the plan.
Thanks for sending those.
(For the future, attaching a text file with the queries and explain
output would be much easier to work with. I personally tend to compare
these sorts of things in a text compare tool. Playing spot the
difference with images is more tricky.)
Comparing the images you attached it does look like the index scan on
fb_add_daily_archive_2019_pkey found more rows in the parallel version
of the scan. 262 * 4 = 1048, but only 826 on the "noissue.png" plan.
In the cases you've shown that were incorrect, the aggregated value is
larger. So assuming you're always aggregating positive values then the
incorrect result does hint that something is getting more rows than it
should. The row counts I see indicate that's the case with ~1048 in
the error case and only 826 in the correct result case. It would be
good to get the full text of the EXPLAIN ANALYZE to confirm those
predicates match properly. That part was chopped off the screenshot.
I noticed that the "issue.png" plan has a nested Parallel Append, the
outer of which has a mix of parallel and parallel safe paths. I'm not
sure how relevant that is, but having nested parallel appends is
probably not that common.
I played around with the following trying to produce a similar plan
with a nested parallel append with a mix of parallel and parallel safe
paths. Trying this on 11.4 I didn't see any executions with the
incorrect tuple count.
drop table t;
drop table pt;
create table pt (a int) partition by range(a);
create table pt1 partition of pt for values from (0) to (3000000);
create index on pt1 (a);
create table t (a int primary key);
insert into t select x from generate_Series(1,2000000)x;
insert into pt select x from generate_series(1,2000000)x;
alter table t set (parallel_workers=0);
set enable_bitmapscan=0;
set enable_indexonlyscan=0;
set work_mem = '200MB';
select count(*) from (select * from t where a between 100000 and
200000 union all select * from t where a between 200000 and 300000
union all select * from pt where a between 900000 and 999999) t;
> I have tried to recreate this for you in a dump file but I could not get it
> to happen without including gobs of data. Even tweaking the plan I could
> not get the filter part evaluating to match the offending plan.
How large is the dataset? and if the data was properly anonymised,
and the size wasn't too insane, would you be allowed to share it?
privately would be an option.
David
--
CTO, ConsumerAcquisition.com
Re: BUG #16627: union all with partioned table yields random aggregate results
From
"David G. Johnston"
Date:
On Fri, Sep 25, 2020 at 11:15 AM Brian Kanaga <kanaga@consumeracquisition.com> wrote:
Hi David, Here's a dump file that is able to reproduce the issue. I would ask that this message and dump file url be kept non-public. Please let me know when you've retrieved it and I will remove. Thank you!
You just sent the email directly to a public mailing list so keeping the email private is a non-starter. You should probably remove the file from AWS and re-send a private message to David with a new link to minimize the file's publicity.
David J.
Re: BUG #16627: union all with partioned table yields random aggregate results
From
David Rowley
Date:
Hi Brian,

On Sat, 26 Sep 2020 at 05:58, Brian Kanaga
<kanaga@consumeracquisition.com> wrote:
> Here's a dump file that is able to reproduce the issue. I would ask that this message and dump file url be kept non-public.
> Please let me know when you've retrieved it and I will remove. Thank you!

Thanks for sending me the updated link. I can confirm that I can recreate this issue on 11.4. The two attached explain files show a variance in actual rows on the parallel index scan:

-> Parallel Index Scan using fb_ad_activity_daily_archive_2019_pkey on fb_ad_activity_daily_archive_2019 (cost=0.56..37964.90 rows=49 width=1618) (actual time=1.909..8.301 rows=430 loops=3)

from one execution, and:

-> Parallel Index Scan using fb_ad_activity_daily_archive_2019_pkey on fb_ad_activity_daily_archive_2019 (cost=0.56..37964.90 rows=49 width=1618) (actual time=2.039..9.043 rows=472 loops=3)

from another.

The minimum case to reproduce that I could find is:

create table t (a int not null);
insert into t select x from generate_series(1,100) x, generate_series(1,4000000);
create index on t (a);
analyze t;
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
alter table t set (parallel_workers=8);
explain select count(*) from t where a in (10,20,30,40,50,60) and a in (10,20,30,40,50,60);

The problem only seems to occur with the redundant IN clause added. Your query is getting that because the outer one was pushed down into the inner query, but it already existed there.

Although, I was testing on 11.3. On trying the latest, yet-to-be-released v12 code on REL_12_STABLE, I can't reproduce it. On looking a bit further as to why, I found a fix has already been pushed [1], but the commit message there does not really mention the wrong-results issue. There's some discussion in [2]; there's another case to reproduce it on that thread too.

If you have the ability to build from source away from production, feel free to try the REL_11_STABLE branch and confirm that it's now working ok.

I don't know the exact dates, but what will become 11.10 already has that fix backpatched. I expect that will be released around mid-November. In the meantime, you could remove the inner WHERE clause items which are duplicated on the outer query; these should get pushed down into the inner scans anyway. However, I'm not sure exactly how reliable that will be as a fix.

(Copying in Amit, to let him know that someone did stumble upon this in the wild.)

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4bc63462d9d8dd12a5564c3d4ca06c99449d2d07
[2] https://www.postgresql.org/message-id/flat/4248CABC-25E3-4809-B4D0-128E1BAABC3C%40amazon.com
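[Editorial note] As an illustration of the workaround David suggests, Brian's query from earlier in the thread could be rewritten roughly as follows, relying on the outer WHERE being pushed down into the UNION ALL branches rather than repeating it inside them. This is a sketch only, using the table and column names from the messages above; anyone applying it should verify the resulting plan and row counts with EXPLAIN ANALYZE.

```sql
-- Workaround sketch: drop the predicates inside the UNION ALL branches
-- that duplicate the outer WHERE clause, avoiding the redundant quals
-- that trigger the bug. Illustrative only; verify the plan before use.
select sum(spend), sum(conversions) from (
  select a.campaign_id,
         sum(cast(spend as decimal(12,2))) as spend,
         sum(case when c.buying_type = 10
                  then actions_1d_view_mobile_app_install end) as conversions
  from (
    select * from fb_ad_activity_daily
    union all
    select * from fb_ad_activity_daily_archive
  ) a
  inner join fb_campaigns c on a.campaign_group_id = c.id
  where a.logdate between '2019-11-01' and '2019-11-05'
    and a.account_id in (128091690677840,205971526867688,183647115790295)
  group by a.campaign_id
) x;
```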