Re: BUG #16627: union all with partioned table yields random aggregate results

From: Brian Kanaga
Subject: Re: BUG #16627: union all with partioned table yields random aggregate results
Date:
Msg-id: CAAk_qnGumHD6kBiiexd8WQ1jnPpUSXhOu5XiCAvWatZAaKaRfQ@mail.gmail.com
In reply to: Re: BUG #16627: union all with partioned table yields random aggregate results  (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: BUG #16627: union all with partioned table yields random aggregate results  ("David G. Johnston" <david.g.johnston@gmail.com>)
Re: BUG #16627: union all with partioned table yields random aggregate results  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-bugs
Hi David,

Here's a dump file that reproduces the issue.  I would ask that this message and the dump file URL be kept non-public.
Please let me know when you've retrieved it and I will remove it.  Thank you!

-- dump file at https://ca-s3-1.s3-us-west-2.amazonaws.com/pgdumpadshfjas/dump.zip
-- database name to create: capgpod
-- users referenced in dump file: capgpod, devreader (you can probably ignore these, as I think they're only referenced after the creates/copies)
-- 4x16 server (m4.xlarge in AWS), v11.5
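In case it helps, the restore sequence I'd expect is roughly the following (assuming the zip holds a plain-format dump named dump.sql -- the file name inside the archive is a guess):

create role capgpod;      -- roles referenced by the dump, per the note above
create role devreader;
create database capgpod;
\c capgpod
\i dump.sql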


set enable_nestloop = 0;

select sum(spend), sum(conversions) from (
  select a.campaign_id,
         sum(cast(spend as decimal(12,2))) as spend,
         sum(case when c.buying_type = 10 then actions_1d_view_mobile_app_install end) as conversions
  from (
    select * from fb_ad_activity_daily
    where logdate between '11/01/2019' and '11/05/2019'
      and account_id in (128091690677840,205971526867688,183647115790295)
    union all
    select * from fb_ad_activity_daily_archive
    where logdate between '11/01/2019' and '11/05/2019'
      and account_id in (128091690677840,205971526867688,183647115790295)
  ) a
  inner join fb_campaigns c on a.campaign_group_id = c.id
  where a.logdate between '11/01/2019' and '11/05/2019'
    and a.account_id in (128091690677840,205971526867688,183647115790295)
  group by a.campaign_id
) x;
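A quick way to compare against a non-parallel run of the same query (a diagnostic sketch, not part of our production settings):

set max_parallel_workers_per_gather = 0;  -- force a serial plan for comparison
-- then re-run the query above and compare sum(spend) / sum(conversions)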




On Thu, Sep 24, 2020 at 4:44 AM David Rowley <dgrowleyml@gmail.com> wrote:
Hi Brian,

On Thu, 24 Sep 2020 at 01:50, Brian Kanaga
<kanaga@consumeracquisition.com> wrote:
> Attached is a much-simplified version of the problem query along with screen
> shots of plans and what tweaks to the query produce changes to the plan.

Thanks for sending those.

(For future reference, attaching a text file with the queries and
EXPLAIN output would be much easier to work with. I tend to compare
these sorts of things in a text diff tool; playing spot-the-difference
with images is trickier.)

Comparing the images you attached, it does look like the index scan on
fb_add_daily_archive_2019_pkey found more rows in the parallel version
of the scan: 262 rows per loop * 4 loops = 1048, but only 826 in the
"noissue.png" plan. In the cases you've shown that were incorrect, the
aggregated value is larger, so assuming you're always aggregating
positive values, the incorrect result does hint that something is
reading more rows than it should.  The row counts I see indicate
that's the case: ~1048 in the error case versus only 826 in the
correct case.  It would be good to get the full text of the EXPLAIN
ANALYZE to confirm those predicates match properly; that part was
chopped off the screenshot.
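For example, in psql something like this captures the whole plan to a
file (the file name is just an example):

\o issue_plan.txt                               -- redirect query output to a file
explain (analyze, verbose, buffers) select 1;   -- swap "select 1" for the problem query
\o                                              -- back to stdout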

I noticed that the "issue.png" plan has a nested Parallel Append, the
outer of which has a mix of parallel and parallel-safe paths. I'm not
sure how relevant that is, but nested Parallel Appends are probably
not that common.

I played around with the following, trying to produce a similar plan:
a nested Parallel Append with a mix of parallel and parallel-safe
paths. Trying this on 11.4, I didn't see any executions with an
incorrect tuple count.

drop table if exists t;
drop table if exists pt;
create table pt (a int) partition by range(a);
create table pt1 partition of pt for values from (0) to (3000000);
create index on pt1 (a);
create table t (a int primary key);
insert into t select x from generate_series(1,2000000) x;
insert into pt select x from generate_series(1,2000000) x;
alter table t set (parallel_workers = 0);  -- keep t's scans parallel-safe but non-partial
set enable_bitmapscan = 0;
set enable_indexonlyscan = 0;
set work_mem = '200MB';
select count(*) from (
  select * from t where a between 100000 and 200000
  union all
  select * from t where a between 200000 and 300000
  union all
  select * from pt where a between 900000 and 999999
) t;
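For reference, that count should come out the same on every run:
100001 + 100001 + 100000 = 300002. In psql, \watch re-runs the query
periodically, which makes any run-to-run variation easy to spot:

\watch 1    -- re-run the previous query every second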

> I have tried to recreate this for you in a dump file but I could not get it
> to happen without including gobs of data.  Even tweaking the plan I could
> not get the filter part evaluating to match the offending plan.

How large is the dataset? And if the data were properly anonymised,
and the size wasn't too insane, would you be allowed to share it?
Privately would be an option.

David


--
Brian Kanaga
CTO, ConsumerAcquisition.com
kanaga@consumeracquisition.com
