Обсуждение: Avoid parallel full and right join paths.
Tests:
create table mytab(x int,x1 char(9),x2 varchar(9));
create table mytab1(y int,y1 char(9),y2 varchar(9));
insert into mytab values (generate_series(1,50000),'aa','aaa');
insert into mytab1 values (generate_series(1,10000),'aa','aaa');
insert into mytab values (generate_series(1,500000),'aa','aaa');
insert into mytab values (generate_series(1,500000),'aa','aaa');
analyze mytab;
analyze mytab1;
vacuum mytab;
vacuum mytab1;
set max_parallel_degree=0;
SET
df=# SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1
ON mytab.x = mytab1.y;
count
-------
30000
(1 row)
# set max_parallel_degree=5;
SET
df=# SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1
ON mytab.x = mytab1.y;
count
-------
39089
(1 row)
Casue:
======
Normal plan
==========
explain SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1
ON mytab.x = mytab1.y;postgres-#
QUERY PLAN
------------------------------------------------------------------------------
Aggregate (cost=21682.71..21682.72 rows=1 width=8)
-> Hash Right Join (cost=289.00..21629.07 rows=21457 width=0)
Hash Cond: (mytab.x = mytab1.y)
-> Seq Scan on mytab (cost=0.00..17188.00 rows=1050000 width=4)
-> Hash (cost=164.00..164.00 rows=10000 width=4)
-> Seq Scan on mytab1 (cost=0.00..164.00 rows=10000 width=4)
=================================================================
Parallel plan.
==========
explain SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1
ON mytab.x = mytab1.y;postgres-#
QUERY PLAN
-----------------------------------------------------------------------------------------------
Finalize Aggregate (cost=14135.88..14135.89 rows=1 width=8)
-> Gather (cost=14135.67..14135.88 rows=2 width=8)
Number of Workers: 2
-> Partial Aggregate (cost=13135.67..13135.68 rows=1 width=8)
-> Hash Right Join (cost=289.00..13082.02 rows=21457 width=0)
Hash Cond: (mytab.x = mytab1.y)
-> Parallel Seq Scan on mytab (cost=0.00..11063.00 rows=437500 width=4)
-> Hash (cost=164.00..164.00 rows=10000 width=4)
-> Seq Scan on mytab1 (cost=0.00..164.00 rows=10000 width=4)
As above Right and Full join paths cannot be parallel as they can produce false null extended rows because outer table is partial path and not completely visible.
Adding a patch to fix same.
Вложения
On Tue, Apr 19, 2016 at 10:21 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: > Tests: > create table mytab(x int,x1 char(9),x2 varchar(9)); > create table mytab1(y int,y1 char(9),y2 varchar(9)); > insert into mytab values (generate_series(1,50000),'aa','aaa'); > insert into mytab1 values (generate_series(1,10000),'aa','aaa'); > insert into mytab values (generate_series(1,500000),'aa','aaa'); > insert into mytab values (generate_series(1,500000),'aa','aaa'); > analyze mytab; > analyze mytab1; > vacuum mytab; > vacuum mytab1; > > set max_parallel_degree=0; > SET > df=# SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1 > ON mytab.x = mytab1.y; > count > ------- > 30000 > (1 row) > > # set max_parallel_degree=5; > SET > df=# SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1 > ON mytab.x = mytab1.y; > count > ------- > 39089 > (1 row) > > Casue: > ====== > Normal plan > ========== > explain SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1 > ON mytab.x = mytab1.y;postgres-# > QUERY PLAN > ------------------------------------------------------------------------------ > Aggregate (cost=21682.71..21682.72 rows=1 width=8) > -> Hash Right Join (cost=289.00..21629.07 rows=21457 width=0) > Hash Cond: (mytab.x = mytab1.y) > -> Seq Scan on mytab (cost=0.00..17188.00 rows=1050000 width=4) > -> Hash (cost=164.00..164.00 rows=10000 width=4) > -> Seq Scan on mytab1 (cost=0.00..164.00 rows=10000 width=4) > ================================================================= > > Parallel plan. > ========== > explain SELECT count(*) FROM mytab RIGHT OUTER JOIN mytab1 > ON mytab.x = mytab1.y;postgres-# > QUERY PLAN > ----------------------------------------------------------------------------------------------- > Finalize Aggregate (cost=14135.88..14135.89 rows=1 width=8) > -> Gather (cost=14135.67..14135.88 rows=2 width=8) > Number of Workers: 2 > -> Partial Aggregate (cost=13135.67..13135.68 rows=1 width=8) > -> Hash Right Join (cost=289.00..13082.02 rows=21457 width=0) > Hash Cond: (mytab.x = mytab1.y) > -> Parallel Seq Scan on mytab (cost=0.00..11063.00 rows=437500 width=4) > -> Hash (cost=164.00..164.00 rows=10000 width=4) > -> Seq Scan on mytab1 (cost=0.00..164.00 rows=10000 width=4) > > > As above Right and Full join paths cannot be parallel as they can produce > false null extended rows because outer table is partial path and not > completely visible. > Adding a patch to fix same. Committed. But I think the regression test needs more thought, so I left that out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Apr 20, 2016 at 2:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Committed. But I think the regression test needs more thought, so I > left that out. It would be nice if there was a fuzz testing infrastructure that verified that parallel plans produce the same answer as serial plans. -- Peter Geoghegan
On Thu, Apr 21, 2016 at 7:13 AM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Apr 20, 2016 at 2:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Committed. But I think the regression test needs more thought, so I >> left that out. > > It would be nice if there was a fuzz testing infrastructure that > verified that parallel plans produce the same answer as serial plans. Results of parallel plans and serial plans could be stored in temporary tables in the test, then that's a matter of comparing them I guess. That's largely doable. -- Michael
Michael Paquier wrote: > On Thu, Apr 21, 2016 at 7:13 AM, Peter Geoghegan <pg@heroku.com> wrote: > > On Wed, Apr 20, 2016 at 2:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> Committed. But I think the regression test needs more thought, so I > >> left that out. > > > > It would be nice if there was a fuzz testing infrastructure that > > verified that parallel plans produce the same answer as serial plans. > > Results of parallel plans and serial plans could be stored in > temporary tables in the test, then that's a matter of comparing them I > guess. That's largely doable. The brin.sql test does that ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 20, 2016 at 6:43 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > The brin.sql test does that ... I actually copied brin.sql when creating regression tests for external sorting, primarily because I wanted to test a variety of collations, without having any control of what they happen to be on the target. Those went into amcheck's regression tests, and so have yet to be committed. I think that your approach there has plenty to recommend it, at least where requirements are more complicated. -- Peter Geoghegan