Re: [HACKERS] Partition-wise aggregation/grouping

Поиск
Список
Период
Сортировка
От Jeevan Chalke
Тема Re: [HACKERS] Partition-wise aggregation/grouping
Дата
Msg-id CAM2+6=V334YYwQ+9btut8LozO+MRi=_e_Lss78dENMQM5wgOog@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Partition-wise aggregation/grouping  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы Re: [HACKERS] Partition-wise aggregation/grouping
Список pgsql-hackers


On Thu, Dec 14, 2017 at 4:01 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:

Sure no problem. Take your time. Here's set of comments for 0008. That
ends the first read of all the patches (2nd reading for the core
changes)

+-- Also, disable parallel paths.
+SET max_parallel_workers_per_gather TO 0;

If you enable parallel aggregation for smaller data partition-wise aggregation
paths won't be chosen. I think this is the reason why you are disabling
parallel query. But we should probably explain that in a comment. Better if we
could come up testcases without disabling parallel query. Since parallel append
is now committed, may be it can help.

Removed.
 

+
+-- Check with multiple columns in GROUP BY, order in target-list is reversed
+EXPLAIN (COSTS OFF)
+SELECT c, a, count(*) FROM pagg_tab GROUP BY a, c;
+                   QUERY PLAN
+-------------------------------------------------
+ Append
+   ->  HashAggregate
+         Group Key: pagg_tab_p1.a, pagg_tab_p1.c
+         ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(10 rows)

Why do we need this testcase?

Rajkumar, earlier reported one issue when order in the target list is reversed. Fix then required redesigning the GROUP key matching algorithm. So I think it will be good to have this testcase.


+
+SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c;
+ c | sum
+---+-----
+(0 rows)

I think we also need a case when the child input relations are marked dummy and
then the parent is marked dummy. Just use a condition with partkey = <none of
list bounds>.

I have added the testcase for that. But don't you think both are same. When all input children are dummy, parent too marked as dummy, i.e. input relation is itself dummy.
Am I missing something here?
 

+
+-- Check with SORTED paths. Disable hashagg to get group aggregate

Suggest: "Test GroupAggregate paths by disabling hash aggregates."

+-- When GROUP BY clause matches with PARTITION KEY.

I don't think we need "with", and just extend the same sentence with "complete
aggregation is performed for each partition"

+-- Should choose full partition-wise aggregation path

suggest: "Should choose full partition-wise GroupAggregate plan", but I guess
with the above suggestion, this sentence is not needed.

+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path

Similar suggestions as above.

+-- No aggregates, but still able to perform partition-wise aggregates

That's a funny construction. May be "Test partition-wise grouping without any
aggregates".

We should try some output for this query.

+
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab GROUP BY a ORDER BY 1;
+                   QUERY PLAN
+-------------------------------------------------
+ Group
+   Group Key: pagg_tab_p1.a
+   ->  Merge Append
+         Sort Key: pagg_tab_p1.a
+         ->  Group
+               Group Key: pagg_tab_p1.a
+               ->  Sort
+                     Sort Key: pagg_tab_p1.a
+                     ->  Seq Scan on pagg_tab_p1
[ ... clipped ... ]
+(19 rows)

It's strange that we do not annotate partial grouping as Partial. Does not look
like a bug in your patch. Do we get similar output with parallel grouping?


Its partial aggregation only which is finalize at the top.
But since tere are no aggregates involved we create a GROUP path and not an AGG path. GROUP path has no partial annotations.
Yes, we see similar plan for parallel grouping too.

+
+-- ORDERED SET within aggregate
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
+                               QUERY PLAN
+------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b ORDER BY pagg_tab_p1.a))
+   ->  GroupAggregate
+         Group Key: pagg_tab_p1.a
+         ->  Sort
+               Sort Key: pagg_tab_p1.a
+               ->  Append
+                     ->  Seq Scan on pagg_tab_p1
+                     ->  Seq Scan on pagg_tab_p2
+                     ->  Seq Scan on pagg_tab_p3
+(10 rows)

pagg_tab is partitioned by column c. So, not having it in GROUP BY
itself might produce this plan if Partial parallel aggregation is expensive.
When testing negative tests like this GROUP BY should always have the partition
key.

I deliberatly wanted to test when GROUP BY key does not match with the partition key so that partial aggregation is forced. But then we do have some limitiation to perform the aggregation in partial i.e.  ORDERED SET cannot be done in partial mode, this is the test to excerisize that code path.


In case of full aggregation, since all the rows that belong to the same group
come from the same partition, having an ORDER BY doesn't make any difference.
We should support such a case.

We do support this.
Added testcase for it.
 

+INSERT INTO pagg_tab1 SELECT i%30, i%20 FROM generate_series(0, 299, 2) i;
+INSERT INTO pagg_tab2 SELECT i%20, i%30 FROM generate_series(0, 299, 3) i;

spaces around % operator?

+-- When GROUP BY clause matches with PARTITION KEY.
+-- Should choose full partition-wise aggregation path.

Probably we should just club single table and join cases under one set of
comments rather than repeating those? Create the tables once at the beginning
of the test file and group together the queries under one comment head.

I think other way round. It will be good to have corresponding CREATE/INSERTs near the test queries to avoid lengthy scrolls to see the table structure and data. Each query has a comment to describe what it does.
 

+-- Disable mergejoin to get hash aggregate.
+SET enable_mergejoin TO false;

Why? We have tested that once.


Removed.
 
+
+-- When GROUP BY clause not matches with PARTITION KEY.
+-- Should choose partial partition-wise aggregation path.
+-- Also check with SORTED paths. Disable hashagg to get group aggregate.
+SET enable_hashagg TO false;

Same as above. Two of those clubbed together they will produce one hash and one
group plan. That will cover it.

For join queries plan with GroupAgg is not chosen which I wanted to have in a test-coverage. Thus kept this as is.
We have tested GroupAgg for single partitioned relations though. Let me know if you think this test is not necessary, I will remove it then.
 

+-- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for
+-- aggregation
+-- LEFT JOIN, should produce partial partition-wise aggregation plan as
+-- GROUP BY is on nullable column
+EXPLAIN (COSTS OFF)
+SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x =
b.y GROUP BY 1 ORDER BY 1 NULLS LAST;

May be you should explicitly use GROUP BY b.y in all of these queries.

I actually wanted to test GROUP BY n case too. But as you said, in these queries I have used b.y and modified some other queries to have positional notation in GROUP BY.
 

+-- FULL JOIN, should produce partial partition-wise aggregation plan as
+-- GROUP BY is on nullable column

In case of a FULL JOIN partition keys from the joining relations land on
nullable side; there is no key on non-nulllable side, so an aggregation on top
of FULL JOIN will always be partial partition-wise aggregation.


Yep.
Do you want me to add this explanation in the comment? I don't think so.

+
+-- Empty relations on LEFT side, no partition-wise agg plan.

Suggest: Empty join relation because of empty outer side.  I don't think we are
writing a negative test to check whether partition-wise agg plan is not chosen.
We are testing the case when the join relation is empty.

I didn't get what exactly you mean here. However updated the comment as per your suggestion.
 

+
+EXPLAIN (COSTS OFF)
+SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab GROUP BY c, a,
(a+b)/2 HAVING sum(b) = 50 AND avg(c) > 25 ORDER BY 1, 2, 3;

Keep this or the previous one, both is overkill. I will vote for this one, but
it's upto you.

Removed previous one.
 

May be add a testcase with the partition keys themselves switched; output just
the plan.

I don't think we need this, instead modified the earlier one. Please have a look.
 

+-- Test with multi-level partitioning scheme
+-- Partition-wise aggregation is tried only on first level.
[ ... clipped ... ]
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY

This seems to contradict with the previous comment. May be club them together
and say "Partition-wise aggregation with full aggregation only at the first
leve" and move that whole comment down.

+
+-- Partial aggregation as GROUP BY clause does not match with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1, 2, 3;
+                           QUERY PLAN
+----------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.b, (sum(pagg_tab_p1.a)), (count(*))
+   ->  Finalize GroupAggregate
+         Group Key: pagg_tab_p1.b
+         ->  Sort
+               Sort Key: pagg_tab_p1.b
+               ->  Append
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p1.b
+                           ->  Seq Scan on pagg_tab_p1
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p2_s1.b
+                           ->  Append
+                                 ->  Seq Scan on pagg_tab_p2_s1
+                                 ->  Seq Scan on pagg_tab_p2_s2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_p3_s1.b
+                           ->  Append
+                                 ->  Seq Scan on pagg_tab_p3_s1
+                                 ->  Seq Scan on pagg_tab_p3_s2
+(20 rows)

Why aren't we seeing partial aggregation paths for level two and below
partitions?

In this version of the patch I have not recursed into next level.
Will work on it and submit changes in the next patch-set.
 

+
+-- Test on middle level partitioned table which is further partitioned on b.
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p3_s1.b, (sum(pagg_tab_p3_s1.a)), (count(*))
+   ->  Append
+         ->  HashAggregate
+               Group Key: pagg_tab_p3_s1.b
+               ->  Seq Scan on pagg_tab_p3_s1
+         ->  HashAggregate
+               Group Key: pagg_tab_p3_s2.b
+               ->  Seq Scan on pagg_tab_p3_s2
+(9 rows)
+
+SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3;
+ b | sum  | count
+---+------+-------
+ 0 | 2000 |   100
+ 1 | 2100 |   100
+ 2 | 2200 |   100
+ 3 | 2300 |   100
+ 4 | 2400 |   100
+ 5 | 2500 |   100
+ 6 | 2600 |   100
+ 7 | 2700 |   100
+ 8 | 2800 |   100
+ 9 | 2900 |   100
+(10 rows)

We should just remove this case, it's same as testing top-level partitioned
tables.

Removed.
 

+
+-- Full aggregation as GROUP BY clause matches with PARTITION KEY
+EXPLAIN (COSTS OFF)
+SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab GROUP
BY a, b HAVING avg(b) < 3 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b)), (array_agg(DISTINCT
pagg_tab_p1.c))
+   ->  Append
+         ->  GroupAggregate
+               Group Key: pagg_tab_p1.a, pagg_tab_p1.b
+               Filter: (avg(pagg_tab_p1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p1.a, pagg_tab_p1.b
+                     ->  Seq Scan on pagg_tab_p1
+         ->  GroupAggregate
+               Group Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
+               Filter: (avg(pagg_tab_p2_s1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b
+                     ->  Append
+                           ->  Seq Scan on pagg_tab_p2_s1
+                           ->  Seq Scan on pagg_tab_p2_s2
+         ->  GroupAggregate
+               Group Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
+               Filter: (avg(pagg_tab_p3_s1.b) < '3'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b
+                     ->  Append
+                           ->  Seq Scan on pagg_tab_p3_s1
+                           ->  Seq Scan on pagg_tab_p3_s2
+(25 rows)

Instead of an Append node appearing under GroupAggregate, I think we should
flatten all the partition scans for the subpartitions whose partition keys are
part of group keys and add GroupAggregate on top of each of such partition
scans.

Yes. As explained earlier, will do that as a separate patch.


+-- Parallelism within partition-wise aggregates
+RESET max_parallel_workers_per_gather;
+SET min_parallel_table_scan_size TO '8kB';
+SET parallel_setup_cost TO 0;
+INSERT INTO pagg_tab_para SELECT i%30, i%20 FROM generate_series(0, 29999) i;

spaces around % operator?

+SHOW max_parallel_workers_per_gather;
+ max_parallel_workers_per_gather
+---------------------------------
+ 2

Why do we need this?


Removed.
 
+
+-- When GROUP BY clause matches with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x
HAVING avg(y) < 7 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_para_p1.x, (sum(pagg_tab_para_p1.y)),
(avg(pagg_tab_para_p1.y))
+   ->  Append
[ ... clipped ...]
+         ->  Finalize GroupAggregate
+               Group Key: pagg_tab_para_p3.x
+               Filter: (avg(pagg_tab_para_p3.y) < '7'::numeric)
+               ->  Sort
+                     Sort Key: pagg_tab_para_p3.x
+                     ->  Gather
+                           Workers Planned: 2
+                           ->  Partial HashAggregate
+                                 Group Key: pagg_tab_para_p3.x
+                                 ->  Parallel Seq Scan on pagg_tab_para_p3
[ ... clipped ... ]
+-- When GROUP BY clause not matches with PARTITION KEY.
+EXPLAIN (COSTS OFF)
+SELECT y, sum(x), avg(x), count(*) FROM pagg_tab_para GROUP BY y
HAVING avg(x) < 12 ORDER BY 1, 2, 3;
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: pagg_tab_para_p1.y, (sum(pagg_tab_para_p1.x)),
(avg(pagg_tab_para_p1.x))
+   ->  Finalize HashAggregate
+         Group Key: pagg_tab_para_p1.y
[ ... clipped ... ]
+               ->  Gather
+                     Workers Planned: 2
+                     ->  Partial HashAggregate
+                           Group Key: pagg_tab_para_p3.y
+                           ->  Parallel Seq Scan on pagg_tab_para_p3

Per a prior discussion on this thread, we shouldn't produce such plans;
Parallel Append instead?

Yes. We do get a Parallel Append path now.
For full aggregation, normal Append plan in chosen over Append, but we do create that.


+SET enable_partition_wise_agg to true;

May be just enable it at the beginning instead of enabling and disabling twice?

Done as you said. However, this affected one more testcase from partition_join.sql. Updated expected output for that too.

Comments for which I have not responded are all done.

All these fixes are part of v9 patchset.

Thanks Ashutosh for detailed reviews so far.
 

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Thanks
--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeevan Chalke
Дата:
Сообщение: Re: [HACKERS] Partition-wise aggregation/grouping
Следующее
От: Andrey Borodin
Дата:
Сообщение: Re: [HACKERS] wrong t_bits alignment in pageinspect