WIP: Aggregation push-down
From | Antonin Houska |
---|---|
Subject | WIP: Aggregation push-down |
Date | |
Msg-id | 9666.1491295317@localhost |
List | pgsql-hackers |
This is a new version of the patch I presented in [1]. A new thread seems appropriate because the current version can aggregate both base relations and joins, so the original subject would no longer match.

There's still work to do, but I'd consider the patch complete in terms of concept. A few things worth attention for those who want to look into the code:

* I've abandoned the concept of aggmultifn proposed in [1], as it doesn't appear to be very useful. That implies that a "grouped join" can be formed in 2 ways: 1) join a grouped relation to a "plain" (i.e. non-grouped) one, 2) join 2 plain relations and aggregate the result. However, without aggmultifn we can't join 2 grouped relations.

* The GroupedVar type is used to propagate the result of partial aggregation to the top-level join. It's conceptually very similar to PlaceHolderVar.

* Although I intended to use the "unique join" feature [2], I have postponed that so far. The point is that [2] conflicts with my patch, so I'd have to rebase the patch more often. Anyway, the impact of [2] on aggregation finalization (i.e. possible avoidance of the "finalize aggregate node" setup) is not really specific to my patch.

* A scan of a base relation or a join result can be partially aggregated for 2 reasons: 1) it makes the whole plan cheaper because the aggregation takes place on the remote node, so the amount of data to be transferred over the network is significantly reduced, 2) aggregate functions are rather expensive, so it makes sense to evaluate them in multiple parallel workers.

The patch contains both of these features as they are hard to separate from each other. While 1) needs additional work on postgres_fdw, scripts to simulate 2) are attached. The planner settings are such that the cost of expression evaluation is significant, so that it's worth engaging multiple parallel workers.
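To illustrate why joining a grouped relation to a plain one still gives the right answer, here is a minimal Python sketch. It is not taken from the patch. It assumes a hypothetical query of the form SELECT a.i, sum(b.v) FROM a JOIN b ON b.parent = a.i GROUP BY a.i, where the column v and the sample rows are made up for the example. It compares "join 2 plain relations and aggregate the result" with "partially aggregate b, join the grouped relation to plain a, then finalize".

```python
from collections import defaultdict

# Hypothetical data: a(i) and b(parent, v). Column v is an assumption
# made for this sketch; it does not appear in the original post.
a_rows = [1, 1, 2, 3]                       # a.i, deliberately non-unique
b_rows = [(1, 10), (1, 5), (2, 7), (4, 9)]  # (b.parent, b.v)


def join_then_aggregate(a_rows, b_rows):
    """Join plain a to plain b, then aggregate the join result by a.i."""
    result = defaultdict(int)
    for i in a_rows:
        for parent, v in b_rows:
            if parent == i:
                result[i] += v
    return dict(result)


def aggregate_then_join(a_rows, b_rows):
    """Partially aggregate b by the join key, join to plain a, finalize."""
    partial = defaultdict(int)
    for parent, v in b_rows:
        partial[parent] += v        # one partial state per b.parent
    final = defaultdict(int)
    for i in a_rows:
        if i in partial:            # join condition: b.parent = a.i
            final[i] += partial[i]  # finalize by combining partial states
    return dict(final)


print(join_then_aggregate(a_rows, b_rows))
print(aggregate_then_join(a_rows, b_rows))
```

Both functions return the same groups even though a.i is not unique: each matching row of a receives a copy of the partial state, and the final step combines those copies. If both sides were grouped, each partial state would presumably have to be scaled by the number of matching rows on the other side, which this sketch does not attempt.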
In my environment the attached scripts yield the following plan:

 Parallel Finalize HashAggregate
   Group Key: a.i
   ->  Gather Merge
         Workers Planned: 4
         ->  Merge Join
               Merge Cond: (b.parent = a.i)
               ->  Sort
                     Sort Key: b.parent
                     ->  Parallel Partial HashAggregate
                           Group Key: b.parent
                           ->  Hash Join
                                 Hash Cond: ((b.parent = c.parent) AND (b.j = c.k))
                                 ->  Parallel Seq Scan on b
                                 ->  Hash
                                       ->  Seq Scan on c
               ->  Sort
                     Sort Key: a.i
                     ->  Seq Scan on a

Feedback is appreciated.

[1] https://www.postgresql.org/message-id/29111.1483984605%40localhost
[2] https://commitfest.postgresql.org/13/859/

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at