Re: Final Patch for GROUPING SETS
От | Noah Misch |
---|---|
Тема | Re: Final Patch for GROUPING SETS |
Дата | |
Msg-id | 20141221210005.GA1864976@tornado.leadboat.com обсуждение исходный текст |
Ответ на | Re: Final Patch for GROUPING SETS (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Ответы |
Re: Final Patch for GROUPING SETS
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
On Sat, Dec 13, 2014 at 04:37:48AM +0000, Andrew Gierth wrote: > >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > > >> I'd already explained in more detail way back when we posted the > >> patch. But to reiterate: the ChainAggregate nodes pass through > >> their input data unchanged, but on group boundaries they write > >> aggregated result rows to a tuplestore shared by the whole > >> chain. The top node returns the data from the tuplestore after its > >> own output is completed. > > Tom> That seems pretty grotty from a performance+memory consumption > Tom> standpoint. At peak memory usage, each one of the Sort nodes > Tom> will contain every input row, > > Has this objection ever been raised for WindowAgg, which has the same > issue? I caution against using window function performance as the template for GROUPING SETS performance goals. The benefit of GROUPING SETS compared to its UNION ALL functional equivalent is 15% syntactic pleasantness, 85% performance opportunities. Contrast that having window functions is great even with naive performance, because they enable tasks that are otherwise too hard in SQL. > Tom> In principle, with the CTE+UNION approach I was suggesting, the > Tom> peak memory consumption would be one copy of the input rows in > Tom> the CTE's tuplestore plus one copy in the active branch's Sort > Tom> node. I think a bit of effort would be needed to get there (ie, > Tom> shut down one branch's Sort node before starting the next, > Tom> something I'm pretty sure doesn't happen today). > > Correct, it doesn't. > > However, I notice that having ChainAggregate shut down its input would > also have the effect of bounding the memory usage (to two copies, > which is as good as the append+sorts+CTE case). Agreed, and I find that more promising than the CTE approach. Both strategies require temporary space covering two copies of the input data. (That, or you accept rescanning the original input.) The chained approach performs less I/O. Consider "SELECT count(*) FROM t GROUP BY GROUPING SETS (a, b)", where pg_relation_size(t) >> RAM. I/O consumed with the chained approach: read table write tuplesort 1 read tuplesort 1 write tuplesort 2 read tuplesort 2 I/O consumed with the CTE approach: read table write CTE read CTE write tuplesort 1 read tuplesort 1 read CTE write tuplesort 2 read tuplesort 2 Tom rightly brought up the space requirements for result rows. The CTE approach naturally avoids reserving space for that. However, I find it a safe bet to optimize GROUPING SETS for input >> result. Reserving temporary space for result rows to save input data I/O is a good trade. We don't actually need to compromise; one can imagine a GroupAggregateChain plan node with a sortChain list that exhibits the efficiencies of both. I'm fine moving forward with the cross-node tuplestore, though. The elephant in the performance room is the absence of hash aggregation. I agree with your decision to make that a follow-on patch, but the project would be in an awkward PR situation if 9.5 has GroupAggregate-only GROUPING SETS. I may argue to #ifdef-out the feature rather than release that way. We don't need to debate that prematurely, but keep it in mind while planning. Thanks, nm
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Andrew DunstanДата:
Сообщение: Re: Add min and max execute statement time in pg_stat_statement