Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg

Поиск

Список

Период

Сортировка

От	Regina Obe
Тема	Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg
Дата	7 июня 2017 г. 06:52:08
Msg-id	000101d2df41$6feda800$4fc8f800$@pcorp.us обсуждение исходный текст
Ответ на	Re: [HACKERS] Parallel Aggregation support for aggregate functionsthat use transitions not implemented for array_agg (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Ответы	Re: [HACKERS] Parallel Aggregation support for aggregate functionsthat use transitions not implemented for array_agg
Список	pgsql-hackers

Дерево обсуждения

> On 6/6/17 13:52, Regina Obe wrote:
>> It seems CREATE  AGGREGATE was expanded in 9.6 to support
>> parallelization of aggregate functions using transitions, with the
>> addition of serialfunc and deserialfunc to the aggregate definitions.
>>
>> https://www.postgresql.org/docs/10/static/sql-createaggregate.html
>>
>> I was looking at the PostgreSQL 10 source code for some example usages
>> of this and was hoping that array_agg and string_agg would support the feature.

> I'm not sure how you would parallelize these, since in most uses you want to have a deterministic output order.

>--
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Good point.  If that's the reason it wasn't done, that's good just wasn't sure.

But if you didn't have an ORDER BY in your aggregate usage, and you did have those transition functions, it shouldn't
beany different from any other use case right? 
I imagine you are right that most folks who use array_agg and string_agg usually combine it with array_agg(... ORDER BY
..)

My main reason for asking is that most of the PostGIS geometry and raster aggregate functions use transitions and were
patternedafter array agg. 

In the case of PostGIS the sorting is done internally and really only to expedite take advantage of things like
cascadedunion algorithms.  That is always done though (so even if each worker does it on just it's batch that's still
betterthan having only one worker). 
So I think it's still very beneficial to break into separate jobs since in the end the gather, will have  say 2 biggish
geometriesor 2 biggish rasters to union if you have 2 workers which is still better than having a million smallish
geometries/rastersto union 

Split Union

Worker 1:

Parallel agg (internal sort geoms by box)  - Union

Worker 2:
Parallel Agg (internal sort geoms )  - Union

Gather  Union(union, union) internal sort.

Thanks,
Regina

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg