Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg

From: Tomas Vondra
Subject: Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg
Msg-id: 269bca9e-9248-2d22-82be-6e82bbc101b3@2ndquadrant.com
In reply to: Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg ("Regina Obe" <lr@pcorp.us>)
List: pgsql-hackers

Hi,

On 6/7/17 5:52 AM, Regina Obe wrote:
>> On 6/6/17 13:52, Regina Obe wrote:
>>> It seems CREATE AGGREGATE was expanded in 9.6 to support
>>> parallelization of aggregate functions using transitions, with the
>>> addition of serialfunc and deserialfunc to the aggregate definitions.
>>>
>>> https://www.postgresql.org/docs/10/static/sql-createaggregate.html
>>>
>>> I was looking at the PostgreSQL 10 source code for some example usages
>>> of this and was hoping that array_agg and string_agg would support the feature.
> 
>> I'm not sure how you would parallelize these, since in most uses
>> you want to have a deterministic output order.
> 
>> -- 
>> Peter Eisentraut              http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> Good point.  If that's the reason it wasn't done, that's fine; I just wasn't sure.
> 
> But if you didn't have an ORDER BY in your aggregate usage, and you
> did have those transition functions, it shouldn't be any different from
> any other use case right?
> I imagine you are right that most folks who use array_agg and
> string_agg usually combine it with array_agg(... ORDER BY ..)
> 

I think what TL had in mind is something like

    SELECT array_agg(x) FROM (
        SELECT x FROM bar ORDER BY y
    ) foo;

i.e. a subquery producing the data in predictable order.
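
To make the ordering concern concrete, here is a toy Python model (not PostgreSQL code) of what could happen if such a query were aggregated in parallel. The names partial_array_agg and combine are made up for this illustration, and the alternating split of rows between two workers is purely an assumption; a real parallel scan gives no guarantee about which worker sees which row.

```python
# Toy model only: "workers" are plain Python lists, not PostgreSQL processes.

def partial_array_agg(rows):
    """Transition phase for one worker: collect values in arrival order."""
    state = []
    for value in rows:
        state.append(value)
    return state


def combine(left, right):
    """Combine phase: concatenate two partial states."""
    return left + right


# What the ORDER BY subquery would emit, already sorted.
ordered_input = [1, 2, 3, 4, 5, 6]

# Assumption: rows are dealt to two workers alternately.
worker_a = partial_array_agg(ordered_input[0::2])
worker_b = partial_array_agg(ordered_input[1::2])

serial_result = partial_array_agg(ordered_input)
parallel_result = combine(worker_a, worker_b)

print(serial_result)
print(parallel_result)
```

Under these assumptions the combined array contains the same elements as the serial one, but no longer in the order the subquery produced them.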
>
> My main reason for asking is that most of the PostGIS geometry and
> raster aggregate functions use transitions and were patterned after
> array agg.
> 

> In the case of PostGIS the sorting is done internally, and really
> only to take advantage of things like cascaded union
> algorithms.
> That is always done though (so even if each worker does it on just its
> batch, that's still better than having only one worker).
> So I think it's still very beneficial to break into separate jobs,
> since in the end the gather will have, say, 2 biggish geometries or 2
> biggish rasters to union if you have 2 workers, which is still better
> than having a million smallish geometries/rasters to union.

I'm not sure I got your point correctly, but if you can (for example)
sort the per-worker results as part of the "serialize" function, and
benefit from that while combining them in the gather, then sure, that
should be a huge win.
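
A rough sketch of that idea, again as a Python toy model and not as PostgreSQL code: each worker sorts its partial state once when serializing it, so the combine step only has to merge runs that are already sorted. The function names simply mirror the roles of the transition, serialize, deserialize and combine functions; JSON stands in for the bytea that a real serialfunc would return, and plain integers stand in for geometries or rasters. All of that is assumed for illustration.

```python
import heapq
import json


def transition(state, value):
    """Per-row step inside one worker: just accumulate the value."""
    state.append(value)
    return state


def serialize(state):
    """Sort the worker's partial state once, then encode it for the leader."""
    return json.dumps(sorted(state)).encode("utf-8")


def deserialize(payload):
    """Decode a partial state received from a worker (it is already sorted)."""
    return json.loads(payload.decode("utf-8"))


def combine(left_sorted, right_sorted):
    """Merge two sorted partial states in linear time, without re-sorting."""
    return list(heapq.merge(left_sorted, right_sorted))


# Hypothetical batches seen by two workers, in arbitrary order.
state_a = []
for v in [5, 1, 9]:
    state_a = transition(state_a, v)

state_b = []
for v in [4, 8, 2]:
    state_b = transition(state_b, v)

# Each worker ships a sorted, serialized state; the leader merges them.
merged = combine(deserialize(serialize(state_a)),
                 deserialize(serialize(state_b)))
print(merged)
```

The point of the design is that the expensive part (sorting) runs in parallel in the workers, while the leader does only a cheap merge of a few large pre-sorted pieces.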

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


