Re: Parallel Aggregates for string_agg and array_agg

From Tomas Vondra
Subject Re: Parallel Aggregates for string_agg and array_agg
Date
Msg-id fdbf52dc-2e80-f5bc-5d43-b66a2deba021@2ndquadrant.com
In response to Re: Parallel Aggregates for string_agg and array_agg  ("Tels" <nospam-pg-abuse@bloodgate.com>)
List pgsql-hackers

On 04/05/2018 09:10 PM, Tels wrote:
> Moin,
> 
> On Wed, April 4, 2018 11:41 pm, David Rowley wrote:
>> Hi Tomas,
>>
>> Thanks for taking another look.
>>
>> On 5 April 2018 at 07:12, Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> wrote:
>>> Other than that, the patch seems fine to me, and it's already marked as
>>> RFC so I'll leave it at that.
>>
>> Thanks.
> 
> I have one more comment - sorry for not writing sooner, the flu got to me ...
> 
> Somewhere in the code there is a new allocation of memory when the string
> grows beyond the current size - and that doubles the size. This can lead
> to a lot of wasted space (think: constructing a string that is a bit over
> 1 GByte, which would presumably allocate 2 GByte).
> 

I don't think we support memory chunks above 1GB, so that's likely going
to fail anyway. See

  #define MaxAllocSize       ((Size) 0x3fffffff) /* 1 gigabyte - 1 */
  #define AllocSizeIsValid(size)     ((Size) (size) <= MaxAllocSize)
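
As a rough illustration (not code from the patch), the sketch below applies that limit to a doubling step. The two macros are repeated from above so the block compiles on its own; the `Size` typedef and the helper `doubling_fits` are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

typedef size_t Size;            /* stand-in for the real Size typedef */

/* The two macros quoted above, repeated so the sketch is self-contained. */
#define MaxAllocSize            ((Size) 0x3fffffff) /* 1 gigabyte - 1 */
#define AllocSizeIsValid(size)  ((Size) (size) <= MaxAllocSize)

/*
 * Hypothetical helper (not a PostgreSQL function): report whether doubling
 * a buffer of cur_size bytes would still be an allowed request size.
 */
static bool
doubling_fits(Size cur_size)
{
    /* Guard against overflow of the multiplication itself. */
    if (cur_size > ((Size) -1) / 2)
        return false;

    return AllocSizeIsValid(cur_size * 2);
}
```

Under these assumptions a 256 MB buffer can still be doubled, while doubling a 512 MB buffer would ask for exactly 1 GB, which is one byte over the limit.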

I get your point, though - we may be wasting space here. But that's hardly
something this patch should mess with - that's a more generic allocation
question.

> The same issue happens when each worker allocates 512 MByte for a 256
> MByte + 1 result.
> 
> IMHO a factor of like 1.4 or 1.2 would work better here - not sure what
> the current standard in situations like this in PG is.
> 

With a 2x scale factor, we only waste 25% of the space on average.
Consider that you're growing because you've reached the current size,
and you double the size - say, from 1MB to 2MB. But the 1MB of wasted space
is the worst case - in reality we'll use something between 1MB and 2MB,
so 1.5MB on average. At which point we've wasted just 0.5MB, i.e. 25% of
the 2MB allocated.
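
The same arithmetic can be written down for an arbitrary growth factor. This is only a sketch, assuming the final length is uniformly distributed between the previous capacity and the new one; `expected_waste_fraction` is a hypothetical name, not something in the tree.

```c
/*
 * Expected fraction of the allocation left unused when a buffer grows by
 * `factor` each time it fills up.
 *
 * Assumption: the final length is uniformly distributed between the
 * previous capacity (new capacity / factor) and the new capacity.
 * The new capacity is normalized to 1.0.
 */
static double
expected_waste_fraction(double factor)
{
    double min_used = 1.0 / factor;             /* just past the old capacity */
    double avg_used = (min_used + 1.0) / 2.0;   /* midpoint of the range */

    return 1.0 - avg_used;
}
```

With a factor of 2 this gives 0.25, matching the 1MB to 2MB example. Smaller factors such as 1.4 or 1.2 give a smaller fraction, which is the trade-off being discussed.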

That sounds perfectly reasonable to me. A lower factor would be more
expensive in terms of repalloc calls, for example.
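
To put a rough number on that cost, the sketch below counts how many enlargements a buffer needs to get from a starting size to a target size for a given growth factor. It is a simplified model with a hypothetical helper name (`count_enlargements`); it ignores rounding to allocator chunk sizes, and each enlargement may or may not involve an actual copy.

```c
/*
 * Count how many times a buffer must be enlarged to grow from `initial`
 * bytes to at least `target` bytes when every enlargement multiplies the
 * capacity by `factor`.
 *
 * Hypothetical helper for illustration only. Returns -1 for arguments
 * that would never make progress.
 */
static int
count_enlargements(double initial, double target, double factor)
{
    int     n = 0;
    double  size = initial;

    if (initial <= 0.0 || factor <= 1.0)
        return -1;

    while (size < target)
    {
        size *= factor;     /* one enlargement, potentially one repalloc */
        n++;
    }

    return n;
}
```

Growing from 1kB to 1GB takes 20 enlargements with a factor of 2. Factors of 1.4 or 1.2 need more steps to cover the same range, and each step may copy the data accumulated so far.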

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

