Re: Spilling hashed SetOps and aggregates to disk

From: Tomas Vondra
Subject: Re: Spilling hashed SetOps and aggregates to disk
Msg-id: 8f16eb97-09b0-e96b-ff78-d9677677909d@2ndquadrant.com
In reply to: Re: Spilling hashed SetOps and aggregates to disk  (David Rowley <david.rowley@2ndquadrant.com>)
Replies: RE: Re: Spilling hashed SetOps and aggregates to disk  (serge@rielau.com)
List: pgsql-hackers
On 06/05/2018 03:17 PM, David Rowley wrote:
> On 6 June 2018 at 01:09, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-06-06 01:06:39 +1200, David Rowley wrote:
>>> My concern is that only accounting memory for the group and not the
>>> state is only solving half the problem. It might be fine for
>>> aggregates that don't stray far from their aggtransspace, but for the
>>> other ones, we could still see OOM.
>>
>>> If solving the problem completely is too hard, then a half fix (maybe
>>> 3/4) is better than nothing, but if we can get a design for a full fix
>>> before too much work is done, then isn't that better?
>>
>> I don't think we actually disagree.  I was really primarily talking
>> about the case where we can't really do better because we don't have
>> serialization support.  I mean we could just rescan from scratch, using
>> a groupagg, but that obviously sucks.
> 
> I don't think we do. Taking yours to the 100% solution might just
> take adding the memory accounting to palloc that Jeff proposed a few
> years ago, and using that accounting to decide when we should switch
> methods.
> 
> However, I don't quite recall how the patch accounted for memory
> consumed by sub-contexts, or whether getting the entire consumption
> required recursively looking at subcontexts. If that's the case,
> then checking the consumption after each aggregated tuple would
> likely cost too much.

Yeah, a simple wrapper would not really fix the issue, because 
allocating memory in the subcontext would not update the accounting 
information, so we'd be oblivious to possibly large amounts of memory. I 
don't quite see how this is related to (not) having serialization 
support, though, as it's more about not knowing we've already hit the 
memory limit.


That being said, I don't think array_agg actually has the issue, at 
least since b419865a814abbca12bdd6eef6a3d5ed67f432e1 (it does behave 
differently in aggregate and non-aggregate contexts, IIRC).
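
For the record, what that commit did (IIRC) was let callers of the 
array-building code choose whether the build state gets its own child 
context. A rough sketch of the aggregate-context call pattern, assuming 
the post-b419865 signatures of initArrayResult/accumArrayResult:

#include "postgres.h"
#include "catalog/pg_type.h"
#include "utils/array.h"

/*
 * Sketch of the mode initArrayResult() gained in b419865a814:
 * array_agg passes subcontext = false, so the build state is
 * allocated directly in rcontext (where per-context accounting
 * would see it); other callers pass true to get a child context
 * they can free early.
 */
static void
array_state_sketch(MemoryContext rcontext)
{
	ArrayBuildState *astate;

	/* aggregate case: no private subcontext */
	astate = initArrayResult(INT4OID, rcontext, false);
	astate = accumArrayResult(astate, Int32GetDatum(42), false,
							  INT4OID, rcontext);
	(void) astate;
}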

I don't know how common this issue is, though. I don't think any 
built-in aggregates create additional contexts, but I may be wrong. But 
we have this damn extensibility thing, where users can write their own 
aggregates ...
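
For example, nothing prevents an extension from doing something like 
this in its transition function (just a sketch -- MyState, the context 
name and the 1kB allocation are made up):

#include "postgres.h"
#include "fmgr.h"
#include "utils/memutils.h"

PG_MODULE_MAGIC;

typedef struct MyState
{
	MemoryContext privcxt;		/* private context, invisible to a
								 * wrapper watching only the
								 * aggregate context */
	int64		count;
} MyState;

PG_FUNCTION_INFO_V1(my_agg_trans);

Datum
my_agg_trans(PG_FUNCTION_ARGS)
{
	MemoryContext aggctx;
	MyState    *state;

	if (!AggCheckCallContext(fcinfo, &aggctx))
		elog(ERROR, "my_agg_trans called in non-aggregate context");

	if (PG_ARGISNULL(0))
	{
		/* first row of the group: set up state + a private subcontext */
		state = (MyState *) MemoryContextAlloc(aggctx, sizeof(MyState));
		state->privcxt = AllocSetContextCreate(aggctx,
											   "my_agg private state",
											   ALLOCSET_DEFAULT_SIZES);
		state->count = 0;
	}
	else
		state = (MyState *) PG_GETARG_POINTER(0);

	/*
	 * The interesting (and possibly large) allocations happen in the
	 * private subcontext, so accounting done only on aggctx itself
	 * never sees them.
	 */
	(void) MemoryContextAlloc(state->privcxt, 1024);
	state->count++;

	PG_RETURN_POINTER(state);
}

An accounting wrapper watching only the aggregate context would count 
sizeof(MyState) per group and miss everything allocated in privcxt.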

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

