Re: PATCH: decreasing memory needlessly consumed by array_agg

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: PATCH: decreasing memory needlessly consumed by array_agg
Дата
Msg-id 20707.1396372100@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: PATCH: decreasing memory needlessly consumed by array_agg  (Tomas Vondra <tv@fuzzy.cz>)
Ответы Re: PATCH: decreasing memory needlessly consumed by array_agg  (Tomas Vondra <tv@fuzzy.cz>)
Re: PATCH: decreasing memory needlessly consumed by array_agg  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
Tomas Vondra <tv@fuzzy.cz> writes:
> I've been thinking about it a bit more and maybe the doubling is not
> that bad idea, after all.

It is not.  There's a reason why that's our standard behavior.

> The "current" array_agg however violates some of the assumptions
> mentioned above, because it
> (1) pre-allocates quite large number of items (64) at the beginning,
>     resulting in ~98% of memory being "wasted" initially
> (2) allocates one memory context per group, with 8kB initial size, so
>     you're actually wasting ~99.999% of the memory
> (3) thanks to the dedicated memory contexts, the doubling is pretty
>     much pointless up until you cross the 8kB boundary

> IMNSHO these are the issues we really should fix - by lowering the
> initial element count (64->4) and using a single memory context.

The real issue here is that all those decisions are perfectly reasonable
if you expect that a large number of values will get aggregated --- and
even if you don't expect that, they're cheap insurance in simple cases.
It only gets to be a problem if you have a lot of concurrent executions
of array_agg, such as in a grouped-aggregate query.  You're essentially
arguing that in the grouped-aggregate case, it's better to optimize on
the assumption that only a very small number of values will get aggregated
(per hash table entry) --- which is possibly reasonable, but the argument
that it's okay to pessimize the behavior for other cases seems pretty
flimsy from here.

Actually, though, the patch as given outright breaks things for both the
grouped and ungrouped cases, because the aggregate no longer releases
memory when it's done.  That's going to result in memory bloat not
savings, in any situation where the aggregate is executed repeatedly.

I think a patch that stood a chance of getting committed would need to
detect whether the aggregate was being called in simple or grouped
contexts, and apply different behaviors in the two cases.  And you
can't just remove the sub-context without providing some substitute
cleanup mechanism.  Possibly you could keep the context but give it
some much-more-miserly allocation parameters in the grouped case.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: PATCH: decreasing memory needlessly consumed by array_agg
Следующее
От: Robert Haas
Дата:
Сообщение: Re: psql \d+ and oid display