On 02/06/2016 08:39 PM, Andres Freund wrote:
> On 2016-02-06 20:34:07 +0100, Tomas Vondra wrote:
>> On 02/06/2016 06:47 PM, Tom Lane wrote:
>>> * It incorporates a bespoke reimplementation of palloc into hash
>>> joins. This is not a maintainable/scalable way to go about reducing
>>> memory consumption. It should have been done with an arm's-length API
>>> to a new type of memory context, IMO (probably one that supports
>>> palloc but not pfree, repalloc, or any chunk-header-dependent
>>> operations).
>>
>> Hmmm, interesting idea. I've been thinking about doing this using a memory
>> context when writing the dense allocation, but was stuck in "must
>> support all operations" mode, which made it seem impossible. Disallowing some of the
>> operations would make it a viable approach, I guess.
>
> FWIW, I've done that at some point. Noticeable speedups (that's what
> I cared about), but a bit annoying to use. There are many random
> pfree()s around, and then there's MemoryContextContains(),
> GetMemoryChunkContext(), GetMemoryChunkSpace() - which all are
> pretty fundamentally incompatible with such an allocator. I ended up
> having a full header when assertions are enabled, to be able to
> detect usage of these functions and assert out.
>
> I didn't concentrate on improving memory usage, but IIRC it was even
> noticeable for some simpler things.
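
To make that concrete, something like the following is roughly the shape
of allocator being discussed - just a sketch with made-up names, not code
from the actual patch. Allocations are carved out of large chunks by
bumping a pointer, so individual allocations carry no chunk header and
can't be pfree'd one by one; the whole thing is released at once:

#include <stddef.h>
#include <stdlib.h>

#define DENSE_CHUNK_SIZE	(32 * 1024)

typedef struct DenseChunk
{
	struct DenseChunk *next;	/* next (older) chunk in the list */
	size_t		used;			/* bytes handed out from data[] */
	char		data[];			/* allocation space */
} DenseChunk;

typedef struct DenseContext
{
	DenseChunk *chunks;			/* list of chunks, newest first */
} DenseContext;

/*
 * Allocate by bumping a pointer in the current chunk; start a new
 * chunk when the current one is full.  Requests larger than
 * DENSE_CHUNK_SIZE and malloc failure are not handled in this sketch.
 */
static void *
dense_alloc(DenseContext *cxt, size_t size)
{
	DenseChunk *chunk = cxt->chunks;
	void	   *result;

	size = (size + 7) & ~(size_t) 7;	/* MAXALIGN-style rounding */

	if (chunk == NULL || chunk->used + size > DENSE_CHUNK_SIZE)
	{
		chunk = malloc(offsetof(DenseChunk, data) + DENSE_CHUNK_SIZE);
		chunk->next = cxt->chunks;
		chunk->used = 0;
		cxt->chunks = chunk;
	}

	result = chunk->data + chunk->used;
	chunk->used += size;
	return result;
}

/*
 * The only way to free: drop all the chunks at once.  There is no
 * per-allocation pfree(), repalloc() or chunk-header lookup.
 */
static void
dense_reset(DenseContext *cxt)
{
	while (cxt->chunks)
	{
		DenseChunk *next = cxt->chunks->next;

		free(cxt->chunks);
		cxt->chunks = next;
	}
}

And as Andres says, with assertions enabled each allocation could
additionally get a small header, so that pfree() and the
GetMemoryChunk* functions can detect the misuse and assert out.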
I think the hassle is not that bad when most of the fragments have the
same life cycle. With hashjoin that's almost exactly the case, except
when we realize we need to increase the number of buckets - in that case
we need to split the set of accumulated tuples in two.
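
Because the tuples sit densely in the chunks, that split is mostly a
matter of walking the chunks sequentially and re-linking each tuple into
the enlarged bucket array. A rough illustration, reusing the
DenseContext/DenseChunk types from the sketch above (again hypothetical
names, not the actual executor code, and assuming each tuple was stored
with a small header recording its size and hash):

#include <stdint.h>
#include <string.h>

typedef struct StoredTuple
{
	struct StoredTuple *next;	/* link within a hash bucket */
	uint32_t	hashvalue;		/* cached hash of the join key */
	size_t		len;			/* total size, including this header */
	/* tuple data follows */
} StoredTuple;

/*
 * After doubling nbuckets, walk all dense chunks and re-link every
 * tuple into the new bucket array; each old bucket's tuples end up
 * split between the old and the new position.  Assumes every object
 * in the chunks was stored via dense_alloc() and starts with a
 * StoredTuple header.
 */
static void
rebuild_buckets(DenseContext *cxt, StoredTuple **buckets, int nbuckets)
{
	DenseChunk *chunk;

	memset(buckets, 0, nbuckets * sizeof(StoredTuple *));

	for (chunk = cxt->chunks; chunk != NULL; chunk = chunk->next)
	{
		size_t		off = 0;

		while (off < chunk->used)
		{
			StoredTuple *tuple = (StoredTuple *) (chunk->data + off);
			int			bucket = tuple->hashvalue % nbuckets;

			tuple->next = buckets[bucket];
			buckets[bucket] = tuple;

			off += (tuple->len + 7) & ~(size_t) 7;	/* same rounding */
		}
	}
}

A real implementation would likely use a mask instead of the modulo,
nbuckets being kept a power of two.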
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services