Re: PATCH: two slab-like memory allocators

From: Tomas Vondra
Subject: Re: PATCH: two slab-like memory allocators
Date:
Msg-id: 68b837a3-ce55-f9a2-f685-9eafac536f21@2ndquadrant.com
In reply to: Re: PATCH: two slab-like memory allocators  (Petr Jelinek <petr@2ndquadrant.com>)
Responses: Re: PATCH: two slab-like memory allocators  (Petr Jelinek <petr@2ndquadrant.com>)
List: pgsql-hackers
On 09/25/2016 08:48 PM, Petr Jelinek wrote:
> Hi Tomas,
>
> On 02/08/16 17:44, Tomas Vondra wrote:
>>
>> This patch actually includes two new memory allocators (not one). Very
>> brief summary (for more detailed explanation of the ideas, see comments
>> at the beginning of slab.c and genslab.c):
>>
>> Slab
>> ----
>> * suitable for fixed-length allocations (other pallocs fail)
>> * much simpler than AllocSet (no global freelist management etc.)
>> * free space is tracked per block (using a simple bitmap)
>> * which allows freeing the block once all chunks are freed (AllocSet
>> will hold the memory forever, in the hope of reusing it)
>>
>> GenSlab
>> -------
>> * suitable for non-fixed-length allocations, but with chunks of mostly
>> the same size (initially unknown, the context will tune itself)
>> * a combination of AllocSet and Slab (or a sequence of Slab allocators)
>> * the goal is to do most allocations in Slab context
>> * there's always a single 'current' Slab context, and every now and
>> then it's replaced with a new generation (with the chunk size computed
>> from recent requests)
>> * the AllocSet context is used for chunks too large for current Slab
>>
>
> So it's just a wrapper around the other two allocators to make this
> use case easier to handle. Do you expect there will be use for the
> GenSlab eventually outside of the reorderbuffer?
>

Yes, you might say it's just a wrapper around the other two allocators,
but it *also* includes the logic for recomputing the chunk size, etc.

I haven't thought much about other places that might benefit from these 
new allocators - in general, they're useful for places that produce a 
stream of equally-sized items (GenSlab relaxes this) that are pfree()'d 
in a ~FIFO manner (i.e. roughly in the order of allocation).
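To make the "context will tune itself" part concrete, here's a minimal sketch of how a GenSlab-style context might pick the chunk size for its next Slab generation from recent requests. The function name and the round-up policy are purely illustrative assumptions, not the actual patch API:

```c
#include <stddef.h>

/*
 * Hypothetical sketch: choose the chunk size for the next Slab
 * generation from a running record of recently requested sizes.
 * Rounding up to a power of two is one plausible policy; the real
 * patch may do something different.
 */
static size_t
choose_next_chunk_size(const size_t *recent_requests, int nrequests)
{
	size_t		max_size = 0;
	size_t		chunk = 8;
	int			i;

	/* track the largest recent request */
	for (i = 0; i < nrequests; i++)
		if (recent_requests[i] > max_size)
			max_size = recent_requests[i];

	/* round up so slightly larger requests still fit the new generation */
	while (chunk < max_size)
		chunk <<= 1;

	return chunk;
}
```

Requests larger than the current generation's chunk size would then fall through to the AllocSet part of the context, as described above.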


>> So none of this is meant as a universal replacement of AllocSet,
>> but in the suitable cases the results seem really promising. For
>> example for the simple test query in [1], the performance
>> improvement is this:
>>
>>         N |   master |  patched
>>    -----------------------------
>>     10000 |    100ms |    100ms
>>     50000 |  15000ms |    350ms
>>    100000 | 146000ms |    700ms
>>    200000 |        ? |   1400ms
>>
>> That's a fairly significant improvement, and the submitted version
>> of the patches should perform even better (~2x, IIRC).
>>
>
> I agree that it improves performance quite nicely and that
> reorderbuffer could use this.
>
> About the code. I am not quite sure that this needs to be split into
> two patches especially if 1/3 of the second patch is the removal of
> the code added by the first one and otherwise it's quite small and
> straightforward. That is unless you expect the GenSlab to not go in.
>

I don't know - it seemed natural to first introduce the Slab, as it's 
easier to discuss it separately, and it works for 2 of the 3 contexts 
needed in reorderbuffer.

GenSlab is an improvement on Slab, or rather builds on it, so that it 
works for the third context. And it introduces some additional ideas 
(particularly the generational design).

Of course, none of this means it has to be committed in two separate 
chunks, or that I don't expect GenSlab to get committed ...

> Slab:
> In general it seems understandable, the initial description helps to
> understand what's happening well enough.
>
> One thing I don't understand however is why the freelist is both
> array and doubly linked list and why there is new implementation of
> said doubly linked list given that we have dlist.
>

Hmm, perhaps that should be explained better.

In AllocSet, we only have a global freelist of chunks, i.e. we have a 
list of free chunks for each possible size (there are 11 sizes, starting 
with 8 bytes and then doubling the size). So freelist[0] is a list of 
free 8B chunks, freelist[1] is a list of free 16B chunks, etc.
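In other words, with sizes doubling from 8 bytes the freelist index is essentially log2 of the (rounded-up) request size, minus 3. A minimal sketch of that mapping (the function name is mine, not aset.c's):

```c
#include <stddef.h>

/*
 * Sketch of AllocSet-style freelist indexing: chunk sizes double
 * starting at 8 bytes, so a request maps to the freelist holding the
 * smallest chunk size that fits it.
 */
static int
freelist_index(size_t size)
{
	int			idx = 0;
	size_t		chunk = 8;

	while (chunk < size)
	{
		chunk <<= 1;
		idx++;
	}

	return idx;			/* 8B -> 0, 16B -> 1, ..., 8kB -> 10 */
}
```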

In Slab, the freelist has two levels - first there's a bitmap on each 
block (which is possible, as the chunks have constant size), tracking 
which chunks of that particular block are free. This makes it trivial to 
check that all chunks on the block are free, and free the whole block 
(which is impossible with AllocSet).

Second, the freelist at the context level tracks blocks with a given 
number of free chunks - so freelist[0] tracks completely full blocks, 
freelist[1] is a list of blocks with 1 free chunk, etc. This is used to 
reuse space on almost full blocks first, in the hope that some of the 
less full blocks will get completely empty (and freed to the OS).
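The allocation policy this enables can be sketched in a few lines: scan the context-level freelist from the low indexes up, so the block with the fewest free chunks is used first. This is a simplified illustration with hypothetical names, not the patch's actual data structures:

```c
#include <stddef.h>

/*
 * Sketch of Slab's block-selection policy: freelists[i] holds blocks
 * with exactly i free chunks (freelists[0] = completely full blocks).
 * Preferring the lowest non-empty index concentrates allocations on
 * almost-full blocks, letting emptier blocks drain and be freed.
 */
static int
pick_freelist(void *const *freelists, int nfreelists)
{
	int			i;

	for (i = 1; i < nfreelists; i++)	/* skip freelists[0]: no space there */
		if (freelists[i] != NULL)
			return i;

	return -1;			/* no free chunk anywhere; allocate a new block */
}
```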

Is that clear?

>> +/*
>> + * SlabContext is our standard implementation of MemoryContext.
>> + *
>
> Really?
>

Meh, that's clearly a bogus comment.

>> +/*
>> + * SlabChunk
>> + *        The prefix of each piece of memory in a SlabBlock
>> + *
>> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
>> + */
>
> Is this true? Why? And if it is then why doesn't the SlabChunk
> actually match the StandardChunkHeader?

It is true - a lot of the MemoryContext infrastructure relies on that. 
For example, when you do pfree(ptr), we actually do something like
   header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))

to get the chunk header - which includes pointer to the memory context 
and other useful stuff.

This also means we can put additional fields before StandardChunkHeader, 
as that does not break this pointer arithmetic, i.e. SlabChunkData is 
effectively defined like this:

typedef struct SlabChunkData
{
    /* block owning this chunk */
    void       *block;

    /* standard header */
    StandardChunkHeader header;
} SlabChunkData;
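To see why the extra field in front is harmless, here's a self-contained sketch of the arithmetic with simplified stand-in structs (the real StandardChunkHeader in memutils.h has different fields; only the layout idea matters here):

```c
#include <stdlib.h>

/*
 * Simplified stand-ins to illustrate the pointer arithmetic: pfree(ptr)
 * only ever subtracts sizeof(StandardChunkHeader) from the user pointer,
 * so any fields placed *before* the header are invisible to it.
 */
typedef struct StandardChunkHeader
{
	void	   *context;		/* owning memory context */
	size_t		size;			/* requested size */
} StandardChunkHeader;

typedef struct SlabChunkData
{
	void	   *block;			/* block owning this chunk */
	StandardChunkHeader header; /* must be last before the user data */
} SlabChunkData;

/* what palloc() would hand back for this chunk */
static void *
user_pointer(SlabChunkData *chunk)
{
	return (char *) chunk + sizeof(SlabChunkData);
}

/* what pfree() does to find the header again */
static StandardChunkHeader *
header_of(void *ptr)
{
	return (StandardChunkHeader *) ((char *) ptr - sizeof(StandardChunkHeader));
}
```

Because the header is the last thing before the user data, `header_of(user_pointer(chunk))` lands exactly on `chunk->header`, regardless of what precedes it.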

>
>> +#define SlabPointerIsValid(pointer) PointerIsValid(pointer)
>
> What's the point of this given that it's defined in the .c file?
>

Meh, I've copied this from aset.c, but I see it's useless there too.

>
>> +static void *
>> +SlabAlloc(MemoryContext context, Size size)
>> +{
>> +    Slab    set = (Slab) context;
>> +    SlabBlock    block;
>> +    SlabChunk    chunk;
>> +    int            idx;
>> +
>> +    AssertArg(SlabIsValid(set));
>> +
>> +    Assert(size == set->chunkSize);
>
> I wonder if there should be stronger protection (ie, elog) for the size
> matching.
>

Perhaps, I'm not opposed to that.

>
>> +static void *
>> +SlabRealloc(MemoryContext context, void *pointer, Size size)
>> +{
>> +    Slab    set = (Slab)context;
>> +
>> +    /* can't do actual realloc with slab, but let's try to be gentle */
>> +    if (size == set->chunkSize)
>> +        return pointer;
>> +
>> +    /* we can't really do repalloc with this allocator */
>> +    Assert(false);
>
> This IMHO should definitely be elog.
>

Yeah, you're probably right.

>
>> +static void
>> +SlabCheck(MemoryContext context)
>> +{
>> +    /* FIXME */
>> +}
>
> Do you plan to implement this interface?
>

Yes, although I'm not sure what checks should go there. The only thing I 
can think of right now is checking that the number of free chunks on a 
block (according to the bitmap) matches the freelist index.
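That check could look something like the sketch below - names and bitmap polarity are assumptions on my part, and the real SlabCheck would presumably elog() on a mismatch rather than return a flag:

```c
#include <stdbool.h>

/*
 * Hypothetical consistency check: a block kept on freelist[i] should
 * have exactly i free chunks according to its own bitmap (here a set
 * bit is assumed to mean "chunk free").
 */
static bool
slab_block_consistent(const unsigned char *bitmap, int nchunks,
					  int freelist_idx)
{
	int			i;
	int			nfree = 0;

	for (i = 0; i < nchunks; i++)
		if (bitmap[i / 8] & (1 << (i % 8)))
			nfree++;

	return nfree == freelist_idx;
}
```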

>> +#define SLAB_DEFAULT_BLOCK_SIZE        8192
>> +#define SLAB_LARGE_BLOCK_SIZE        (8 * 1024 * 1024)
>
> I am guessing this is based on max_cached_tuplebufs? Maybe these
> could be written with same style?
>

Not sure I understand what you mean by "based on"? I don't quite 
remember how I came up with those constants, but I guess 8kB and 8MB 
seemed like good values.

Also, what style do you mean? I've used the same style as for the 
ALLOCSET_* constants in the same file.

> GenSlab:
>
> Since this is relatively simple wrapper it looks mostly ok to me. The
> only issue I have here is that I am not quite sure about those FIXME
> functions (Free, Realloc, GetChunkSpace). It's slightly weird to call to
> mcxt but I guess the alternative there is to error out so this is
> probably preferable. Would want to hear other opinions here.
>

Yeah, I'd like to get some opinions on that too - that's why I left the 
FIXMEs there, actually.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


