Re: PATCH: two slab-like memory allocators

From: Tomas Vondra
Subject: Re: PATCH: two slab-like memory allocators
Date:
Msg-id: efd47210-cf95-d7ee-e984-a873f76596a8@2ndquadrant.com
In reply to: Re: PATCH: two slab-like memory allocators  (Andres Freund <andres@anarazel.de>)
Responses: Re: PATCH: two slab-like memory allocators
           Re: PATCH: two slab-like memory allocators
List: pgsql-hackers
On 11/12/2016 08:12 PM, Andres Freund wrote:
> Hi,
>
> Subject: [PATCH 1/2] slab allocator
>
> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index 6ad7e7d..520f295 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
>
> I'd rather have that in a separate followup commit...
>
>
> + * IDENTIFICATION
> + *      src/backend/utils/mmgr/slab.c
> + *
> + *
> + *    The constant allocation size allows significant simplification and various
> + *    optimizations that are not possible in AllocSet. Firstly, we can get rid
> + *    of the doubling and carve the blocks into chunks of exactly the right size
> + *    (plus alignment), not wasting memory.
>
> References to AllocSet aren't necessarily a good idea, they'll quite
> possibly get out of date. The argument can be quite easily be made
> without referring to a concrete reference to behaviour elsewhere.
>

Yeah, that's probably true.

> + *
> + *    At the context level, we use 'freelist' array to track blocks grouped by
> + *    number of free chunks. For example freelist[0] is a list of completely full
> + *    blocks, freelist[1] is a block with a single free chunk, etc.
>
> Hm. Those arrays are going to be quite large for small allocations w/
> big blocks (an imo sensible combination). Maybe it'd make more sense to
> model it as a linked list of blocks? Full blocks are at one end, empty
> ones at the other?

So there'd be one huge list of blocks, sorted by the number of free 
chunks? Hmm, that might work, I guess.

I don't think the combination of large blocks with small allocations is 
particularly sensible, though - what exactly would be the benefit of 
such a combination? I would even consider enforcing some upper limit on 
the number of chunks per block - say, 256.
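
Just so we're on the same page, this is roughly how I picture the 
bookkeeping with the freelist array - a simplified sketch, not the 
actual patch code (BlockSketch and move_to_bucket are made-up names):

#include <stddef.h>

/* Simplified stand-in for the real block struct (illustrative only). */
typedef struct BlockSketch
{
    struct BlockSketch *prev;   /* neighbours within the same bucket */
    struct BlockSketch *next;
    int         nfree;          /* current number of free chunks */
} BlockSketch;

/*
 * After nfree changes, move the block from the bucket it used to be in
 * (oldFree) to the bucket matching the new count: freelist[0] holds
 * completely full blocks, freelist[i] holds blocks with i free chunks.
 */
static void
move_to_bucket(BlockSketch **freelist, BlockSketch *block, int oldFree)
{
    /* unlink from the old bucket */
    if (block->prev)
        block->prev->next = block->next;
    else
        freelist[oldFree] = block->next;
    if (block->next)
        block->next->prev = block->prev;

    /* push onto the head of the new bucket */
    block->prev = NULL;
    block->next = freelist[block->nfree];
    if (block->next)
        block->next->prev = block;
    freelist[block->nfree] = block;
}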

>
>
> + *    About MEMORY_CONTEXT_CHECKING:
> + *
> + *    Since we usually round request sizes up to the next power of 2, there
> + *    is often some unused space immediately after a requested data
> area.
>
> I assume the "round up" stuff is copy-paste?
>

Yeah, sorry about that.

>
> + *    Thus, if someone makes the common error of writing past what they've
> + *    requested, the problem is likely to go unnoticed ... until the day when
> + *    there *isn't* any wasted space, perhaps because of different memory
> + *    ...
> + *
> + *    See also the cooperating Valgrind client requests in mcxt.c.
>
> I think we need a preliminary patch moving a lot of this into something
> like mcxt_internal.h. Copying this comment, and a lot of the utility
> functions, into every memory context implementation is a bad pattern.
>

Yes, I planned to do that for the next version of the patch. Laziness.

>
> +typedef struct SlabBlockData *SlabBlock;        /* forward reference */
> +typedef struct SlabChunkData *SlabChunk;
>
> Can we please not continue hiding pointers behind typedefs? It's a bad
> pattern, and that it's fairly widely used isn't a good excuse to
> introduce further usages of it.
>

Why is it a bad pattern?

> +/*
> + * SlabContext is a specialized implementation of MemoryContext.
> + */
> +typedef struct SlabContext
> +{
> +    MemoryContextData header;    /* Standard memory-context fields */
> +    /* Allocation parameters for this context: */
> +    Size        chunkSize;        /* chunk size */
> +    Size        fullChunkSize;    /* chunk size including header and alignment */
> +    Size        blockSize;        /* block size */
> +    int            chunksPerBlock;    /* number of chunks per block */
> +    int            minFreeChunks;    /* min number of free chunks in any block */
> +    int            nblocks;        /* number of blocks allocated */
> +    /* Info about storage allocated in this context: */
> +    SlabBlock    freelist[1];    /* free lists (block-level) */
>
> I assume this is a variable-length array? If so, that a) should be
> documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
> actually will cause compiler warnings and potential misoptimizations.
>

Will fix, thanks.
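
Concretely, something like this (sketch):

typedef struct SlabContext
{
    MemoryContextData header;   /* Standard memory-context fields */

    /* ... allocation parameters as quoted above ... */

    /*
     * Blocks with exactly i free chunks are linked from freelist[i];
     * the array has (chunksPerBlock + 1) entries, 0 .. chunksPerBlock.
     */
    SlabBlock   freelist[FLEXIBLE_ARRAY_MEMBER];
} SlabContext;

with the context allocated using offsetof(SlabContext, freelist) plus 
(chunksPerBlock + 1) * sizeof(SlabBlock) instead of sizeof(SlabContext).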

> +/*
> + * SlabBlockData
> + *        Structure of a single block in SLAB allocator.
> + *
> + * slab: context owning this block
>
> What do we need this for?
>

You're right that the pointer to the owning context is unnecessary - 
there's no such thing as a "standard block header", and we already have 
the pointer in the standard chunk header. But maybe keeping the pointer 
at least under MEMORY_CONTEXT_CHECKING would be a good idea?
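
I.e. keep just the field in debug builds, something like:

#ifdef MEMORY_CONTEXT_CHECKING
    SlabContext *slab;      /* owning context, kept only for debug cross-checks */
#endif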

> + * prev, next: used for doubly-linked list of blocks in global freelist
>
> I'd prefer using an embedded list here (cf. ilist.h).
>

Makes sense.
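
For reference, with ilist.h that would look roughly like this (sketch, 
not the patch; walk_blocks is just a made-up example):

#include "lib/ilist.h"

typedef struct SlabBlockData
{
    dlist_node  node;       /* linkage in the context's list of blocks */
    int         nfree;      /* number of free chunks in this block */
    /* ... free-chunk bitmap etc. ... */
} SlabBlockData;

/* e.g. walking the blocks on a list kept in the context */
static void
walk_blocks(dlist_head *blocklist)
{
    dlist_iter  iter;

    dlist_foreach(iter, blocklist)
    {
        SlabBlockData *block = dlist_container(SlabBlockData, node, iter.cur);

        /* ... look at block->nfree etc. ... */
        (void) block;
    }
}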

> +/*
> + * SlabChunk
> + *        The prefix of each piece of memory in an SlabBlock
> + *
> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
> + * However it's possible to fields in front of the StandardChunkHeader fields,
> + * which is used to add pointer to the block.
> + */
>
> Wouldn't that be easier to enforce - particularly around alignment
> requirements - by embedding a StandardChunkHeader here? That'd also
> avoid redundancies.
>

Also makes sense.
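
In other words, roughly this (sketch - assuming the extra field can 
simply go in front, so the StandardChunkHeader part stays immediately 
before the user data):

typedef struct SlabChunkData
{
    /* block owning this chunk (extra field, in front) */
    SlabBlock   block;

    /* fields shared by all memory context implementations */
    StandardChunkHeader header;
} SlabChunkData;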

> +/* ----------
> + * Debug macros
> + * ----------
> + */
> +#ifdef HAVE_ALLOCINFO
> +#define SlabFreeInfo(_cxt, _chunk) \
> +            fprintf(stderr, "SlabFree: %s: %p, %d\n", \
> +                (_cxt)->header.name, (_chunk), (_chunk)->size)
> +#define SlabAllocInfo(_cxt, _chunk) \
> +            fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
> +                (_cxt)->header.name, (_chunk), (_chunk)->size)
> +#else
> +#define SlabFreeInfo(_cxt, _chunk)
> +#define SlabAllocInfo(_cxt, _chunk)
> +#endif
>
> Do we really have to copy that stuff from aset.c? Obviously no-one uses
> that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is
> defined:
> /home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning:
> format ‘%d’ expects argument of type ‘int’, but argument 5 has type
> ‘Size {aka long unsigned int}’ [-Wformat=]
>     fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
>

I don't really care. Sure, we should fix the warning, but not supporting 
HAVE_ALLOCINFO in the new allocator(s) seems wrong - we should either 
support it everywhere, or we should rip it out. That's not the purpose 
of this patch, though.
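
(The warning itself is trivial to fix, by the way - Size is just 
size_t, so either cast the value or use %zu, e.g.:)

#define SlabAllocInfo(_cxt, _chunk) \
            fprintf(stderr, "SlabAlloc: %s: %p, %zu\n", \
                (_cxt)->header.name, (_chunk), (_chunk)->size)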

>
> +#ifdef CLOBBER_FREED_MEMORY
> +
> +/* Wipe freed memory for debugging purposes */
> +static void
> +wipe_mem(void *ptr, size_t size)
>
> +#ifdef MEMORY_CONTEXT_CHECKING
> +static void
> +set_sentinel(void *base, Size offset)
> +
> +static bool
> +sentinel_ok(const void *base, Size offset)
> +#endif
>
> +#ifdef RANDOMIZE_ALLOCATED_MEMORY
> +static void
> +randomize_mem(char *ptr, size_t size)
>
> Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
> static inlines.
>

Yes, next to the valgrind stuff.
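
So the shared header would carry the helpers as static inlines, e.g. 
(wipe_mem shown as an example - the body is essentially what aset.c 
has today; header name still to be decided):

/* in the new mcxt_internal.h (or whatever we call it) */
#include <string.h>
#include "utils/memdebug.h"

#ifdef CLOBBER_FREED_MEMORY

/* Wipe freed memory for debugging purposes */
static inline void
wipe_mem(void *ptr, size_t size)
{
    VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
    memset(ptr, 0x7F, size);
    VALGRIND_MAKE_MEM_NOACCESS(ptr, size);
}

#endif   /* CLOBBER_FREED_MEMORY */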

> +MemoryContext
> +SlabContextCreate(MemoryContext parent,
> +                      const char *name,
> +                      Size blockSize,
> +                      Size chunkSize)
> +{
> +    int        chunksPerBlock;
> +    Size    fullChunkSize;
> +    Slab    set;
> +
> +    /* chunk, including SLAB header (both addresses nicely aligned) */
> +    fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));
>
> +    /* make sure the block can store at least one chunk (plus a bitmap) */
> +    if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
> +        elog(ERROR, "block size %ld for slab is too small for chunks %ld",
> +                    blockSize, chunkSize);
>
> I assume the 1 is the bitmap size?
>

Yes, the smallest bitmap is 1 byte.

>
> +    /* so how many chunks can we fit into a block, including header and bitmap? */
> +    chunksPerBlock
> +        =  (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);
>
> I'm slightly drunk due to bad airline wine, but right now that seems a
> bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
> pattern elsewhere, but right now I can't follow the - 7.
>

We need all the pieces (block header, chunks and the bitmap) to fit 
into the block, so this needs to hold:

   blockSize >= sizeof(SlabBlockData) +
                chunksPerBlock * fullChunkSize +
                (chunksPerBlock + 7) / 8

Solve for 'chunksPerBlock' and you get the formula above. Moving the 7 
to the other side of the inequality is where the minus comes from.

But documenting this is probably a good idea.
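
Maybe something along these lines next to the computation (just a 
draft of the comment):

/*
 * Compute the number of chunks per block: the block has to hold the
 * block header, 'chunksPerBlock' chunks and a bitmap with one bit per
 * chunk, rounded up to whole bytes, i.e.
 *
 *    blockSize >= sizeof(SlabBlockData)
 *                 + chunksPerBlock * fullChunkSize
 *                 + (chunksPerBlock + 7) / 8
 *
 * Multiply by 8 and solve for chunksPerBlock:
 *
 *    chunksPerBlock <= (8 * (blockSize - sizeof(SlabBlockData)) - 7)
 *                      / (8 * fullChunkSize + 1)
 *
 * and integer division conveniently rounds down for us.
 */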

> +/*
> + * SlabAlloc
> + *        Returns pointer to allocated memory of given size or NULL if
> + *        request could not be completed; memory is added to the set.
> + *
> + * No request may exceed:
> + *        MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
> + * All callers use a much-lower limit.
>
> That seems like a meaningless comment in the context of a slab allocator
> with a fixed size.
>

Why? It might be worth moving this comment to SlabContextCreate, though.

>
> +    /*
> +     * If there are no free chunks in any existing block, create a new block
> +     * and put it to the last freelist bucket.
> +     *
> +     * (set->minFreeChunks == 0) means there are no blocks with free chunks,
> +     * thanks to how minFreeChunks is updated at the end of SlabAlloc().
> +     */
> +    if (set->minFreeChunks == 0)
> +    {
> +        block = (SlabBlock)malloc(set->blockSize);
>
> Space after cast - maybe run pgindent over the file before submission?
> Doing that manually helps to avoid ugly damange by the per-release run
> later. I'm pretty sure there'll be a significant number of changes.
>

Will do.

>
>
> +    if (block->nfree == 0)
> +        block->firstFreeChunk = set->chunksPerBlock;
> +    else
> +    {
> +        /* look for the next free chunk in the block, after the first one */
> +        while ((++block->firstFreeChunk) < set->chunksPerBlock)
> +        {
> +            int byte = block->firstFreeChunk / 8;
> +            int bit  = block->firstFreeChunk % 8;
> +
> +            /* stop when you find 0 (unused chunk) */
> +            if (! (block->bitmapptr[byte] & (0x01 << bit)))
> +                break;
> +        }
>
> I haven't profiled (or even compiled) this patchset yet, but FWIW, in
> the tuple deforming code, I could measure a noticeable speedup by
> accessing bitmap-bytes in the native word-size, rather than char. I'm
> *NOT* saying you should do that, but if this ever shows up as a
> bottlneck, it might be worthwhile to optimize.
>

OK, will try, although I don't expect this branch to be very hot.
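
If it ever does show up, the word-at-a-time scan could look something 
like this (generic illustration only, not wired into the patch's 
structs; first_free_chunk is a made-up helper):

#include <stdint.h>

/*
 * Find the index of the first 0 bit (free chunk) in a bitmap of
 * 'nchunks' bits stored as 64-bit words; returns nchunks if there is
 * no free chunk.
 */
static int
first_free_chunk(const uint64_t *bitmap, int nchunks)
{
    int         nwords = (nchunks + 63) / 64;
    int         i;

    for (i = 0; i < nwords; i++)
    {
        uint64_t    inverted = ~bitmap[i];     /* free chunks become 1 bits */

        if (inverted != 0)
        {
            int         bit = 0;
            int         idx;

            /* find the lowest 1 bit (a ctz builtin would be faster) */
            while ((inverted & 1) == 0)
            {
                inverted >>= 1;
                bit++;
            }

            idx = i * 64 + bit;

            /* padding bits past nchunks in the last word don't count */
            return (idx < nchunks) ? idx : nchunks;
        }
    }

    return nchunks;
}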

> +    /*
> +     * And finally update minFreeChunks, i.e. the index to the block with the
> +     * lowest number of free chunks. We only need to do that when the block
> +     * got full (otherwise we know the current block is the right one).
> +     * We'll simply walk the freelist until we find a non-empty entry.
> +     */
> +    if (set->minFreeChunks == 0)
> +        for (idx = 1; idx <= set->chunksPerBlock; idx++)
> +            if (set->freelist[idx])
> +            {
> +                set->minFreeChunks = idx;
> +                break;
> +            }
>
> Yuck. This definitely needs braces.
>

OK ;-)


thanks

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


