Re: PATCH: two slab-like memory allocators
From | Tomas Vondra
---|---
Subject | Re: PATCH: two slab-like memory allocators
Date |
Msg-id | efd47210-cf95-d7ee-e984-a873f76596a8@2ndquadrant.com
In reply to | Re: PATCH: two slab-like memory allocators (Andres Freund <andres@anarazel.de>)
Responses | Re: PATCH: two slab-like memory allocators
| Re: PATCH: two slab-like memory allocators
List | pgsql-hackers
On 11/12/2016 08:12 PM, Andres Freund wrote:
> Hi,
>
> Subject: [PATCH 1/2] slab allocator
>
> diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
> index 6ad7e7d..520f295 100644
> --- a/src/backend/replication/logical/reorderbuffer.c
> +++ b/src/backend/replication/logical/reorderbuffer.c
>
> I'd rather have that in a separate followup commit...
>
> + * IDENTIFICATION
> + *	  src/backend/utils/mmgr/slab.c
> + *
> + *
> + * The constant allocation size allows significant simplification and various
> + * optimizations that are not possible in AllocSet. Firstly, we can get rid
> + * of the doubling and carve the blocks into chunks of exactly the right size
> + * (plus alignment), not wasting memory.
>
> References to AllocSet aren't necessarily a good idea, they'll quite
> possibly get out of date. The argument can quite easily be made
> without a concrete reference to behaviour elsewhere.

Yeah, that's probably true.

> + *
> + * At the context level, we use a 'freelist' array to track blocks grouped by
> + * the number of free chunks. For example freelist[0] is a list of completely
> + * full blocks, freelist[1] is a block with a single free chunk, etc.
>
> Hm. Those arrays are going to be quite large for small allocations w/
> big blocks (an imo sensible combination). Maybe it'd make more sense to
> model it as a linked list of blocks? Full blocks are at one end, empty
> ones at the other?

So there'd be one huge list of blocks, sorted by the number of empty
chunks? Hmm, that might work, I guess.

I don't think the combination of large blocks with small allocations is
particularly sensible, though - what exactly would be the benefit of
such a combination? I would even consider enforcing some upper limit on
the number of chunks per block - say 256, for example.
> + * About MEMORY_CONTEXT_CHECKING:
> + *
> + * Since we usually round request sizes up to the next power of 2, there
> + * is often some unused space immediately after a requested data area.
>
> I assume the "round up" stuff is copy-paste?

Yeah, sorry about that.

> + * Thus, if someone makes the common error of writing past what they've
> + * requested, the problem is likely to go unnoticed ... until the day when
> + * there *isn't* any wasted space, perhaps because of different memory
> + * ...
> + *
> + * See also the cooperating Valgrind client requests in mcxt.c.
>
> I think we need a preliminary patch moving a lot of this into something
> like mcxt_internal.h. Copying this comment, and a lot of the utility
> functions, into every memory context implementation is a bad pattern.

Yes, I planned to do that for the next version of the patch. Laziness.

> +typedef struct SlabBlockData *SlabBlock;		/* forward reference */
> +typedef struct SlabChunkData *SlabChunk;
>
> Can we please not continue hiding pointers behind typedefs? It's a bad
> pattern, and that it's fairly widely used isn't a good excuse to
> introduce further usages of it.

Why is it a bad pattern?

> +/*
> + * SlabContext is a specialized implementation of MemoryContext.
> + */
> +typedef struct SlabContext
> +{
> +	MemoryContextData header;	/* Standard memory-context fields */
> +	/* Allocation parameters for this context: */
> +	Size		chunkSize;		/* chunk size */
> +	Size		fullChunkSize;	/* chunk size including header and alignment */
> +	Size		blockSize;		/* block size */
> +	int			chunksPerBlock; /* number of chunks per block */
> +	int			minFreeChunks;	/* min number of free chunks in any block */
> +	int			nblocks;		/* number of blocks allocated */
> +	/* Info about storage allocated in this context: */
> +	SlabBlock	freelist[1];	/* free lists (block-level) */
>
> I assume this is a variable-length array? If so, that a) should be
> documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
> actually will cause compiler warnings and potential misoptimizations.

Will fix, thanks.

> +/*
> + * SlabBlockData
> + *		Structure of a single block in SLAB allocator.
> + *
> + *		slab: context owning this block
>
> What do we need this for?

You're right, the pointer to the owning context is unnecessary - there's
nothing like a "standard block header" and we already have the pointer
in the standard chunk header. But maybe keeping the pointer at least
with MEMORY_CONTEXT_CHECKING would be a good idea?

> + *		prev, next: used for doubly-linked list of blocks in global freelist
>
> I'd prefer using an embedded list here (cf. ilist.h).

Makes sense.

> +/*
> + * SlabChunk
> + *		The prefix of each piece of memory in a SlabBlock
> + *
> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
> + * However it's possible to have fields in front of the StandardChunkHeader
> + * fields, which is used to add a pointer to the block.
> + */
>
> Wouldn't that be easier to enforce - particularly around alignment
> requirements - by embedding a StandardChunkHeader here? That'd also
> avoid redundancies.

Also makes sense.

> +/* ----------
> + * Debug macros
> + * ----------
> + */
> +#ifdef HAVE_ALLOCINFO
> +#define SlabFreeInfo(_cxt, _chunk) \
> +			fprintf(stderr, "SlabFree: %s: %p, %d\n", \
> +				(_cxt)->header.name, (_chunk), (_chunk)->size)
> +#define SlabAllocInfo(_cxt, _chunk) \
> +			fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
> +				(_cxt)->header.name, (_chunk), (_chunk)->size)
> +#else
> +#define SlabFreeInfo(_cxt, _chunk)
> +#define SlabAllocInfo(_cxt, _chunk)
> +#endif
>
> Do we really have to copy that stuff from aset.c?
> Obviously no-one uses that, since it doesn't even compile cleanly if
> HAVE_ALLOCINFO is defined:
>
> /home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=]
>   fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \

I don't really care. Sure, we should fix the warning, but not supporting
HAVE_ALLOCINFO in the new allocator(s) seems wrong - we should either
support it everywhere, or we should rip it out. That's not the purpose
of this patch, though.

> +#ifdef CLOBBER_FREED_MEMORY
> +
> +/* Wipe freed memory for debugging purposes */
> +static void
> +wipe_mem(void *ptr, size_t size)
>
> +#ifdef MEMORY_CONTEXT_CHECKING
> +static void
> +set_sentinel(void *base, Size offset)
> +
> +static bool
> +sentinel_ok(const void *base, Size offset)
> +#endif
>
> +#ifdef RANDOMIZE_ALLOCATED_MEMORY
> +static void
> +randomize_mem(char *ptr, size_t size)
>
> Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
> static inlines.

Yes, next to the valgrind stuff.

> +MemoryContext
> +SlabContextCreate(MemoryContext parent,
> +				  const char *name,
> +				  Size blockSize,
> +				  Size chunkSize)
> +{
> +	int			chunksPerBlock;
> +	Size		fullChunkSize;
> +	Slab		set;
> +
> +	/* chunk, including SLAB header (both addresses nicely aligned) */
> +	fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));
>
> +	/* make sure the block can store at least one chunk (plus a bitmap) */
> +	if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
> +		elog(ERROR, "block size %ld for slab is too small for chunks %ld",
> +			 blockSize, chunkSize);
>
> I assume the 1 is the bitmap size?

Yes, the smallest bitmap is 1 byte.

> +	/* so how many chunks can we fit into a block, including header and bitmap? */
> +	chunksPerBlock
> +		= (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);
>
> I'm slightly drunk due to bad airline wine, but right now that seems a
> bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
> pattern elsewhere, but right now I can't follow the - 7.

We need all the bits (header, chunks and bitmap) to fit onto the block,
so this inequality needs to hold:

    blockSize >= sizeof(SlabBlockData) + chunksPerBlock * fullChunkSize
                 + (chunksPerBlock + 7) / 8

Solve for 'chunksPerBlock' and you'll get the above formula; moving the
7 to the other side of the inequality is the reason for the minus. But
documenting this is probably a good idea.

> +/*
> + * SlabAlloc
> + *		Returns pointer to allocated memory of given size or NULL if
> + *		request could not be completed; memory is added to the set.
> + *
> + * No request may exceed:
> + *		MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
> + * All callers use a much-lower limit.
>
> That seems like a meaningless comment in the context of a slab allocator
> with a fixed size.

Why? It might be worth moving this to SlabContextCreate, though.

> +	/*
> +	 * If there are no free chunks in any existing block, create a new block
> +	 * and put it to the last freelist bucket.
> +	 *
> +	 * (set->minFreeChunks == 0) means there are no blocks with free chunks,
> +	 * thanks to how minFreeChunks is updated at the end of SlabAlloc().
> +	 */
> +	if (set->minFreeChunks == 0)
> +	{
> +		block = (SlabBlock)malloc(set->blockSize);
>
> Space after cast - maybe run pgindent over the file before submission?
> Doing that manually helps to avoid ugly damage by the per-release run
> later. I'm pretty sure there'll be a significant number of changes.

Will do.
> +	if (block->nfree == 0)
> +		block->firstFreeChunk = set->chunksPerBlock;
> +	else
> +	{
> +		/* look for the next free chunk in the block, after the first one */
> +		while ((++block->firstFreeChunk) < set->chunksPerBlock)
> +		{
> +			int			byte = block->firstFreeChunk / 8;
> +			int			bit = block->firstFreeChunk % 8;
> +
> +			/* stop when you find 0 (unused chunk) */
> +			if (!(block->bitmapptr[byte] & (0x01 << bit)))
> +				break;
> +		}
>
> I haven't profiled (or even compiled) this patchset yet, but FWIW, in
> the tuple deforming code, I could measure a noticeable speedup by
> accessing bitmap bytes in the native word size, rather than char. I'm
> *NOT* saying you should do that, but if this ever shows up as a
> bottleneck, it might be worthwhile to optimize.

OK, will try, although I don't expect this branch to be very hot.

> +	/*
> +	 * And finally update minFreeChunks, i.e. the index to the block with the
> +	 * lowest number of free chunks. We only need to do that when the block
> +	 * got full (otherwise we know the current block is the right one).
> +	 * We'll simply walk the freelist until we find a non-empty entry.
> +	 */
> +	if (set->minFreeChunks == 0)
> +		for (idx = 1; idx <= set->chunksPerBlock; idx++)
> +			if (set->freelist[idx])
> +			{
> +				set->minFreeChunks = idx;
> +				break;
> +			}
>
> Yuck. This definitely needs braces.

OK ;-)

thanks

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services