Re: Use generation memory context for tuplestore.c
От | Matthias van de Meent |
---|---|
Тема | Re: Use generation memory context for tuplestore.c |
Дата | |
Msg-id | CAEze2WiusMw4Ern1HriocdMdt-r+zjw3tKoQn3hCSDRjt5VOyA@mail.gmail.com обсуждение исходный текст |
Ответ на | Use generation memory context for tuplestore.c (David Rowley <dgrowleyml@gmail.com>) |
Ответы |
Re: Use generation memory context for tuplestore.c
Re: Use generation memory context for tuplestore.c |
Список | pgsql-hackers |
On Fri, 3 May 2024 at 15:55, David Rowley <dgrowleyml@gmail.com> wrote: > > (40af10b57 did this for tuplesort.c, this is the same, but for tuplestore.c) > > I was looking at the tuplestore.c code a few days ago and noticed that > it allocates tuples in the memory context that tuplestore_begin_heap() > is called in, which for nodeMaterial.c, is ExecutorState. > > I didn't think this was great because: > 1. Allocating many chunks in ExecutorState can bloat the context with > many blocks worth of free'd chunks, stored on freelists that might > never be reused for anything. > 2. To clean up the memory, pfree must be meticulously called on each > allocated tuple > 3. ExecutorState is an aset.c context which isn't the most efficient > allocator for this purpose. Agreed on all counts. > I've attached 2 patches: > > 0001: Adds memory tracking to Materialize nodes, which looks like: > > -> Materialize (actual time=0.033..9.157 rows=10000 loops=2) > Storage: Memory Maximum Storage: 10441kB > > 0002: Creates a Generation MemoryContext for storing tuples in tuplestore. > > Using generation has the following advantages: [...] > 6. Generation has a page-level freelist, so is able to reuse pages > instead of freeing and mallocing another if tuplestore_trim() is used > to continually remove no longer needed tuples. aset.c can only > efficiently do this if the tuples are all in the same size class. Was a bump context considered? If so, why didn't it make the cut? If tuplestore_trim is the only reason why the type of context in patch 2 is a generation context, then couldn't we make the type of context conditional on state->eflags & EXEC_FLAG_REWIND, and use a bump context if we require rewind capabilities (i.e. where _trim is never effectively executed)? > master @ 8f0a97dff > Storage: Memory Maximum Storage: 16577kB > > patched: > Storage: Memory Maximum Storage: 8577kB Those are some impressive numbers. Kind regards, Matthias van de Meent Neon (https://neon.tech) On Fri, 3 May 2024 at 15:55, David Rowley <dgrowleyml@gmail.com> wrote: > > (40af10b57 did this for tuplesort.c, this is the same, but for tuplestore.c) > > I was looking at the tuplestore.c code a few days ago and noticed that > it allocates tuples in the memory context that tuplestore_begin_heap() > is called in, which for nodeMaterial.c, is ExecutorState. > > I didn't think this was great because: > 1. Allocating many chunks in ExecutorState can bloat the context with > many blocks worth of free'd chunks, stored on freelists that might > never be reused for anything. > 2. To clean up the memory, pfree must be meticulously called on each > allocated tuple > 3. ExecutorState is an aset.c context which isn't the most efficient > allocator for this purpose. > > I've attached 2 patches: > > 0001: Adds memory tracking to Materialize nodes, which looks like: > > -> Materialize (actual time=0.033..9.157 rows=10000 loops=2) > Storage: Memory Maximum Storage: 10441kB > > 0002: Creates a Generation MemoryContext for storing tuples in tuplestore. > > Using generation has the following advantages: > > 1. It does not round allocations up to the next power of 2. Using > generation will save an average of 25% memory for tuplestores or allow > an average of 25% more tuples before going to disk. > 2. Allocation patterns in tuplestore.c are FIFO, which is exactly what > generation was designed to handle best. > 3. Generation is faster to palloc/pfree than aset. (See [1]. Compare > the 4-bit times between aset_palloc_pfree.png and > generation_palloc_pfree.png) > 4. tuplestore_clear() and tuplestore_end() can reset or delete the > tuple context instead of pfreeing every tuple one by one. > 5. Higher likelihood of neighbouring tuples being stored consecutively > in memory, resulting in better CPU memory prefetching. > 6. Generation has a page-level freelist, so is able to reuse pages > instead of freeing and mallocing another if tuplestore_trim() is used > to continually remove no longer needed tuples. aset.c can only > efficiently do this if the tuples are all in the same size class. > > The attached bench.sh.txt tests the performance of this change and > result_chart.png shows the results I got when running on an AMD 3990x > master @ 8f0a97dff vs patched. > The script runs benchmarks for various tuple counts stored in the > tuplestore -- 1 to 8192 in power-2 increments. > > The script does output the memory consumed by the tuplestore for each > query. Here are the results for the 8192 tuple test: > > master @ 8f0a97dff > Storage: Memory Maximum Storage: 16577kB > > patched: > Storage: Memory Maximum Storage: 8577kB > > Which is roughly half, but I did pad the tuple to just over 1024 > bytes, so the alloc set allocations would have rounded up to 2048 > bytes. > > Some things I've *not* done: > > 1. Gone over other executor nodes which use tuplestore to add the same > additional EXPLAIN output. CTE Scan, Recursive Union, Window Agg > could get similar treatment. > 2. Given much consideration for the block sizes to use for > GenerationContextCreate(). (Maybe using ALLOCSET_SMALL_INITSIZE for > the start size is a good idea.) > 3. A great deal of testing. > > I'll park this here until we branch for v18. > > David > > [1] https://www.postgresql.org/message-id/CAApHDvqUWhOMkUjYXzq95idAwpiPdJLCxxRbf8kV6PYcW5y=Cg@mail.gmail.com
В списке pgsql-hackers по дате отправления: