Thread: Improving the memory allocator


Improving the memory allocator

From
Andres Freund
Date:
Hi,

In the thread http://archives.postgresql.org/message-id/4DA69D60.4000108@2ndquadrant.com
("Single client performance on trivial SELECTs") I wondered whether it might be
possible to increase the performance of the current allocator by using some
slab-allocator-like concept.

While avoiding doing my tax returns and having a long and delayed train ride
I started to play around.

The profile I used in this case was:

pgbench -h /tmp/ -p5433 -s 30 pgbench -S -T 20

I hacked together a patch which records every allocation to a file. That patch
obviously slows everything down quite a bit so that I only get about a third
of the normal tps.

The current profile for that looks like this (I hope it's recognizable). I
guess the only unclear column is "compile_time". That's the result of
__builtin_constant_p(size) for the size argument of the various allocation
operations, which is true when the size is known at compile time.

          action          |     file     |                 func                 | line  | size  | compile_time | count
--------------------------+--------------+--------------------------------------+-------+-------+--------------+--------
 chunk_alloc              | list.c       | new_list                             |    68 |    16 | t            | 940753
 chunk_alloc              | list.c       | new_list                             |    72 |    24 | t            | 940753
 chunk_alloc              | list.c       | new_tail_cell                        |   112 |    16 | t            | 202908
 chunk_alloc_zero         | bitmapset.c  | bms_make_singleton                   |   189 |     8 | f            | 129122
 chunk_alloc_zero_aligned | value.c      | makeString                           |    55 |    16 | t            | 129122
 chunk_alloc_zero_aligned | makefuncs.c  | makeVar                              |    73 |    40 | t            | 110676
 chunk_alloc_zero_aligned | makefuncs.c  | makeTargetEntry                      |   216 |    48 | t            |  92230
 chunk_alloc              | stringinfo.c | initStringInfo                       |    50 |  1024 | t            |  73799
 chunk_alloc              | bitmapset.c  | bms_copy                             |   119 |     8 | f            |  73784
 chunk_alloc              | lock.c       | LockAcquireExtended                  |   559 |   128 | t            |  55441
 chunk_alloc              | nodeFuncs.c  | expression_tree_mutator              |  2036 |    40 | t            |  55338
 chunk_alloc              | snapmgr.c    | PushActiveSnapshot                   |   282 |    24 | t            |  55338
 chunk_alloc              | nbtsearch.c  | _bt_search                           |   121 |    24 | t            |  36908
 chunk_alloc              | resowner.c   | ResourceOwnerEnlargeCatCacheRefs     |   612 |   128 | t            |  36893
 chunk_alloc              | resowner.c   | ResourceOwnerEnlargeRelationRefs     |   754 |   128 | t            |  36893
 chunk_alloc_zero         | resowner.c   | ResourceOwnerCreate                  |   135 |   168 | t            |  36893
 chunk_alloc              | nodeFuncs.c  | expression_tree_mutator              |  2027 |    40 | t            |  36892
 chunk_alloc              | snapmgr.c    | CopySnapshot                         |   217 |    72 | f            |  36892
 chunk_alloc_zero_aligned | copyfuncs.c  | _copyVar                             |  1041 |    40 | t            |  36892
 chunk_alloc_zero_aligned | equivclass.c | add_eq_member                        |   452 |    32 | t            |  36892
 chunk_alloc_zero_aligned | execQual.c   | ExecInitExpr                         |  4231 |    24 | t            |  36892
 chunk_alloc_zero_aligned | execTuples.c | MakeTupleTableSlot                   |   118 |    96 | t            |  36892
 chunk_alloc_zero_aligned | gram.y       | makeColumnRef                        | 12286 |    24 | t            |  36892
 chunk_alloc              | list.c       | new_head_cell                        |    93 |    16 | t            |  18529
 chunk_alloc              | genam.c      | RelationGetIndexScan                 |    77 |   104 | t            |  18489
 chunk_alloc              | nbtree.c     | btbeginscan                          |   385 |  6608 | t            |  18489
 chunk_alloc              | tupdesc.c    | CreateTemplateTupleDesc              |    63 |   160 | f            |  18479
 chunk_alloc              | genam.c      | RelationGetIndexScan                 |    89 |    72 | f            |  18471
 chunk_alloc              | nbtree.c     | btbeginscan                          |   388 |    72 | f            |  18471
 chunk_alloc              | resowner.c   | ResourceOwnerEnlargeBuffers          |   524 |    64 | t            |  18448
 chunk_alloc              | trigger.c    | AfterTriggerBeginXact                |  3607 |   112 | t            |  18447
 chunk_alloc              | trigger.c    | AfterTriggerBeginXact                |  3619 |   192 | t            |  18447
 

split by context:
          self          |          action          |     file     |                 func                 | line  | size  | compile_time | count
------------------------+--------------------------+--------------+--------------------------------------+-------+-------+--------------+--------
 MessageContext         | chunk_alloc              | list.c       | new_list                             |    68 |    16 | t            | 848520
 MessageContext         | chunk_alloc              | list.c       | new_list                             |    72 |    24 | t            | 848520
 MessageContext         | chunk_alloc              | list.c       | new_tail_cell                        |   112 |    16 | t            | 166016
 MessageContext         | chunk_alloc_zero         | bitmapset.c  | bms_make_singleton                   |   189 |     8 | f            | 129122
 MessageContext         | chunk_alloc_zero_aligned | value.c      | makeString                           |    55 |    16 | t            | 129122
 MessageContext         | chunk_alloc_zero_aligned | makefuncs.c  | makeVar                              |    73 |    40 | t            | 110676
 ExecutorState          | chunk_alloc              | list.c       | new_list                             |    68 |    16 | t            |  92230
 ExecutorState          | chunk_alloc              | list.c       | new_list                             |    72 |    24 | t            |  92230
 MessageContext         | chunk_alloc_zero_aligned | makefuncs.c  | makeTargetEntry                      |   216 |    48 | t            |  92230
 MessageContext         | chunk_alloc              | bitmapset.c  | bms_copy                             |   119 |     8 | f            |  73784
 TopMemoryContext       | chunk_alloc              | lock.c       | LockAcquireExtended                  |   559 |   128 | t            |  55441
 MessageContext         | chunk_alloc              | nodeFuncs.c  | expression_tree_mutator              |  2036 |    40 | t            |  55338
 TopTransactionContext  | chunk_alloc              | snapmgr.c    | PushActiveSnapshot                   |   282 |    24 | t            |  55338
 MessageContext         | chunk_alloc              | stringinfo.c | initStringInfo                       |    50 |  1024 | t            |  36894
 TopMemoryContext       | chunk_alloc              | resowner.c   | ResourceOwnerEnlargeCatCacheRefs     |   612 |   128 | t            |  36893
 TopMemoryContext       | chunk_alloc              | resowner.c   | ResourceOwnerEnlargeRelationRefs     |   754 |   128 | t            |  36893
 TopMemoryContext       | chunk_alloc_zero         | resowner.c   | ResourceOwnerCreate                  |   135 |   168 | t            |  36893
 ExecutorState          | chunk_alloc              | list.c       | new_tail_cell                        |   112 |    16 | t            |  36892
 ExecutorState          | chunk_alloc              | nbtsearch.c  | _bt_search                           |   121 |    24 | t            |  36892
 ExecutorState          | chunk_alloc              | stringinfo.c | initStringInfo                       |    50 |  1024 | t            |  36892
 ExecutorState          | chunk_alloc_zero_aligned | execQual.c   | ExecInitExpr                         |  4231 |    24 | t            |  36892
 ExecutorState          | chunk_alloc_zero_aligned | execTuples.c | MakeTupleTableSlot                   |   118 |    96 | t            |  36892
 MessageContext         | chunk_alloc              | nodeFuncs.c  | expression_tree_mutator              |  2027 |    40 | t            |  36892
 MessageContext         | chunk_alloc_zero_aligned | copyfuncs.c  | _copyVar                             |  1041 |    40 | t            |  36892
 MessageContext         | chunk_alloc_zero_aligned | equivclass.c | add_eq_member                        |   452 |    32 | t            |  36892
 MessageContext         | chunk_alloc_zero_aligned | gram.y       | makeColumnRef                        | 12286 |    24 | t            |  36892
 TopTransactionContext  | chunk_alloc              | snapmgr.c    | CopySnapshot                         |   217 |    72 | f            |  36892
 TopMemoryContext       | chunk_alloc              | resowner.c   | ResourceOwnerEnlargeBuffers          |   524 |    64 | t            |  18448
 ExecutorState          | chunk_alloc              | genam.c      | RelationGetIndexScan                 |    77 |   104 | t            |  18447
 ExecutorState          | chunk_alloc              | genam.c      | RelationGetIndexScan                 |    89 |    72 | f            |  18447
 ExecutorState          | chunk_alloc              | nbtree.c     | btbeginscan                          |   385 |  6608 | t            |  18447
 ExecutorState          | chunk_alloc              | nbtree.c     | btbeginscan                          |   388 |    72 | f            |  18447
 MessageContext         | chunk_alloc              | list.c       | new_head_cell                        |    93 |    16 | t            |  18447
 TopTransactionContext  | chunk_alloc              | trigger.c    | AfterTriggerBeginXact                |  3607 |   112 | t            |  18447
 TopTransactionContext  | chunk_alloc              | trigger.c    | AfterTriggerBeginXact                |  3619 |   192 | t            |  18447

1:
For that allocation profile I found that a significant part of the cost is
AllocSetFreeIndex (which is inlined in optimized builds, but still).

If we moved more of this into the headers, the very common cases where the
size is known at compile time could eliminate the AllocSetFreeIndex
computation at runtime, as it depends only on the size.

The problem with that is obviously that it would violate the whole mctx.c
abstraction as it has to be known at compile time which memory manager is
used.

Are we willing to reduce that abstraction?

2:
aset.c fragments horribly for long-living contexts. I haven't generated
sensibly accessible data for that, as I currently just store everything in an
SQL database, and SQL (at least with my queries :P) isn't really that
efficient for figuring out when and how much was allocated from the following
schema once the data gets sensibly large. A sensible benchmark easily
produces a three-digit million number of rows...

                      Table "public.allocations"
    Column    |  Type   |                         Modifiers
--------------+---------+-----------------------------------------------------------
 id           | integer | not null default nextval('allocations_id_seq'::regclass)
 action       | text    |
 parent       | text    |
 self         | text    |
 size         | bigint  |
 addr         | bigint  |
 file         | text    |
 func         | text    |
 line         | integer |
 compile_time | boolean |

I guess some custom implementation for evaluating that data would be much
better at it.

That fragmentation causes two main problems: 1. space wastage and 2. (more
importantly, imo) cache misses.

3:
Given the frequency of very small allocations, the current space overhead of
at least 16 bytes on 64-bit platforms seems quite high.

I don't think it's too hard to devise a scheme (actually I have started on
one) where the overhead is only sizeof(void *). But again, that would reduce
the abstraction, because it would make it harder to implement something like
pfree/repalloc/GetMemoryChunkContext, which need to figure out the context
from a pointer.

If we had a separate API for small allocations we could even devise a scheme
where a single chunk wouldn't have any memory overhead, but that seems too
complicated for now.


So after all this my question basically is: How important do we think the
mctx.c abstraction is?
I personally found the speed of AllocSetAlloc to be the most limiting factor
in several workloads, so I am willing to sacrifice some abstraction to gain
speed here. Especially as I hope it's possible to write a single allocator
which is "good enough" for everything.

I have currently started working on two pieces:

* A new memory manager implementation. For that, the equivalent of
  AllocSetFreeIndex seems to be the biggest limit so far. It's not really
  ready for anything yet though.

* Recording and replaying MemoryContext* calls to get a somewhat reasonable
  and reproducible micro-benchmark. That unfortunately doesn't model the
  effects of cache misses that well, but I have no better idea so far.
 

Any opinions, input, ideas?

Thanks for reading,

Andres


Re: Improving the memory allocator

From
Robert Haas
Date:
On Mon, Apr 25, 2011 at 5:45 PM, Andres Freund <andres@anarazel.de> wrote:
> [ lots of awesome test results ]

Very interesting work.  I think it's interesting that there is so much
allocation happening inside MessageContext; I barely knew that
existed, let alone that it was a hotspot for memory allocation.  I
think it would be interesting to break out the calls to new_list() by
caller.  It's been vaguely bothering me for a long time that we store
so many things using List structures.  That is not a particularly
efficient representation, because a List of N nodes requires 2N+1
memory allocations - N nodes, N ListCell objects, and the List object
itself.  It might be worth experimenting with some kind of array/list
hybridy thing, like this:

typedef struct BunchOfThings
{
    NodeTag type;
    int     total_length;
    int     this_node_length;
    struct BunchOfThings *next;
    Thing   data[1];        /* really variable-length */
} BunchOfThings;

This is probably a bit more of a pain to deal with than a plain old
List, and for many purposes it might not be suitable, but there might
be some hotspots where the performance gain is enough to justify the
trouble.  In some cases you might be able to get even simpler - just
use a fixed array and, if it fills up, either reallocate-and-copy or
error out.

> The problem with that is obviously that it would violate the whole mctx.c
> abstraction as it has to be known at compile time which memory manager is
> used.
>
> Are we willing to reduce that abstraction?

I think it's definitely worth considering, if there is a significant
performance benefit.

> Given the frequentness of very small allocations the current space overhead
> of at least 16 bytes on 64bit Platforms seems quite high.

Yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Improving the memory allocator

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> So after all this my question basically is: How important do we think the
> mctx.c abstraction is?

I think it's pretty important.  As a specific example, a new context
type in which pfree() is a no-op would be fine with me.  A new context
type in which pfree() dumps core will be useless, or close enough to
useless.  That means you can't get rid of the per-chunk back-link to the
context, but you might be able to get rid of the other overhead such as
per-chunk size data.  (It might be practical to not support
GetMemoryChunkSize for such contexts, or if it's a slab allocator then
you could possibly know that all the chunks in a given block have size X.)

Another point worth making is that it's a nonstarter to propose running
any large part of the system in memory context types that are incapable
of supporting all the debugging options we rely on (freed-memory-reuse
detection and write-past-end-of-chunk in particular).  It's okay if a
production build hasn't got that support, not okay for debug builds.
Perhaps you'll propose using completely different context
implementations in the two cases, but I'd be suspicious of that because
it'd mean the "fast" context code doesn't get any developer testing.

> Especially as I hope its possible to write a single allocator
> which is "good enough" for everything.

I'll lay a side bet that that approach is a dead end.  If one size fits
all were good enough, we'd not be having this conversation at all.  The
point of the mctx interface layer from the beginning was to support
multiple allocator policies, and that's the direction I think we want to
go to improve this.

BTW, what your numbers actually suggest to me is not that we need a
better allocator, but that we need a better implementation of List.
We speculated back when Neil redid List the first time about aggregating
list cells to reduce palloc traffic, but it was left out to keep the
patch complexity down.  Now that the bugs have been shaken out it might
be time to have another go at that.  In particular, teaching List to
allocate the list head and first cell together would alone remove a
third of your runtime ...
        regards, tom lane


Re: Improving the memory allocator

From
Andres Freund
Date:
On Tuesday, April 26, 2011 01:39:37 AM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > So after all this my question basically is: How important do we think the
> > mctx.c abstraction is?
> 
> I think it's pretty important.  As a specific example, a new context
> type in which pfree() is a no-op would be fine with me.  A new context
> type in which pfree() dumps core will be useless, or close enough to
> useless.
Well, what I suggested for that would be using a different API for such small,
statically sized allocations.
But as I said, I would prefer to exhaust other possibilities first.

> That means you can't get rid of the per-chunk back-link to the
> context, but you might be able to get rid of the other overhead such as
> per-chunk size data.  (It might be practical to not support
> GetMemoryChunkSize for such contexts, or if it's a slab allocator then
> you could possibly know that all the chunks in a given block have size X.)
For the slab allocator design I have in mind I would need a back pointer to
the block, not the context... That's one other reason why I started thinking
about removing the abstraction.
So far I couldn't envision a clean design where you can intermix two
implementations with such different interpretations. One could give the
blocks and contexts an 'allocator' node tag in the first element and switch
over that, but I don't really like that.

And I don't see a way, with that abstraction in place, to let the compiler do
expensive work like the freelist index determination at compile time instead
of at run time. And I think pulling such computations out of the runtime path
is quite an important part of improvements in that area.

> Another point worth making is that it's a nonstarter to propose running
> any large part of the system in memory context types that are incapable
> of supporting all the debugging options we rely on (freed-memory-reuse
> detection and write-past-end-of-chunk in particular).  It's okay if a
> production build hasn't got that support, not okay for debug builds.
Totally with you. I have absolutely no problem with enlarging the chunk
header for debug builds, and I can't envision a design where that would be a
major problem.

> Perhaps you'll propose using completely different context
> implementations in the two cases, but I'd be suspicious of that because
> it'd mean the "fast" context code doesn't get any developer testing.
I don't like that option either. It's *way* too easy to screw up slightly in
that area.

> > Especially as I hope its possible to write a single allocator
> > which is "good enough" for everything.
> I'll lay a side bet that that approach is a dead end.  If one size fits
> all were good enough, we'd not be having this conversation at all.  The
> point of the mctx interface layer from the beginning was to support
> multiple allocator policies, and that's the direction I think we want to
> go to improve this.
I don't think I am with you here. I haven't been around that long, so I might
be missing something, but in the last 10+ years I haven't found much evidence
of anybody trying to improve the allocator as a whole instead of improving
currently problematic pieces. So I don't see enough evidence that it's
impossible to devise an allocator that's good enough for all needs.

I rather fear having to figure out which allocator to use where. I don't see
that working very well.

> BTW, what your numbers actually suggest to me is not that we need a
> better allocator, but that we need a better implementation of List.
> We speculated back when Neil redid List the first time about aggregating
> list cells to reduce palloc traffic, but it was left out to keep the
> patch complexity down.  Now that the bugs have been shaken out it might
> be time to have another go at that.  In particular, teaching List to
> allocate the list head and first cell together would alone remove a
> third of your runtime ...
That's certainly true for that workload. The hotspot is somewhere else
entirely, though, if you run even mildly more complex statements than the
default read-only statements from pgbench.
It's actually not easy to find any workload that isn't totally IO bound where
memory allocation is not in the top 5 of a profile...

But sure. Improving that point is a good idea independent of the allocator.
One less allocation won't hurt any allocator ;)

Greetings,

Andres


Re: Improving the memory allocator

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 25, 2011 at 5:45 PM, Andres Freund <andres@anarazel.de> wrote:
>> [ lots of awesome test results ]

> Very interesting work.  I think it's interesting that there is so much
> allocation happening inside MessageContext; I barely knew that
> existed, let alone that it was a hotspot for memory allocation.

The reason it shows as a hotspot is that parsing and planning happen in
that context for the simple-Query code path.  You'd get different
answers if you were using extended-query protocol, or running prepared
statements, or doing the bulk of your work in plpgsql, etc etc.

It would be interesting if Andres could dig down another level or so in
the call stack to see where the list creations etc are called from, but
I'll bet quite a bit that what we're looking at is mostly parse/plan
activity.  (The high showing of bitmapset activity is a dead tipoff for
planner, in particular, since that datatype is hardly used at all
elsewhere.)
        regards, tom lane


Re: Improving the memory allocator

From
Andres Freund
Date:
On Tuesday, April 26, 2011 01:24:20 AM Robert Haas wrote:
> On Mon, Apr 25, 2011 at 5:45 PM, Andres Freund <andres@anarazel.de> wrote:
> > [ lots of awesome test results ]
> 
> Very interesting work.  I think it's interesting that there is so much
> allocation happening inside MessageContext; I barely knew that
> existed, let alone that it was a hotspot for memory allocation.
Yes, I was surprised as well. I haven't checked yet what the call sites for
that are.
As I just wrote to Tom, the hotspots are wildly different if you use slightly
more complex statements. Which doesn't mean it's not worthy of improvement ;)

> I
> think it would be interesting to break out the calls to new_list() by
> caller.  It's been vaguely bothering me for a long time that we store
> so many things in using List structures.  That is not a particularly
> efficient representation, because a List of N nodes requires 2N+1
> memory allocations - N nodes, N ListCell objects, and the List object
> itself.  It might be worth experimenting with some kind of array/list
> hybridy thing, like this:
> ...
While I agree that that might be beneficial, I think in this case nearly all
of the lists are one element long, because the callers mostly look like:
{{{
List *
lappend(List *list, void *datum)
{
    Assert(IsPointerList(list));

    if (list == NIL)
        list = new_list(T_List);
    else
        new_tail_cell(list);

    lfirst(list->tail) = datum;
    check_list_invariants(list);
    return list;
}
}}}

Hm. Seems worthy of some further investigation...

Thanks,

Andres


Re: Improving the memory allocator

From
Andres Freund
Date:
On Tuesday, April 26, 2011 02:22:34 AM Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Mon, Apr 25, 2011 at 5:45 PM, Andres Freund <andres@anarazel.de> wrote:
> >> [ lots of awesome test results ]
> > 
> > Very interesting work.  I think it's interesting that there is so much
> > allocation happening inside MessageContext; I barely knew that
> > existed, let alone that it was a hotspot for memory allocation.
> 
> The reason it shows as a hotspot is that parsing and planning happen in
> that context for the simple-Query code path.  You'd get different
> answers if you were using extended-query protocol, or running prepared
> statements, or doing the bulk of your work in plpgsql, etc etc.
> 
> It would be interesting if Andres could dig down another level or so in
> the call stack to see where the list creations etc are called from, but
> I'll bet quite a bit that what we're looking at is mostly parse/plan
> activity.  (The high showing of bitmapset activity is a dead tipoff for
> planner, in particular, since that datatype is hardly used at all
> elsewhere.)
Currently that's just output via

#define MemoryContextAlloc(context, size) \
    TraceMemoryContextAlloc(context, size, __FILE__, __FUNCTION__, __LINE__, \
                            __builtin_constant_p(size))

which in turn calls the ordinary MemoryContextAlloc (yes, ugly, I know).
But I guess it shouldn't be a problem to get more information...

Andres


Re: Improving the memory allocator

From
Radosław Smogura
Date:
I didn't follow this topic carefully, so sorry if I write something that was
already written, but I thought about the following approach, at least for
message sending, etc.:

1. When initializing a MemoryContext (pool), give it one parameter that will
   be the stack size. The stack is an addition to normal operations.
2. Operations on the stack are only allocations, so alloc will be just a
   pointer bump.
3. When allocating memory for messages, internally set the size to be 1/3
   greater (or 2/3, I don't remember well) for realloc; this may help when an
   encoding change occurs.
4. free is a no-op (as was pointed out); the stack will be released, or the
   pointer reset, when the context is released.

From one point of view this may waste some memory (no free), but for messages
that should not happen, and even giving 100kb of memory for the stack may be
quite enough, depending of course on how much data you send.

Regards,
Radek


Re: Improving the memory allocator

From
Andres Freund
Date:
On Tuesday, April 26, 2011 01:59:45 PM Radosław Smogura wrote:
>  I didn't followed this topic carefully, so sorry If I wrote something
>  that was written, but I thought about following approach at least for
>  message sending, etc.:
>
>  1. When initializing MemoryContext (pool) give one parameter that will
>  be stack size. Stack is addition to normal operations.
You can't allocate memory on the stack and return that.

Andres


Re: Improving the memory allocator

From
Radosław Smogura
Date:
 On Tue, 26 Apr 2011 14:25:10 +0200, Andres Freund wrote:
> On Tuesday, April 26, 2011 01:59:45 PM Radosław Smogura wrote:
>> I didn't followed this topic carefully, so sorry If I wrote something
>> that was written, but I thought about following approach at least for
>> message sending, etc.:
>>
>> 1. When initializing MemoryContext (pool) give one parameter that will
>> be stack size. Stack is addition to normal operations.
> You can't allocate memory on the stack and return that.
>
> Andres
I didn't write "allocate on the stack". I thought about preallocating an
n-byte chunk for stack-like memory, but without popping.

Regards,
Radek


Re: Improving the memory allocator

From
Andres Freund
Date:
On Tuesday, April 26, 2011 03:21:05 PM Radosław Smogura wrote:
>  On Tue, 26 Apr 2011 14:25:10 +0200, Andres Freund wrote:
> > On Tuesday, April 26, 2011 01:59:45 PM Radosław Smogura wrote:
> >> I didn't followed this topic carefully, so sorry If I wrote something
> >> that was written, but I thought about following approach at least for
> >> message sending, etc.:
> >>
> >> 1. When initializing MemoryContext (pool) give one parameter that
> >> will be stack size. Stack is addition to normal operations.
> >
> > You can't allocate memory on the stack and return that.
> >
> > Andres
>
>  I didn't wrote allocate on stack. I thought about preallocate n bytes
>  chunk for stack-like memory but without popping.
The implementation already does something like that.


Re: Improving the memory allocator

From
Greg Smith
Date:
On 04/25/2011 05:45 PM, Andres Freund wrote:
> The profile I used in this case was:
>
> pgbench -h /tmp/ -p5433 -s 30 pgbench -S -T 20
>    

I'd suggest collecting data from running this with "-M prepared" at some 
point too, so that you can get a basic idea which of these allocations 
are avoided when using prepared statements.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD