Discussion: reducing memory usage by using "proxy" memory contexts?
Hi,
I was responding to a question about postgres' per-backend memory usage,
making me look at the various contexts below CacheMemoryContext.  There
is pretty much always a significant number of contexts below, one for
each index:
  CacheMemoryContext: 524288 total in 7 blocks; 8680 free (0 chunks); 515608 used
    index info: 2048 total in 2 blocks; 568 free (1 chunks); 1480 used: pg_class_tblspc_relfilenode_index
    index info: 2048 total in 2 blocks; 960 free (0 chunks); 1088 used: pg_statistic_ext_relid_index
    index info: 2048 total in 2 blocks; 976 free (0 chunks); 1072 used: blarg_pkey
    index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: pg_index_indrelid_index
    index info: 2048 total in 2 blocks; 600 free (1 chunks); 1448 used: pg_attrdef_adrelid_adnum_index
    index info: 2048 total in 2 blocks; 656 free (2 chunks); 1392 used: pg_db_role_setting_databaseid_rol_index
    index info: 2048 total in 2 blocks; 544 free (2 chunks); 1504 used: pg_opclass_am_name_nsp_index
    index info: 2048 total in 2 blocks; 928 free (2 chunks); 1120 used: pg_foreign_data_wrapper_name_index
    index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_enum_oid_index
    index info: 2048 total in 2 blocks; 600 free (1 chunks); 1448 used: pg_class_relname_nsp_index
    index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_foreign_server_oid_index
    index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_publication_pubname_index
...
    index info: 3072 total in 2 blocks; 1144 free (2 chunks); 1928 used: pg_conversion_default_index
...
while I also think we could pretty easily reduce the amount of memory
used for each index, I want to focus on something else here:
We waste a lot of space due to all these small contexts. Even leaving
aside the overhead of the context and its blocks - not insignificant -
they are mostly between ~1/2 and ~1/4 empty.
At the same time we probably don't want to inline all of them into
CacheMemoryContext - too likely to introduce bugs, and too hard to
maintain leak free.
But what if we had a new type of memory context that did not itself
manage memory underlying allocations, but instead did so via the parent?
If such a context tracked all the live allocations in some form of list,
it could then free them from the parent at reset time. In other words,
it'd proxy all memory management via the parent, only adding a separate
name, and tracking of all live chunks.
Obviously such a context would be less efficient to reset than a plain
aset.c one - but I don't think that'd matter much for these types of
use-cases.  The big advantage in this case would be that we wouldn't
have two separate "blocks" for each index cache entry, but
instead allocations could all be done within CacheMemoryContext.
Does that sound like a sensible idea?
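
To make the idea concrete, here is a minimal sketch of such a proxy context. Everything in it is an assumption for illustration: the names (ParentContext, ProxyContext, proxy_alloc and so on) are hypothetical and are not existing PostgreSQL APIs, the parent is stood in for by malloc()/free() plus a live-chunk counter, and the "some form of list" is taken to be a circular doubly linked list, which costs two pointers per chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the parent context (hypothetical): malloc plus a counter. */
typedef struct ParentContext
{
    size_t      live_chunks;    /* chunks currently allocated from parent */
} ParentContext;

void *
parent_alloc(ParentContext *parent, size_t size)
{
    void       *ptr = malloc(size);

    if (ptr != NULL)
        parent->live_chunks++;
    return ptr;
}

void
parent_free(ParentContext *parent, void *ptr)
{
    assert(parent->live_chunks > 0);
    free(ptr);
    parent->live_chunks--;
}

/* Header the proxy prepends to each chunk so it can track live chunks. */
typedef struct ProxyChunk
{
    struct ProxyChunk *prev;
    struct ProxyChunk *next;
} ProxyChunk;

/* The proxy owns no blocks: only a name and the list of its chunks. */
typedef struct ProxyContext
{
    const char *name;
    ParentContext *parent;
    ProxyChunk  head;           /* sentinel of a circular list */
} ProxyContext;

void
proxy_init(ProxyContext *cxt, ParentContext *parent, const char *name)
{
    cxt->name = name;
    cxt->parent = parent;
    cxt->head.prev = cxt->head.next = &cxt->head;
}

void *
proxy_alloc(ProxyContext *cxt, size_t size)
{
    ProxyChunk *chunk = parent_alloc(cxt->parent, sizeof(ProxyChunk) + size);

    if (chunk == NULL)
        return NULL;
    /* Link the new chunk in right after the sentinel. */
    chunk->prev = &cxt->head;
    chunk->next = cxt->head.next;
    cxt->head.next->prev = chunk;
    cxt->head.next = chunk;
    return chunk + 1;           /* payload follows the header */
}

/* Unlike pfree(), this sketch takes the context explicitly. */
void
proxy_free(ProxyContext *cxt, void *ptr)
{
    ProxyChunk *chunk = (ProxyChunk *) ptr - 1;

    chunk->prev->next = chunk->next;
    chunk->next->prev = chunk->prev;
    parent_free(cxt->parent, chunk);
}

/* Reset hands every still-live chunk back to the parent, one by one. */
void
proxy_reset(ProxyContext *cxt)
{
    while (cxt->head.next != &cxt->head)
        proxy_free(cxt, cxt->head.next + 1);
}
```

Reset is linear in the number of live chunks here, which is the cost mentioned above; in exchange the proxy itself needs no blocks of its own.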
Greetings,
Andres Freund
			
Andres Freund <andres@anarazel.de> writes:
> I was responding to a question about postgres' per-backend memory usage,
> making me look at the various contexts below CacheMemoryContext.  There
> is pretty much always a significant number of contexts below, one for
> each index:
>     index info: 2048 total in 2 blocks; 568 free (1 chunks); 1480 used: pg_class_tblspc_relfilenode_index
Yup.
> But what if we had a new type of memory context that did not itself
> manage memory underlying allocations, but instead did so via the parent?
> If such a context tracked all the live allocations in some form of list,
> it could then free them from the parent at reset time. In other words,
> it'd proxy all memory management via the parent, only adding a separate
> name, and tracking of all live chunks.
I dunno, that seems like a *lot* of added overhead, and opportunity for
bugs.  Maybe it'd be all right for contexts in which alloc/dealloc is
very infrequent.  But why not just address this problem by reducing the
allocset blocksize parameter (some more) for these index contexts?
I'd even go a bit further, and suggest that the right way to exploit
our knowledge that these contexts' contents don't change much is to
go the other way, and reduce not increase their per-chunk overhead.
I've wanted for some time to build a context type that doesn't support
pfree() but just makes it a no-op, and doesn't round request sizes up
further than the next maxalign boundary.  Without pfree we don't need
a normal chunk header; the minimum requirement of a context pointer
is enough.  And since we aren't going to be recycling any chunks, there's
no need to try to standardize their sizes.  This seems like it'd be ideal
for cases like the index cache contexts.
(For testing purposes, the generation.c context type might be close
enough for this, and it'd be easier to shove in.)
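
A rough sketch of what such an allocate-only context could look like follows. It is only an illustration under stated assumptions: the names (BumpContext, bump_alloc, SKETCH_MAXALIGN and so on) are hypothetical, the maximum alignment is assumed to be 8 bytes, and a single fixed-size malloc() block stands in for real block management. The chunk header is just a context pointer, the free routine is a no-op, and request sizes are rounded up only to the alignment boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed maximum alignment for this sketch. */
#define SKETCH_ALIGNOF 8
#define SKETCH_MAXALIGN(len) \
    (((size_t) (len) + (SKETCH_ALIGNOF - 1)) & ~(size_t) (SKETCH_ALIGNOF - 1))

typedef struct BumpContext
{
    char       *block;          /* the single backing block */
    size_t      capacity;       /* size of the block in bytes */
    size_t      used;           /* bytes handed out so far */
} BumpContext;

/* Minimal chunk header: only a pointer back to the owning context. */
typedef struct BumpChunkHeader
{
    BumpContext *context;
} BumpChunkHeader;

#define BUMP_HDRSZ SKETCH_MAXALIGN(sizeof(BumpChunkHeader))

/* Returns 1 on success, 0 if the backing block could not be allocated. */
int
bump_init(BumpContext *cxt, size_t capacity)
{
    cxt->block = malloc(capacity);
    cxt->capacity = (cxt->block != NULL) ? capacity : 0;
    cxt->used = 0;
    return cxt->block != NULL;
}

void *
bump_alloc(BumpContext *cxt, size_t size)
{
    BumpChunkHeader *hdr;
    size_t      need;

    if (size > cxt->capacity)
        return NULL;            /* also keeps the rounding from overflowing */
    need = BUMP_HDRSZ + SKETCH_MAXALIGN(size);
    if (need > cxt->capacity - cxt->used)
        return NULL;            /* a real context would grab another block */

    assert(cxt->block != NULL);
    hdr = (BumpChunkHeader *) (cxt->block + cxt->used);
    hdr->context = cxt;
    cxt->used += need;
    return (char *) hdr + BUMP_HDRSZ;
}

/* Freeing an individual chunk does nothing; space returns only on destroy. */
void
bump_free(void *ptr)
{
    (void) ptr;
}

/* The header still lets us find the owning context from a chunk pointer. */
BumpContext *
bump_get_context(void *ptr)
{
    return ((BumpChunkHeader *) ((char *) ptr - BUMP_HDRSZ))->context;
}

void
bump_destroy(BumpContext *cxt)
{
    free(cxt->block);
    cxt->block = NULL;
    cxt->capacity = cxt->used = 0;
}
```

With these assumptions a 13-byte request consumes 16 bytes of payload plus the header, rather than being rounded up to a power of two.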
            regards, tom lane
			
On Mon, Dec 16, 2019 at 03:35:12PM -0800, Andres Freund wrote:
>Hi,
>
>I was responding to a question about postgres' per-backend memory usage,
>making me look at the various contexts below CacheMemoryContext.  There
>is pretty much always a significant number of contexts below, one for
>each index:
>
>  CacheMemoryContext: 524288 total in 7 blocks; 8680 free (0 chunks); 515608 used
>    index info: 2048 total in 2 blocks; 568 free (1 chunks); 1480 used: pg_class_tblspc_relfilenode_index
>    index info: 2048 total in 2 blocks; 960 free (0 chunks); 1088 used: pg_statistic_ext_relid_index
>    index info: 2048 total in 2 blocks; 976 free (0 chunks); 1072 used: blarg_pkey
>    index info: 2048 total in 2 blocks; 872 free (0 chunks); 1176 used: pg_index_indrelid_index
>    index info: 2048 total in 2 blocks; 600 free (1 chunks); 1448 used: pg_attrdef_adrelid_adnum_index
>    index info: 2048 total in 2 blocks; 656 free (2 chunks); 1392 used: pg_db_role_setting_databaseid_rol_index
>    index info: 2048 total in 2 blocks; 544 free (2 chunks); 1504 used: pg_opclass_am_name_nsp_index
>    index info: 2048 total in 2 blocks; 928 free (2 chunks); 1120 used: pg_foreign_data_wrapper_name_index
>    index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_enum_oid_index
>    index info: 2048 total in 2 blocks; 600 free (1 chunks); 1448 used: pg_class_relname_nsp_index
>    index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_foreign_server_oid_index
>    index info: 2048 total in 2 blocks; 960 free (2 chunks); 1088 used: pg_publication_pubname_index
>...
>    index info: 3072 total in 2 blocks; 1144 free (2 chunks); 1928 used: pg_conversion_default_index
>...
>
>while I also think we could pretty easily reduce the amount of memory
>used for each index, I want to focus on something else here:
>
>We waste a lot of space due to all these small contexts. Even leaving
>aside the overhead of the context and its blocks - not insignificant -
>they are mostly between ~1/2 and ~1/4 empty.
>
>At the same time we probably don't want to inline all of them into
>CacheMemoryContext - too likely to introduce bugs, and too hard to
>maintain leak free.
>
>
>But what if we had a new type of memory context that did not itself
>manage memory underlying allocations, but instead did so via the parent?
>If such a context tracked all the live allocations in some form of list,
>it could then free them from the parent at reset time. In other words,
>it'd proxy all memory management via the parent, only adding a separate
>name, and tracking of all live chunks.
>
>Obviously such a context would be less efficient to reset than a plain
>aset.c one - but I don't think that'd matter much for these types of
>use-cases.  The big advantage in this case would be that we wouldn't
>have two separate "blocks" for each index cache entry, but
>instead allocations could all be done within CacheMemoryContext.
>
>Does that sound like a sensible idea?
>

I do think it's an interesting idea, worth exploring.

I agree it's probably OK if the proxy contexts are a bit less efficient,
but I think we can restrict their use to places where that's not an
issue (i.e. low frequency of resets, small number of allocated chunks
etc.). And if needed we can probably find ways to improve the efficiency
e.g. by replacing the linked list with a small hash table or something
(to speed-up pfree etc.). Or something.

I think the big question is what this would mean for the parent context.
Because suddenly it's a mix of chunks with different life spans, which
would originally be segregated in different malloc-ed blocks. And now
that would not be true, so e.g. after deleting the child context the
memory would not be freed but just moved to the freelist.

It would also confuse MemoryContextStats, which would suddenly not
realize some of the chunks are actually "owned" by the child context.
Maybe this could be improved, but only partially (unless we'd want to
have a per-chunk flag if it's owned by the context or by a proxy).

Not sure if this would impact accounting (e.g. what if someone creates a
custom aggregate, creating a separate proxy context per group?). Would
that work or not?

Also, would this need to support nested proxy contexts? That might
complicate things quite a bit, I'm afraid.

FWIW I don't know answers to these questions.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From: Andres Freund <andres@anarazel.de>
> We waste a lot of space due to all these small contexts. Even leaving
> aside the overhead of the context and its blocks - not insignificant -
> they are mostly between ~1/2 and ~1/4 empty.
>
>
> But what if we had a new type of memory context that did not itself
> manage memory underlying allocations, but instead did so via the parent?
> If such a context tracked all the live allocations in some form of list,
> it could then free them from the parent at reset time. In other words,
> it'd proxy all memory management via the parent, only adding a separate
> name, and tracking of all live chunks.

It sounds like it will alleviate the memory bloat caused by SAVEPOINT
and RELEASE, which leave a CurTransactionContext for each
subtransaction.  The memory overuse brought Linux down when our
customer's batch application ran millions of SQL statements in a
transaction with psqlODBC.  psqlODBC uses savepoints by default to
enable statement rollback.

(I guess this issue of one memory context per subtransaction caused the
crash of Amazon Aurora on the Prime Day last year.)


Regards
Takayuki Tsunakawa

Hi,

On 2019-12-16 18:58:36 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > But what if we had a new type of memory context that did not itself
> > manage memory underlying allocations, but instead did so via the parent?
> > If such a context tracked all the live allocations in some form of list,
> > it could then free them from the parent at reset time. In other words,
> > it'd proxy all memory management via the parent, only adding a separate
> > name, and tracking of all live chunks.
>
> I dunno, that seems like a *lot* of added overhead, and opportunity for
> bugs.

What kind of bugs are you thinking of?


> Maybe it'd be all right for contexts in which alloc/dealloc is
> very infrequent.

I don't think the overhead would be enough to matter even for moderately
common cases. Sure, another 16 bytes of overhead isn't free, nor is the
indirection upon allocation/free, but it's also not that bad. I'd be
surprised if it didn't turn out to be cheaper in a lot of cases,
actually, due to not needing a separate init block etc.  Obviously it'd
make no sense to use such a context for cases with very frequent
allocations (say parsing, copying a node tree), or where bulk
deallocations of a lot of small allocations is important - but there's
plenty of other types of cases.


> But why not just address this problem by reducing the allocset
> blocksize parameter (some more) for these index contexts?

Well, but what would we set it to?  The total allocated memory sizes for
different indexes vary between ~1kb and 4kb. And we'll have to cope
with that without creating waste again. We could allow much lower
initial and max block sizes for aset, I guess, so anything large gets to
be its own malloc() block.


> I'd even go a bit further, and suggest that the right way to exploit
> our knowledge that these contexts' contents don't change much is to
> go the other way, and reduce not increase their per-chunk overhead.

Yea, I was wondering about that too. However, while there's also a good
number of small allocations, a large fraction of the used space is
actually larger allocations. And using an "alloc only" context doesn't
really address the fact that the underlying memory blocks are quite
wasteful - especially given that this data essentially lives forever.

For the specific case of RelationInitIndexAccessInfo(), allocations that
commonly live for the rest of the backend's life and are frequent enough
to matter, it might be worth micro-optimizing the
allocations. E.g. not doing ~7 separate allocations within a few
lines...  Not primarily because of the per-allocation overheads, but
more because that'd allow to size things right directly.

Greetings,

Andres Freund

Hi,

On 2019-12-17 01:12:43 +0100, Tomas Vondra wrote:
> On Mon, Dec 16, 2019 at 03:35:12PM -0800, Andres Freund wrote:
> > But what if we had a new type of memory context that did not itself
> > manage memory underlying allocations, but instead did so via the parent?
> > If such a context tracked all the live allocations in some form of list,
> > it could then free them from the parent at reset time. In other words,
> > it'd proxy all memory management via the parent, only adding a separate
> > name, and tracking of all live chunks.
> >
> > Obviously such a context would be less efficient to reset than a plain
> > aset.c one - but I don't think that'd matter much for these types of
> > use-cases. The big advantage in this case would be that we wouldn't
> > have two separate "blocks" for each index cache entry, but
> > instead allocations could all be done within CacheMemoryContext.
> >
> > Does that sound like a sensible idea?
> >
>
> I do think it's an interesting idea, worth exploring.
>
> I agree it's probably OK if the proxy contexts are a bit less efficient,
> but I think we can restrict their use to places where that's not an
> issue (i.e. low frequency of resets, small number of allocated chunks
> etc.). And if needed we can probably find ways to improve the efficiency
> e.g. by replacing the linked list with a small hash table or something
> (to speed-up pfree etc.). Or something.

I don't think you'd need a hash table for efficiency - I was thinking of
just using a doubly linked list. That allows O(1) unlinking.


> I think the big question is what this would mean for the parent context.
> Because suddenly it's a mix of chunks with different life spans, which
> would originally be segregated in different malloc-ed blocks. And now
> that would not be true, so e.g. after deleting the child context the
> memory would not be freed but just moved to the freelist.

I think in the case of CacheMemoryContext it'd not really be a large
change - we already have vastly different lifetimes there, e.g. for the
relcache entries themselves.  I could also see using something like this
for some of the executor sub-contexts - they commonly have only very few
allocations, but need to be resettable individually.


> It would also confuse MemoryContextStats, which would suddenly not
> realize some of the chunks are actually "owned" by the child context.
> Maybe this could be improved, but only partially (unless we'd want to
> have a per-chunk flag if it's owned by the context or by a proxy).

I'm not sure it'd really be worth fixing this fully, tbh. Maybe just
reporting at MemoryContextStats time whether a sub-context is included
in the parent's total or not.


> Not sure if this would impact accounting (e.g. what if someone creates a
> custom aggregate, creating a separate proxy context per group?). Would
> that work or not?

I'm not sure what problem you're thinking of?


> Also, would this need to support nested proxy contexts? That might
> complicate things quite a bit, I'm afraid.

I mean, it'd probably not be a great idea to do so much - due to
increased overhead - but I don't see why it wouldn't work. If it
actually is something that we'd want to make work efficiently at some
point, it shouldn't be too hard to have code to walk up the chain of
parent contexts at creation time to the next context that's not a proxy.

Greetings,

Andres Freund

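
The walk up the parent chain described in the last paragraph could look roughly like the sketch below. The struct and function names (SketchContext, nearest_real_ancestor, sketch_context_init) are hypothetical and only model the parent links; this is not how PostgreSQL's MemoryContextData is laid out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down context node: just enough to show the walk. */
typedef struct SketchContext
{
    const char *name;
    bool        is_proxy;
    struct SketchContext *parent;
    struct SketchContext *backing;  /* proxies only: where memory comes from */
} SketchContext;

/* Starting at 'parent', skip over proxies until a real context is found. */
SketchContext *
nearest_real_ancestor(SketchContext *parent)
{
    while (parent != NULL && parent->is_proxy)
        parent = parent->parent;
    return parent;
}

/*
 * Resolve the backing context once, at creation time, so that later
 * allocations in a nested proxy need no extra hops.
 */
void
sketch_context_init(SketchContext *cxt, const char *name,
                    bool is_proxy, SketchContext *parent)
{
    cxt->name = name;
    cxt->is_proxy = is_proxy;
    cxt->parent = parent;
    cxt->backing = is_proxy ? nearest_real_ancestor(parent) : NULL;

    /* A proxy needs some non-proxy ancestor to allocate from. */
    assert(!is_proxy || cxt->backing != NULL);
}
```

A proxy created under another proxy then allocates directly from the first real ancestor, while the parent link still records the logical nesting for deletion and reporting.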
Andres Freund <andres@anarazel.de> writes:
> For the specific case of RelationInitIndexAccessInfo(), allocations that
> commonly live for the rest of the backend's life and are frequent enough
> to matter, it might be worth micro-optimizing the
> allocations. E.g. not doing ~7 separate allocations within a few
> lines...  Not primarily because of the per-allocation overheads, but
> more because that'd allow to size things right directly.
Hmm ... that would be worth trying, for sure, since it's so easy ...
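
As an illustration of what folding several small allocations into one could look like, here is a sketch. The struct and its fields (IndexInfoSketch, opfamily, opcintype, collation, option) are hypothetical stand-ins rather than the actual relcache fields, calloc() stands in for the context allocator, and 8-byte maximum alignment is assumed. The total size is computed up front, so exactly one correctly sized chunk is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed maximum alignment for this sketch. */
#define SKETCH_ALIGNOF 8
#define SKETCH_MAXALIGN(len) \
    (((size_t) (len) + (SKETCH_ALIGNOF - 1)) & ~(size_t) (SKETCH_ALIGNOF - 1))

/* Hypothetical per-index info with several per-attribute arrays. */
typedef struct IndexInfoSketch
{
    int         natts;
    uint32_t   *opfamily;       /* natts entries */
    uint32_t   *opcintype;      /* natts entries */
    uint32_t   *collation;      /* natts entries */
    int16_t    *option;         /* natts entries */
} IndexInfoSketch;

/*
 * One allocation instead of five: lay the arrays out behind the struct at
 * aligned offsets and point the struct's fields into the same chunk.
 */
IndexInfoSketch *
index_info_alloc(int natts)
{
    size_t      n;
    size_t      off_opfamily;
    size_t      off_opcintype;
    size_t      off_collation;
    size_t      off_option;
    size_t      total;
    char       *base;
    IndexInfoSketch *info;

    assert(natts > 0);
    n = (size_t) natts;

    off_opfamily = SKETCH_MAXALIGN(sizeof(IndexInfoSketch));
    off_opcintype = off_opfamily + SKETCH_MAXALIGN(n * sizeof(uint32_t));
    off_collation = off_opcintype + SKETCH_MAXALIGN(n * sizeof(uint32_t));
    off_option = off_collation + SKETCH_MAXALIGN(n * sizeof(uint32_t));
    total = off_option + SKETCH_MAXALIGN(n * sizeof(int16_t));

    base = calloc(1, total);    /* zero-filled, like a palloc0-style call */
    if (base == NULL)
        return NULL;

    info = (IndexInfoSketch *) base;
    info->natts = natts;
    info->opfamily = (uint32_t *) (base + off_opfamily);
    info->opcintype = (uint32_t *) (base + off_opcintype);
    info->collation = (uint32_t *) (base + off_collation);
    info->option = (int16_t *) (base + off_option);
    return info;
}
```

Releasing the whole thing is then a single free() of the returned pointer, and there is one chunk header instead of one per array.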
            regards, tom lane