RE: BUG #15923: Prepared statements take way too much memory.

From: Daniel Migowski
Subject: RE: BUG #15923: Prepared statements take way too much memory.
Date:
Msg-id: 41ED3F5450C90F4D8381BC4D8DF6BBDCF02E19B4@EXCHANGESERVER.ikoffice.de
In reply to: BUG #15923: Prepared statements take way too much memory.  (PG Bug reporting form <noreply@postgresql.org>)
Responses: Re: BUG #15923: Prepared statements take way too much memory.  (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-bugs

Hello,

Thanks for all your replies; I'll try to combine my answers to the last few mails on the subject.

------ In reply to Tom Lane ------

Creating a new type of MemoryContext (FlatMemoryContext?) sounds like a good idea to me, because it seems minimally
invasive; I will try that. I believe it doesn't need a maximum block size, does it? One could pick a size for the
blocks that are requested from malloc, and every time someone pallocs a larger area it is malloced directly and
returned. I saw that AllocSet uses an initial size which is doubled every time the previous size isn't sufficient
anymore. Should the new MemoryContext behave identically?

I wonder if this MemoryContext could make good use of a "finalize" function, to be called once no further work in the
context is expected, which reallocs the allocated blocks down to their actually used size so the C library can reuse
the unused memory. Maybe even AllocSet could profit from it as a first step. Checking the stats, the ratio of used to
malloced memory is at a low 60%.
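For what it's worth, the "bump-allocate small requests, hand large ones straight to malloc" idea could look roughly like the following standalone sketch. All names here are invented for illustration; this is not PostgreSQL's MemoryContext API, and a real version would chain further blocks instead of failing when the first one fills up:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical "flat" context: one malloc'd block, allocations carved
 * off sequentially with no per-chunk header, freed only all at once. */
typedef struct FlatContext
{
    char   *base;               /* start of the single malloc'd block */
    size_t  used;               /* bytes handed out so far */
    size_t  size;               /* total block size */
} FlatContext;

static FlatContext *
flat_create(size_t size)
{
    FlatContext *cxt = malloc(sizeof(FlatContext));

    cxt->base = malloc(size);
    cxt->used = 0;
    cxt->size = size;
    return cxt;
}

static void *
flat_alloc(FlatContext *cxt, size_t nbytes)
{
    size_t  alloc = (nbytes + 7) & ~(size_t) 7;     /* 8-byte alignment */
    void   *ptr;

    if (cxt->used + alloc > cxt->size)
        return NULL;            /* real version: grab another block */
    ptr = cxt->base + cxt->used;
    cxt->used += alloc;
    return ptr;
}

static void
flat_destroy(FlatContext *cxt)
{
    free(cxt->base);
    free(cxt);
}
```

Since there are no chunk headers, individual pfree is impossible; the whole context is dropped at once, which matches the lifecycle of a cached plan. Note that a realloc-based "finalize" that trims the block would additionally have to guarantee the block does not move, because pointers into it have already been handed out.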

> This wouldn't take very much new code, and it would potentially offer up to perhaps 2X space savings compared to
aset.c, thanks to cutting the per-chunk overhead.  Whether that's enough to float your boat with respect to cached
plans isn't too clear, though.

No, the boat will still sink, but it's a good start and servers are less likely to crash.

> A totally different idea is to make a variant version of copyObject that is intended to produce a compact form of a
node tree, and does not create a separate palloc allocation for each node but just packs them as tightly as it can in
larger palloc chunks. ...

Regarding the copyObject code that should be autogenerated: I thought a lot about how this would be possible with
macros, but couldn't think of anything, because one macro statement would have to produce source code in two distinct
places (at least in two functions). Is there similar code somewhere else in the source tree?
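One classic answer to the "code in two distinct places" question is the X-macro idiom: the field list is written once as a macro and expanded with a different definition of X inside each function. A toy sketch (the struct and field names are invented purely for illustration, not proposed PostgreSQL code):

```c
#include <assert.h>

/* The field list is written once... */
#define FOO_FIELDS \
    X(int, a)      \
    X(int, b)      \
    X(int, c)

typedef struct Foo
{
#define X(type, name) type name;
    FOO_FIELDS              /* ...expanded here as declarations... */
#undef X
} Foo;

/* ...and again in a first function, as copy statements... */
static void
foo_copy(Foo *dst, const Foo *src)
{
#define X(type, name) dst->name = src->name;
    FOO_FIELDS
#undef X
}

/* ...and again in a second function, as comparisons. */
static int
foo_equal(const Foo *x, const Foo *y)
{
#define X(type, name) if (x->name != y->name) return 0;
    FOO_FIELDS
#undef X
    return 1;
}
```

The same list thus drives code generation in two (or more) functions, at the cost of fairly opaque preprocessor tricks; a separate generator script is usually easier to read and debug.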

See below for more on that subject.

> ...serialization of node tree...

I also don't like serialization very much. I already rewrote the standard serialization mechanism in Java quite
successfully, so I know about the problems with graph-like object dependencies (which don't seem to exist in node
trees, because then the copyObject function would also have to cope with that, and it doesn't). But if everything else
fails in the end it might be a good idea. However, I believe the rewritten query plan is many times the size of the
original node tree and is used often, so it might not be a good idea.

------ In reply to Thomas Munro ------

Regarding your comments on the variant version of copyObject: at first glance I also believe that a FlatMemoryContext
would result in nearly the same memory usage, except for the chunk headers Tom Lane also noticed. One should measure
how much space could really be saved when we don't need the chunk headers.

But I like the calculate-memory variant one would get when the copyObject code is generated. The two-phase variant
would not even be slower, because the overhead of pallocing a chunk for every node must be higher than a simple
calculation of the required space. This feels a bit like inlining the allocation logic into the copyObject code, and
would be the best variant regarding both speed and size.
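As a toy illustration of the two-phase idea (pass 1 measures, pass 2 packs everything into one allocation), using a made-up three-field node and plain malloc instead of palloc:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed-size node; real node trees have many node types,
 * so a generator would emit one size/copy routine per type. */
typedef struct Node
{
    int          value;
    struct Node *left;
    struct Node *right;
} Node;

/* pass 1: total bytes needed for the whole tree */
static size_t
node_size(const Node *n)
{
    if (n == NULL)
        return 0;
    return sizeof(Node) + node_size(n->left) + node_size(n->right);
}

/* pass 2: copy nodes back-to-back into preallocated space */
static Node *
copy_into(const Node *src, char **free_space)
{
    Node *dst;

    if (src == NULL)
        return NULL;
    dst = (Node *) *free_space;
    *free_space += sizeof(Node);
    dst->value = src->value;
    dst->left = copy_into(src->left, free_space);
    dst->right = copy_into(src->right, free_space);
    return dst;
}

static Node *
copy_flat(const Node *root)
{
    char *block = malloc(node_size(root));      /* single allocation */
    char *cursor = block;

    return copy_into(root, &cursor);
}
```

The measuring pass is pure pointer chasing and additions, which supports the point above: it should be cheaper than one allocator call per node.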

------ In reply to Andres Freund ------

> Yea, I think we'd have to autogenerate them to go for that.  I'm kinda hoping that maybe John Naylor would take a
look, after he made all our lives quite a bit easier with the other catalog script work... In the 'pipe dream' version
of this, that script would be able to generate code to adjust pointers after a memcpy... That'd make copyObject() of
such flattened trees much cheaper.

Thanks for the input, I will have a look at it. Since copyObject on such trees is such a common operation, it might be
feasible to store the length of such a read-only tree somewhere within the tree itself, so one knows the exact length
of a node including all its subnodes, which "should" always be in a contiguous range of memory anyway, and we would
just have to apply an offset to all pointers within the copied area.
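The memcpy-plus-pointer-offset idea could be sketched like this for a toy tree whose nodes all have the same size and sit back-to-back in one block. Everything here is hypothetical: real node trees mix many node types, so the fix-up loop would need generated per-type knowledge of where the pointer fields are:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Node
{
    int          value;
    struct Node *left;
    struct Node *right;
} Node;

/* rebase one internal pointer by delta, leaving NULL alone */
static Node *
shift(Node *p, ptrdiff_t delta)
{
    return p ? (Node *) ((char *) p + delta) : NULL;
}

/* copy a flat tree of 'len' bytes whose root sits at 'src' */
static Node *
copy_flat_tree(const Node *src, size_t len)
{
    Node      *dst = malloc(len);
    ptrdiff_t  delta = (char *) dst - (char *) src;
    size_t     i;

    memcpy(dst, src, len);      /* one cheap bulk copy... */
    /* ...then rebase every internal pointer in the block */
    for (i = 0; i < len / sizeof(Node); i++)
    {
        dst[i].left = shift(dst[i].left, delta);
        dst[i].right = shift(dst[i].right, delta);
    }
    return dst;
}
```

Storing the total length in the tree itself, as suggested above, is exactly what makes the single memcpy possible here.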

Or, coming back to the two-pass idea of preallocating a large chunk of memory for the nodes, one could use the same
function to calculate the required size on the fly.

The problem I see here is that the current implementation of copyObject (which is used in many places in the code)
would have to be aware that the source nodes are allocated in a flat MemoryContext for this to work everywhere in the
codebase. This could be achieved if the chunk header for the nodes is not omitted (although the chunk header would not
be set by the MemoryContext but by the copyNode implementation for FlatMemoryContext itself; if the current memory
isn't flat, normal pallocs have to be done). Does code exist with the assumption that every node resides in its own
allocated chunk?

> It'd sure be interesting to write a function that computes stats about which types of nodes consume how much memory.
Shouldn't be too hard. I'm mildly suspecting that a surprisingly large overhead in complex query trees comes from
target lists that are the same in many parts of the tree.

This would be the second part of the refactoring. I was wondering if one could reduce the data stored in the tree in
the first place by using references to existing structures instead.

------ Conclusion ------

* A FlatMemoryContext would reduce memory usage by 40%(?), and the current copyNode infrastructure could remain as is.
* Using a code generator for the nodes is commonly agreed to be a good idea (and the number of node types in nodes.h
makes me believe that the amount of hand-written code would drop significantly when using one).
* An autogenerated copyObjectFlat would be able to determine the total size of a node and its descendants and could
copy a tree with a single palloc, making FlatMemoryContext obsolete and yielding an even higher memory reduction.
* IF we keep the chunk header in FlatMemoryContext, then the "standard" copyNode would be able to fall back to a very
fast memcpy-plus-pointer-offset variant. The gain here might outweigh the few percent of memory we save when we skip
the header. But then we have no memory gains versus using the FlatMemoryContext alone.

So:

* A FlatMemoryContext is a good idea and I will try to implement it.
* Code generation could speed things up significantly, and a variant would only yield additional memory savings if we
skip the chunk headers (if that is allowed at all).
* There is more work to do afterwards, because the trees are still too big.

------------------------------------------------

Some more questions:
* Why are MemoryContextAllocZeroAligned and MemoryContextAllocZero the same function?
* I wonder if we could just drop the raw_parse_tree altogether for large statements once the query tree has been
generated. I know it is needed to recreate the query tree in case the definitions of tables etc. change, but wouldn't
it be possible to recreate the raw_parse_tree from the query_string in that very rare event as well? The query_string
is already the most compact form of the raw_parse_tree, right?
* Does code exist with the assumption that every node resides in its own allocated chunk? (asked already above)
* Where can I find the existing code generation script? Is it called during make or before checkin?

------------------------------------------------

Thanks for reading through this. Sadly I cannot provide an example, because that would mean including my company's
whole database schema, and I gave up after spending 30 minutes trying to clean it up enough, but the complexity of the
view can still be seen at https://explain.depesz.com/s/gN2

Regards,
Daniel Migowski


