Обсуждение: Patch to clean Query after rewrite-and-analyze - reduces memusage upto 50% - increases TPS by up to 50%
Hello, While examining the reasons for excessive memory usage in prepared statements I noticed that RTE_JOIN-kind RTEs contain a bunch of columnNames and joinaliasvars, that are irrelevant after the Query after has been rewritten. I have some queries that join about 20 tables and select only a few values, mainly names of objects from those tables. The attached patch adds a small cleanup function that iterates thought the query and cleans stuff up. I may have missed some places that could also be cleaned up but for now the memory requirements for my largest statements have dropped from 31.2MB to 10.4MB with this patch. After the statement has be executed seven times a generic plan is stored in the statement, resulting in an extra 8,8MB memory usage, but still this makes a difference of more than 50% total. But the most interesting thing was that this patch reduced query execution time by 50% (~110ms vs. 55ms) when no generic was created yet, and by 35% (7.5ms vs. 5.1ms) when the global query plan had been created. All tests still pass with my cleanup command, but I am afraid the tests might not contain queries that still need that info after statement preparation. If anyone might have a look at it and hint me to a situation where this might crash later on? Also, would it be possible for someone to run a benchmark after applying this test to ensure my findings are not totally off? I tested on a Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz with SSDs, but everything should have been in memory when I ran the test. Regards, Daniel Migowski
Вложения
Daniel Migowski <dmigowski@ikoffice.de> writes: > While examining the reasons for excessive memory usage in prepared > statements I noticed that RTE_JOIN-kind RTEs contain a bunch of > columnNames and joinaliasvars, that are irrelevant after the Query after > has been rewritten. Uh, they're not irrelevant to planning, nor to EXPLAIN. I don't know how thoroughly you tested this patch, but it seems certain to break things. As far as the final plan goes, setrefs.c's add_rte_to_flat_rtable already drops RTE infrastructure that isn't needed by either the executor or EXPLAIN. But we need it up to that point. (After thinking a bit, I'm guessing that it seemed not to break because your tests never actually exercised the generic-plan path, or perhaps there was always a plancache invalidation before we tried to use the query_list submitted by PrepareQuery. I wonder if this is telling us something about the value of having PrepareQuery do that at all, rather than just caching the raw parse tree and calling it a day.) A few tips on submitting patches: * Providing concrete test cases to back up improvement claims is a good idea. * Please try to make your code follow established PG style. Ignoring project conventions about whitespace and brace layout just makes your code harder to read. (A lot of us would just summarily run the code through pgindent before trying to review it.) * Please don't include random cosmetic changes (eg renaming of unrelated variables) in a patch that's meant to illustrate some specific functional change. It just confuses matters. regards, tom lane
Am 03.08.2019 um 18:38 schrieb Tom Lane: > Daniel Migowski <dmigowski@ikoffice.de> writes: >> While examining the reasons for excessive memory usage in prepared >> statements I noticed that RTE_JOIN-kind RTEs contain a bunch of >> columnNames and joinaliasvars, that are irrelevant after the Query after >> has been rewritten. > Uh, they're not irrelevant to planning, nor to EXPLAIN. I don't know how > thoroughly you tested this patch, but it seems certain to break things. > > As far as the final plan goes, setrefs.c's add_rte_to_flat_rtable already > drops RTE infrastructure that isn't needed by either the executor or > EXPLAIN. But we need it up to that point. OK, I will investigate. > (After thinking a bit, I'm guessing that it seemed not to break because > your tests never actually exercised the generic-plan path, or perhaps > there was always a plancache invalidation before we tried to use the > query_list submitted by PrepareQuery. I wonder if this is telling us > something about the value of having PrepareQuery do that at all, > rather than just caching the raw parse tree and calling it a day.) Having PreparyQuery do _what_ exactly? Sorry, I am still learning how everything works here. It seems like the patch crashes the postmaster when I use JOINSs directly in the PreparedStatement, not when I just place all the Joins in views. I will also look into this further. > A few tips on submitting patches: > > * Providing concrete test cases to back up improvement claims is a > good idea. OK, I will provide. > * Please try to make your code follow established PG style. Ignoring > project conventions about whitespace and brace layout just makes your > code harder to read. (A lot of us would just summarily run the code > through pgindent before trying to review it.) OK. I just tried to have git diff stop marking my indentations red, but I am also new to git, will use pgindent now. > * Please don't include random cosmetic changes (eg renaming of unrelated > variables) in a patch that's meant to illustrate some specific functional > change. It just confuses matters. Was useful here because I had to declare a ListCell anyway, and ListCell's in other places where named 'lc' not 'l', and 'l' was usually used for lists, so I thought reusal was nice, but OK.
Daniel Migowski <dmigowski@ikoffice.de> writes: > Am 03.08.2019 um 18:38 schrieb Tom Lane: >> (After thinking a bit, I'm guessing that it seemed not to break because >> your tests never actually exercised the generic-plan path, or perhaps >> there was always a plancache invalidation before we tried to use the >> query_list submitted by PrepareQuery. I wonder if this is telling us >> something about the value of having PrepareQuery do that at all, >> rather than just caching the raw parse tree and calling it a day.) > Having PreparyQuery do _what_ exactly? Sorry, I am still learning how > everything works here. A plancache entry stores a raw parsetree (which is, at least theoretically, an immutable representation of the parsed string), and an analyzed-and-rewritten parsetree, and optionally a generic plan tree. PrepareQuery is setting up the first two of these, but only the raw parsetree is really essential ... or for that matter, it might be possible to store just the source string representation and re-parse that. It's all about space versus speed tradeoffs. Our current philosophy is that if you bothered to prepare a query it's because you want speed, but perhaps that assumption needs rethinking. > It seems like the patch crashes the postmaster when I use JOINSs > directly in the PreparedStatement, not when I just place all the Joins > in views. I will also look into this further. Um. Now that I think about it, the regression tests probably don't try to PREPARE any complex queries, so it's not impossible that they just missed the fact that you were storing broken parse trees for joins. regards, tom lane