Обсуждение: Patch to clean Query after rewrite-and-analyze - reduces memusage upto 50% - increases TPS by up to 50%

Поиск
Список
Период
Сортировка

Patch to clean Query after rewrite-and-analyze - reduces memusage upto 50% - increases TPS by up to 50%

От
Daniel Migowski
Дата:
Hello,

While examining the reasons for excessive memory usage in prepared 
statements I noticed that RTE_JOIN-kind RTEs contain a bunch of 
columnNames and joinaliasvars, that are irrelevant after the Query after 
has been rewritten. I have some queries that join about 20 tables and 
select only a few values, mainly names of objects from those tables.

The attached patch adds a small cleanup function that iterates thought 
the query and cleans stuff up. I may have missed some places that could 
also be cleaned up but for now the memory requirements for my largest 
statements have dropped from 31.2MB to 10.4MB with this patch.

After the statement has be executed seven times a generic plan is stored 
in the statement, resulting in an extra 8,8MB memory usage, but still 
this makes a difference of more than 50% total.

But the most interesting thing was that this patch reduced query 
execution time by 50% (~110ms vs. 55ms) when no generic was created yet, 
and by 35% (7.5ms vs. 5.1ms) when the global query plan had been created.

All tests still pass with my cleanup command, but I am afraid the tests 
might not contain queries that still need that info after statement 
preparation.

If anyone might have a look at it and hint me to a situation where this 
might crash later on? Also, would it be possible for someone to run a 
benchmark after applying this test to ensure my findings are not totally 
off? I tested on a Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz with SSDs, 
but everything should have been in memory when I ran the test.

Regards,
Daniel Migowski



Вложения
Daniel Migowski <dmigowski@ikoffice.de> writes:
> While examining the reasons for excessive memory usage in prepared 
> statements I noticed that RTE_JOIN-kind RTEs contain a bunch of 
> columnNames and joinaliasvars, that are irrelevant after the Query after 
> has been rewritten.

Uh, they're not irrelevant to planning, nor to EXPLAIN.  I don't know how
thoroughly you tested this patch, but it seems certain to break things.

As far as the final plan goes, setrefs.c's add_rte_to_flat_rtable already
drops RTE infrastructure that isn't needed by either the executor or
EXPLAIN.  But we need it up to that point.

(After thinking a bit, I'm guessing that it seemed not to break because
your tests never actually exercised the generic-plan path, or perhaps
there was always a plancache invalidation before we tried to use the
query_list submitted by PrepareQuery.  I wonder if this is telling us
something about the value of having PrepareQuery do that at all,
rather than just caching the raw parse tree and calling it a day.)

A few tips on submitting patches:

* Providing concrete test cases to back up improvement claims is a
good idea.

* Please try to make your code follow established PG style.  Ignoring
project conventions about whitespace and brace layout just makes your
code harder to read.  (A lot of us would just summarily run the code
through pgindent before trying to review it.)

* Please don't include random cosmetic changes (eg renaming of unrelated
variables) in a patch that's meant to illustrate some specific functional
change.  It just confuses matters.

            regards, tom lane



Re: Patch to clean Query after rewrite-and-analyze - reduces memusageup to 50% - increases TPS by up to 50%

От
Daniel Migowski
Дата:
Am 03.08.2019 um 18:38 schrieb Tom Lane:
> Daniel Migowski <dmigowski@ikoffice.de> writes:
>> While examining the reasons for excessive memory usage in prepared
>> statements I noticed that RTE_JOIN-kind RTEs contain a bunch of
>> columnNames and joinaliasvars, that are irrelevant after the Query after
>> has been rewritten.
> Uh, they're not irrelevant to planning, nor to EXPLAIN.  I don't know how
> thoroughly you tested this patch, but it seems certain to break things.
>
> As far as the final plan goes, setrefs.c's add_rte_to_flat_rtable already
> drops RTE infrastructure that isn't needed by either the executor or
> EXPLAIN.  But we need it up to that point.
OK, I will investigate.
> (After thinking a bit, I'm guessing that it seemed not to break because
> your tests never actually exercised the generic-plan path, or perhaps
> there was always a plancache invalidation before we tried to use the
> query_list submitted by PrepareQuery.  I wonder if this is telling us
> something about the value of having PrepareQuery do that at all,
> rather than just caching the raw parse tree and calling it a day.)

Having PreparyQuery do _what_ exactly? Sorry, I am still learning how 
everything works here.

It seems like the patch crashes the postmaster when I use JOINSs 
directly in the PreparedStatement, not when I just place all the Joins 
in views. I will also look into this further.

> A few tips on submitting patches:
>
> * Providing concrete test cases to back up improvement claims is a
> good idea.
OK, I will provide.
> * Please try to make your code follow established PG style.  Ignoring
> project conventions about whitespace and brace layout just makes your
> code harder to read.  (A lot of us would just summarily run the code
> through pgindent before trying to review it.)
OK. I just tried to have git diff stop marking my indentations red, but 
I am also new to git, will use pgindent now.
> * Please don't include random cosmetic changes (eg renaming of unrelated
> variables) in a patch that's meant to illustrate some specific functional
> change.  It just confuses matters.

Was useful here because I had to declare a ListCell anyway, and 
ListCell's in other places where named 'lc' not 'l', and 'l' was usually 
used for lists, so I thought reusal was nice, but OK.





Daniel Migowski <dmigowski@ikoffice.de> writes:
> Am 03.08.2019 um 18:38 schrieb Tom Lane:
>> (After thinking a bit, I'm guessing that it seemed not to break because
>> your tests never actually exercised the generic-plan path, or perhaps
>> there was always a plancache invalidation before we tried to use the
>> query_list submitted by PrepareQuery.  I wonder if this is telling us
>> something about the value of having PrepareQuery do that at all,
>> rather than just caching the raw parse tree and calling it a day.)

> Having PreparyQuery do _what_ exactly? Sorry, I am still learning how 
> everything works here.

A plancache entry stores a raw parsetree (which is, at least
theoretically, an immutable representation of the parsed string),
and an analyzed-and-rewritten parsetree, and optionally a generic
plan tree.  PrepareQuery is setting up the first two of these,
but only the raw parsetree is really essential ... or for that
matter, it might be possible to store just the source string
representation and re-parse that.  It's all about space versus
speed tradeoffs.  Our current philosophy is that if you bothered
to prepare a query it's because you want speed, but perhaps that
assumption needs rethinking.

> It seems like the patch crashes the postmaster when I use JOINSs 
> directly in the PreparedStatement, not when I just place all the Joins 
> in views. I will also look into this further.

Um.  Now that I think about it, the regression tests probably don't
try to PREPARE any complex queries, so it's not impossible that
they just missed the fact that you were storing broken parse trees
for joins.

            regards, tom lane