Re: Adding column "mem_usage" to view pg_prepared_statements

From: Daniel Migowski
Subject: Re: Adding column "mem_usage" to view pg_prepared_statements
Msg-id: adb82679-8109-fd5c-ede0-12c4b9150e94@ikoffice.de
In response to: Re: Adding column "mem_usage" to view pg_prepared_statements  (Andres Freund <andres@anarazel.de>)
Responses: Re: Adding column "mem_usage" to view pg_prepared_statements  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 05.08.2019 at 19:16, Andres Freund wrote:
> On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:
>> how do you want to generalize it? Are you thinking about a view solely
>> for the display of the memory usage of different objects?
> I'm not quite sure. I'm just not sure that adding separate
> infrastructure for various objects is a sustainable approach. We'd likely
> want to have this for prepared statements, for cursors, for the current
> statement, for various caches, ...
>
> I think an approach would be to add an 'owning_object' field to memory
> contexts, which has to point to a Node type if set. A table returning reporting
> function could recursively walk through the memory contexts, starting at
> TopMemoryContext. Whenever it encounters a context with owning_object
> set, it prints a string version of nodeTag(owning_object). For some node
> types it knows about (e.g. PreparedStatement, Portal, perhaps some of
> the caches), it prints additional metadata specific to the type (so for
> prepared statement it'd be something like 'prepared statement', '[name
> of prepared statement]'), and prints information about the associated
> context and all its children.
I understand. So it would be something like the output of
MemoryContextStatsInternal, but in table form with some extra columns.
By the way, I would have loved to have this extra information in
MemoryContextStatsInternal already, so it might be a good idea to extend
it first so that it collects the information, and to wrap a table
function around it afterwards.
> The general context information probably should be something like:
> context_name, context_ident,
> context_total_bytes, context_total_blocks, context_total_freespace,
> context_total_freechunks, context_total_used, context_total_children,
> context_self_bytes, context_self_blocks, context_self_freespace,
> context_self_freechunks, context_self_used, context_self_children,
>
> It might make sense to have said function return a row for the contexts
> it encounters that do not have an owner set too (that way we'd e.g. get
> CacheMemoryContext handled), but then still recurse.
That would be a nice way to learn about the internals of the server and
to analyze the effects of memory-reducing enhancements.
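
To make the self/total distinction in the proposed columns concrete,
here is a minimal sketch of the recursive walk. It does not use
PostgreSQL's real MemoryContextData: DemoContext, its fields, and both
demo_ functions are hypothetical names invented for illustration, and
the owner string merely stands in for the proposed owning_object field.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a memory context (NOT MemoryContextData). */
typedef struct DemoContext
{
    const char *name;               /* would map to context_name */
    const char *owner;              /* stand-in for owning_object, or NULL */
    size_t      self_bytes;         /* would map to context_self_bytes */
    struct DemoContext *firstchild; /* first child context, or NULL */
    struct DemoContext *nextchild;  /* next sibling, or NULL */
} DemoContext;

/* context_total_bytes: this context plus all of its descendants. */
static size_t
demo_total_bytes(const DemoContext *ctx)
{
    size_t total = ctx->self_bytes;

    for (const DemoContext *c = ctx->firstchild; c != NULL; c = c->nextchild)
        total += demo_total_bytes(c);
    return total;
}

/*
 * Print one row per context, depth-first, including contexts without an
 * owner. Returns the number of rows printed.
 */
static int
demo_report(const DemoContext *ctx, int depth)
{
    int rows = 1;

    printf("%*s%s | owner=%s | self=%zu | total=%zu\n",
           depth * 2, "", ctx->name,
           ctx->owner != NULL ? ctx->owner : "(none)",
           ctx->self_bytes, demo_total_bytes(ctx));

    for (const DemoContext *c = ctx->firstchild; c != NULL; c = c->nextchild)
        rows += demo_report(c, depth + 1);
    return rows;
}
```

Calling demo_report(&top, 0) on the root of such a tree prints one
indented line per context. The sketch recomputes totals at every level
for brevity; a real implementation would more likely accumulate them in
a single pass.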
> Arguably the proposed owning_object field would be a bit redundant with
> the already existing ident/MemoryContextSetIdentifier field, which
> e.g. already associates the query string with the contexts used for a
> prepared statement. But I'm not convinced that's going to be enough
> context in a lot of cases, because e.g. for prepared statements it could
> be interesting to have access to both the prepared statement name, and
> the statement.
The identifier seems to be more like a category at the moment, because 
it does not seem to hold any relevant information about the object in 
question. So a more specific name would be nice.
> The reason I like something like this is that we wouldn't add new
> columns to a number of views, and lack views to associate such
> information to for some objects. And it'd be disproportional to add all
> the information to numerous places anyway.
I understand your argument, but things like cursors and portals are
rather short-lived, while prepared statements seem to be the place where
memory really builds up.
> One counter-argument is that it'd be more expensive to get information
> specific to prepared statements (or other object types) that way. I'm
> not sure I buy that that's a problem - this isn't something that's
> likely going to be used at a high frequency. But if it becomes a
> problem, we can add a function that starts that process at a distinct
> memory context (e.g. a function that does this just for a single
> prepared statement, identified by name) - but I'd not start there.
I also see no problem here, and with Konstantin Knizhnik's autoprepare I 
wouldn't use this very often anyway, more just for monitoring purposes, 
where I don't care if my query is a bit more complex.
>> While being interesting I still believe monitoring the mem usage of
>> prepared statements is a bit more important than that of other objects
>> because of how they change memory consumption of the server without
>> using any DDL or configuration options and I am not aware of other
>> objects with the same properties, or are there some? And for the other
>> volatile objects like tables and indexes and their contents PostgreSQL
>> already has its information functions.
> Plenty other objects have that property. E.g. cursors. And for the
> catalog/relation/... caches it's even more pernicious - the client might
> have closed all its "handles", but we still use memory (and it's
> absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and
add another table for the output of MemoryContextStatsDetail? This has
the advantage that the single real memory indicator useful for end users
(answering the question: how much memory does my sh*t take up?) is in
pg_prepared_statements, with the more internal information in a detail
view.

Thinking about the latter, I am against such a table, at least in a
form that exposes information like context_total_freechunks, because
it would only be useful for us developers. Why should any end user care
how many chunks are still open in a MemoryContext, unless he is
working on C extensions? It could just be a source of confusion for
them.

Let's think about the goal this should have: the end user should be able
to monitor the memory consumption of things he is in control of or that
could affect system performance. Should such a table automatically
aggregate some information? I think so. I would not add more than two
memory columns to the view, just mem_used and mem_reserved. And even
mem_used is questionable, because from the user's point of view the only
thing that matters is the memory he cannot use for other things because
of object x (that was the reason I added just one column). He would even
ask: WHY is there 50% more memory reserved than used, and how can I
optimize it? (That might lead to more curious PostgreSQL developers, so
maybe that's a plus.)

Something that also clearly speaks FOR such a table and against my
proposal is that if someone cares about memory, he would most likely
care about ALL his memory, and in that case prepared statements would be
just a small subset of the things to monitor. OK, I am defeated and will
rewrite my patch if the following proposal finds approval:

I would propose a table pg_mem_usage containing the columns
object_class, name, detail, mem_usage (rename them if that fits the
style of the other tables better). The name would be empty for some
objects, like the unnamed prepared statement; the query strings would be
in the detail column. One could add a final "Other" row containing the
memory that no specific output line has accounted for. It could also
contain lines for cursors and other things I am too novice to think of
here.
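
As a rough illustration of the proposed row layout and the final "Other"
row, here is a minimal sketch. The struct and the demo_other_row
function are hypothetical names invented for this example; only the four
column names come from the proposal, and where the total would come from
in the server is left open.

```c
#include <stddef.h>

/* One row of the proposed pg_mem_usage table (hypothetical C layout). */
typedef struct DemoMemUsageRow
{
    const char *object_class;   /* e.g. "prepared statement", "cursor" */
    const char *name;           /* NULL for e.g. the unnamed statement */
    const char *detail;         /* e.g. the query string, or NULL */
    size_t      mem_usage;      /* bytes attributed to this object */
} DemoMemUsageRow;

/*
 * Build the final "Other" row: whatever part of total_bytes the specific
 * rows have not accounted for. Clamps at zero if the rows claim more than
 * the total.
 */
static DemoMemUsageRow
demo_other_row(const DemoMemUsageRow *rows, size_t nrows, size_t total_bytes)
{
    size_t accounted = 0;

    for (size_t i = 0; i < nrows; i++)
        accounted += rows[i].mem_usage;

    DemoMemUsageRow other = {
        "Other", NULL, NULL,
        total_bytes > accounted ? total_bytes - accounted : 0
    };
    return other;
}
```

With rows of 3072, 1024 and 512 bytes and a total of 8192 bytes, the
"Other" row would carry the remaining 3584 bytes.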

And last: a reason why we still need a parent-child relationship in this
table (and distinct this_ and total_ mem functions) is that prepared
statements start to use much more memory once the Generic Plan is stored
in them after a few uses. As a user I always assumed that preparing a
statement would already do all the work required to be fast, but a
statement only becomes blazingly fast when the Generic Plan is available
(and used), and it would be nice to see for which statements that plan
has already been generated and is consuming memory. I believe the reason
for deferring the plan is the fear of excessive memory usage.

On the other hand: the Generic Plan had been created for the first
invocation of the prepared statement, so why not store it immediately?
A statement is named for a reason: it is intended to be reused, even if
only twice. And since memory does not seem to be regarded as a scarce
resource in this context, why not store the plan right away? That would
also drop the need for a hierarchy here.

Any comments?

Regards,
Daniel Migowski



