Обсуждение: [DOCS] pg_stat_statements max size clarification

Поиск
Список
Период
Сортировка

[DOCS] pg_stat_statements max size clarification

От
baron@xaprb.com
Дата:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/9.6/static/pgstatstatements.html
Description:

From https://www.postgresql.org/docs/current/static/pgstatstatements.html

pg_stat_statements.max (integer)
pg_stat_statements.max is the maximum number of statements tracked by the
module (i.e., the maximum number of rows in the pg_stat_statements view). If
more distinct statements than that are observed, information about the
least-executed statements is discarded. The default value is 5000. This
parameter can only be set at server start.

I wonder if "least-executed" is correct. I'm not an expert and haven't
convinced myself of this by examining the code, but I think after N distinct
queryid's have been seen, then any additional ones are ignored. But that may
not be "least-executed" at all. It's "most-recent" instead. I think we need
a new phrase here.

Re: [DOCS] pg_stat_statements max size clarification

От
Peter Geoghegan
Дата:
On Sat, Jul 15, 2017 at 5:31 PM,  <baron@xaprb.com> wrote:
> I wonder if "least-executed" is correct. I'm not an expert and haven't
> convinced myself of this by examining the code, but I think after N distinct
> queryid's have been seen, then any additional ones are ignored. But that may
> not be "least-executed" at all. It's "most-recent" instead. I think we need
> a new phrase here.

It's most executed since tracking for the entry began, with a special
heuristic for queries that take a long time to execute, and might
therefore consistently be evicted before execution finishes and costs
are tallied (see "sticky entries" stuff for full details). Most
executed means the total number of calls, which may not be the best
thing to evict on the basis of, but certainly isn't too bad.

The way it actually works is that either 5% of all entries or 10
entries are evicted (whichever amount is greatest) once
pg_stat_statements.max entries are reached. You're right that this
means that the most marginal of entries cannot be usefully tracked,
but I doubt that that's much of a problem in practice. It's the usual
"recency versus frequency" cache eviction problem, but for query cost
tracking purposes if 5,000 entries or 10,000 entries is truly
insufficient, then pg_stat_statements probably isn't the right tool.

When that many entries seem insufficient that tends to be because
pg_stat_statements is arguably too discriminating about what
constitutes a distinct query/entry, but that's another problem.

--
Peter Geoghegan


Re: [DOCS] pg_stat_statements max size clarification

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sat, Jul 15, 2017 at 5:31 PM,  <baron@xaprb.com> wrote:
>> I wonder if "least-executed" is correct. I'm not an expert and haven't
>> convinced myself of this by examining the code, but I think after N distinct
>> queryid's have been seen, then any additional ones are ignored. But that may
>> not be "least-executed" at all. It's "most-recent" instead. I think we need
>> a new phrase here.

> It's most executed since tracking for the entry began, with a special
> heuristic for queries that take a long time to execute, and might
> therefore consistently be evicted before execution finishes and costs
> are tallied (see "sticky entries" stuff for full details). Most
> executed means the total number of calls, which may not be the best
> thing to evict on the basis of, but certainly isn't too bad.

> The way it actually works is that either 5% of all entries or 10
> entries are evicted (whichever amount is greatest) once
> pg_stat_statements.max entries are reached. You're right that this
> means that the most marginal of entries cannot be usefully tracked,
> but I doubt that that's much of a problem in practice. It's the usual
> "recency versus frequency" cache eviction problem, but for query cost
> tracking purposes if 5,000 entries or 10,000 entries is truly
> insufficient, then pg_stat_statements probably isn't the right tool.

The short answer, really, is that the algorithm is too complicated to be
worth explaining in the documentation --- and it's subject to change,
anyway.  But "least-executed" is a reasonable short description, since
frequency of use is a major factor in the decisions.  Certainly
"most-recent" is flat out wrong.

I am not sure whether this complaint is actually meant as a bug report
that the algorithm didn't seem to work well on the OP's use case.  If so,
we'd need a lot more details to have any hope of improving it (and the
documentation comments aren't the right submission forum, either).

            regards, tom lane


Re: [DOCS] pg_stat_statements max size clarification

От
Baron Schwartz
Дата:
I will mull over this and see if I can suggest a short phrase that points to the nuances without explaining them. I think the current phrasing draws attention to itself by seeming dubious, which creates a kind of cognitive dissonance for the reader. This report isn't mean to be a bug report on the functionality.

On Sat, Jul 15, 2017 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sat, Jul 15, 2017 at 5:31 PM,  <baron@xaprb.com> wrote:
>> I wonder if &quot;least-executed&quot; is correct. I&#39;m not an expert and haven&#39;t
>> convinced myself of this by examining the code, but I think after N distinct
>> queryid&#39;s have been seen, then any additional ones are ignored. But that may
>> not be &quot;least-executed&quot; at all. It&#39;s &quot;most-recent&quot; instead. I think we need
>> a new phrase here.

> It's most executed since tracking for the entry began, with a special
> heuristic for queries that take a long time to execute, and might
> therefore consistently be evicted before execution finishes and costs
> are tallied (see "sticky entries" stuff for full details). Most
> executed means the total number of calls, which may not be the best
> thing to evict on the basis of, but certainly isn't too bad.

> The way it actually works is that either 5% of all entries or 10
> entries are evicted (whichever amount is greatest) once
> pg_stat_statements.max entries are reached. You're right that this
> means that the most marginal of entries cannot be usefully tracked,
> but I doubt that that's much of a problem in practice. It's the usual
> "recency versus frequency" cache eviction problem, but for query cost
> tracking purposes if 5,000 entries or 10,000 entries is truly
> insufficient, then pg_stat_statements probably isn't the right tool.

The short answer, really, is that the algorithm is too complicated to be
worth explaining in the documentation --- and it's subject to change,
anyway.  But "least-executed" is a reasonable short description, since
frequency of use is a major factor in the decisions.  Certainly
"most-recent" is flat out wrong.

I am not sure whether this complaint is actually meant as a bug report
that the algorithm didn't seem to work well on the OP's use case.  If so,
we'd need a lot more details to have any hope of improving it (and the
documentation comments aren't the right submission forum, either).

                        regards, tom lane