Re: RFC: pg_stat_logmsg

Поиск
Список
Период
Сортировка
От Gurjeet Singh
Тема Re: RFC: pg_stat_logmsg
Дата
Msg-id CABwTF4VzMJaTLrpYtBj4n9r73TFjffzqpPFeUmJPrkY6cmHEAg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: RFC: pg_stat_logmsg  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: RFC: pg_stat_logmsg  (Joe Conway <mail@joeconway.com>)
Список pgsql-hackers
On Thu, Jul 6, 2023 at 12:37 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Sat, Jul 1, 2023 at 8:57 AM Joe Conway <mail@joeconway.com> wrote:
> >
> > The basic idea is to mirror how pg_stat_statements works, except the
> > logged messages keyed by filename, lineno, and elevel are saved with a
> > aggregate count. The format string is displayed (similar to a query
> > jumble) for context, along with function name and sqlerrcode.
> >
> >
> > Part of the thinking is that people with fleets of postgres instances
> > can use this to scan for various errors that they care about.
> > Additionally it would be useful to look for "should not happen" errors.

> I'm concerned that displaying the format string could not be helpful
> in some cases. For example, when raising an ERROR while reading WAL
> records, we typically write the error message stored in
> record->errormsg_buf:
>
> in ReadRecord():
>             if (errormsg)
>                 ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
>                         (errmsg_internal("%s", errormsg) /* already
> translated */ ));
>
> In this case, the error message stored in pg_stat_logmsg is just '%s'.
> The filename and line number columns might help identify the actual
> error but it requires users to read the source code and cannot know
> the actual error message.

I believe that the number of such error messages, the ones with very
little descriptive content, is very low in Postgres code. These kinds
of messages are not prevalent, and must be used when code hits obscure
conditions, like seen in your example above. These are the kinds of
errors that Joe is referring to as "should not happen". In these
cases, even if the error message was descriptive, the user will very
likely have to dive deep into code to find out the real cause.

I feel that the unique combination of file name and line number is
useful information, even in cases where the format string not very
descriptive. So I believe the extension's behaviour in this regard is
acceptable.

In cases where the extension's output is not descriptive enough, the
user can use the filename:lineno pair to look for fully formed error
messages in the actual log files; they may have to make appropriate
changes to log_* parameters, though.

If we still feel strongly about getting the actual message for these
cases, perhaps we can develop a heuristic to identify such messages,
and use either full or a prefix of the 'message' field, instead of
'message_id' field. The heuristic could be: strlen(message_id) is too
short, or that message_id is all/overwhelmingly format specifiers
(e.g. '%s: %d').

The core benefit of this extension is to make it easy for the user to
discover error messages affecting their workload. The user may be
interested in knowing the most common log messages in their server
logs, so that they can work on reducing those errors or warnings. Or
they may be interested in knowing when their workload hits
unexpected/serious error messages, even if they're infrequent. The
data exposed by pg_stat_logmsg view would serve as a starting point,
but I'm guessing it would not be sufficient for their investigation.

On Fri, Jun 30, 2023 at 4:57 PM Joe Conway <mail@joeconway.com> wrote:
> I would love to get
> feedback on:
>
> 1/ the general concept
> 2/ the pg_stat_statement-like implementation
> 3/ contrib vs core vs external project

+1 for the idea; a monitoring system trained at this view can bubble
up problems to users that may otherwise go unnoticed. Piggybacking on,
or using pg_stat_statments implementation as a model seems fine. I
believe making this available as a contrib module hits the right
balance; not all installations may want this, hence in-core, always
collecting data, seems undesirable; if it's an external project, it
won't be discoverable, I see that as a very high bar for someone to
trust it and begin to use it.

Best regards,
Gurjeet
http://Gurje.et



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: pgsql: Fix search_path to a safe value during maintenance operations.
Следующее
От: jian he
Дата:
Сообщение: Re: Add hint message for check_log_destination()