Re: Add session statistics to pg_stat_database

Поиск
Список
Период
Сортировка
От Laurenz Albe
Тема Re: Add session statistics to pg_stat_database
Дата
Msg-id 27530e647d14ef49199694ea3c376935fa7285e8.camel@cybertec.at
обсуждение исходный текст
Ответ на Re: Add session statistics to pg_stat_database  (Magnus Hagander <magnus@hagander.net>)
Ответы Re: Add session statistics to pg_stat_database  (Laurenz Albe <laurenz.albe@cybertec.at>)
Список pgsql-hackers
On Tue, 2020-12-01 at 17:32 +0100, Magnus Hagander wrote:
> > I have changed "connections" to "sessions" and renamed the new
> > column "connections" to "session_count".
> > 
> > I think that most people will understand a session as started after a successful
> > connection.
> 
> Yeah, I agree, and as long as it's consistent we don't need more explanations than that.
> 
> Further int he views, it's a bit strange to have session_count and aborted_session, but I'm not
>  sure what to suggest. "aborted_session_count" seems too long. Maybe just "sessions" instead
>  of "session_count" -- no other counters actually have the "_count" suffix.

"sessions" is fine, I think; I changed the name.

> > > I wonder if there would also be a way to count "sessions that crashed" as well.
> > > That is,the ones that failed in a way that caused the postmaster to restart the system.
> >
> > Sure, a crash count would be useful.  I don't know if it is easy for the stats collector
> > to tell the difference between a start after a backend crash and - say - starting from
> > a base backup.
> > 
> > I think that that would be material for another patch, and I don't think it should go
> > to "pg_stat_database", because a) it might be hard to tell to which database the crashed
> > backend was attached, b) it might be a background process that doesn't belong to a database
> > and c) if the crash were caused by - say - corruption in a shared catalog, it would be
> > misleading
> 
> I'm not sure it is outside the scope of this patch, because I think it might be easier to
>  do than I (and I think you) first thought. We don't need to track which database crashed --
>  if we track all *other* ways a database exits, then crashes are all that remains.
> 
> So in fact, we *almost* have all the data we need already. We have the number of sessions
>  started. We have the number of sessions "aborted". if we also had the number of sessions
>  that were closed normally, then whatever is "left" would be the number of sessions crashed.
>  And we do already, in your patch, send the message in the case of both aborted and
>  non-aborted sessions. So we just need to keep track of both in the statsfile
>  (which we don't now), and we'd more or less have it, wouldn't we?

There is one problem with that: the statistics collector is not guaranteed to get all
messages, right?  If a disconnection statistics UDP datagram doesn't reach the statistics
collector, that connection
would end up being reported as crashed.
That would alarm people unnecessarily and make the crash statistics misleading.

> However, some thinking around that also leads me to another question which is very much
>  in scope for this patch regardless, which is what about shutdown and admin termination.
>  Right now, when you do a "pg_ctl stop" on the database, all sessions count as aborted.
>  Same thing for a pg_terminate_backend(). I wonder if this is also a case that would be
>  useful to track as a separate thing? One could argue that the docs in your patch say
>  aborted means "terminated by something else than a regular client disconnection".
> But that's true for a "shutdown", but not for a crash, so whichever way we go with crashes
>  it's slightly incorrect.

> But thinking from a usability perspective, wouldn't what we want more be something
>  like <closed by correct disconnect>, <closed by abnormal disconnect>, <closed by admin>,
>  <crash>?
> 
> What do you think of adapting it to that?
> 
> Basically, that would change pgStatSessionDisconnectedNormally into instead being an
>  enum of reasons, which could be normal disconnect, abnormal disconnect and admin.
>  And we'd track all those three as separate numbers in the stats file, meaning we could
>  then calculate the crash by subtracting all three from the total number of sessions?

I think at least "closed by admin" might be interesting; I'll have a look.
I don't think we have to specifically count "closed by normal disconnect", because
that should be the rule and could be more or less deduced from the other numbers
(with the uncertainty mentioned above).

> (Let me know if you think the idea could work and would prefer it if I worked up a
>  complete suggestion based on it rather than just spitting ideas)

Thanks for the offer, and I'll get back to it if I get stuck.
But I'm ready to do the grunt work, so that you can spend your precious
committer cycles elsewhere :^)

I'll have a go at "closed by admin", meanwhile here is patch v7 with the renaming
"session_count -> sessions".

Yours,
Laurenz Albe

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: SELECT INTO deprecation
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Commitfest 2020-11 is closed