Обсуждение: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

Поиск
Список
Период
Сортировка

pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andres Freund
Дата:
Hi,

Since

commit 960869da0803427d14335bba24393f414b476e2c
Author: Magnus Hagander <magnus@hagander.net>
Date:   2021-01-17 13:34:09 +0100

    Add pg_stat_database counters for sessions and session time

pgstat_report_stat() does another timestamp computation via
pgstat_send_connstats(), despite typically having computed one just a few
lines before.

Given that timestamp computation isn't all that cheap, that's not great. Even
more, that additional timestamp computation makes things *less* accurate:

void
pgstat_report_stat(bool disconnect)
...
    /*
     * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
     * msec since we last sent one, or the backend is about to exit.
     */
    now = GetCurrentTransactionStopTimestamp();
    if (!disconnect &&
        !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
        return;

    /* for backends, send connection statistics */
    if (MyBackendType == B_BACKEND)
        pgstat_send_connstats(disconnect, last_report);

    last_report = now;

and then pgstat_send_connstats() does:
    /* session time since the last report */
    TimestampDifference(((last_report == 0) ? MyStartTimestamp : last_report),
                        GetCurrentTimestamp(),
                        &secs, &usecs);
    msg.m_session_time = secs * 1000000 + usecs;

We maintain last_report as GetCurrentTransactionStopTimestamp(), but then use
a separate timestamp in pgstat_send_connstats() to compute the difference from
last_report, which is then set to GetCurrentTransactionStopTimestamp()'s
return value.


I'm also not all that happy with sending yet another UDP packet for this. This
doubles the UDP packets to the stats collector in the common cases (unless
more than TABSTAT_QUANTUM tables have stats to report, or shared tables have
been accessed). We already send plenty of "summary" information via
PgStat_MsgTabstat, one of which is sent unconditionally, why do we need a
separate message for connection stats?


Alternatively we could just not send an update to connection stats every 500ms
for every active connection, but only do so at connection end. The database
stats would only contain stats for disconnected sessions, while the stats for
"live" connections are maintained via backend_status.c.  That'd give us *more*
information for less costs, because we then could see idle/active times for
individual connections.

That'd likely be more change than what we would want to do at this point in
the release cycle though. But I think we ought to do something about the
increased overhead...

Greetings,

Andres Freund



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Sun, 2021-08-01 at 13:55 -0700, Andres Freund wrote:
> Since
> 
> commit 960869da0803427d14335bba24393f414b476e2c
> Author: Magnus Hagander <magnus@hagander.net>
> Date:   2021-01-17 13:34:09 +0100
> 
>     Add pg_stat_database counters for sessions and session time
> 
> pgstat_report_stat() does another timestamp computation via
> pgstat_send_connstats(), despite typically having computed one just a few
> lines before.
> 
> Given that timestamp computation isn't all that cheap, that's not great. Even
> more, that additional timestamp computation makes things *less* accurate:
> 
> void
> pgstat_report_stat(bool disconnect)
> ...
>     /*
>      * Don't send a message unless it's been at least PGSTAT_STAT_INTERVAL
>      * msec since we last sent one, or the backend is about to exit.
>      */
>     now = GetCurrentTransactionStopTimestamp();
>     if (!disconnect &&
>         !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
>         return;
> 
>     /* for backends, send connection statistics */
>     if (MyBackendType == B_BACKEND)
>         pgstat_send_connstats(disconnect, last_report);
> 
>     last_report = now;
> 
> and then pgstat_send_connstats() does:
>     /* session time since the last report */
>     TimestampDifference(((last_report == 0) ? MyStartTimestamp : last_report),
>                         GetCurrentTimestamp(),
>                         &secs, &usecs);
>     msg.m_session_time = secs * 1000000 + usecs;
> 
> We maintain last_report as GetCurrentTransactionStopTimestamp(), but then use
> a separate timestamp in pgstat_send_connstats() to compute the difference from
> last_report, which is then set to GetCurrentTransactionStopTimestamp()'s
> return value.

Makes sense to me.  How about passing "now", which was just initialized from
GetCurrentTransactionStopTimestamp(), as additional parameter to
pgstat_send_connstats() and use that value instead of taking the current time?

> I'm also not all that happy with sending yet another UDP packet for this. This
> doubles the UDP packets to the stats collector in the common cases (unless
> more than TABSTAT_QUANTUM tables have stats to report, or shared tables have
> been accessed). We already send plenty of "summary" information via
> PgStat_MsgTabstat, one of which is sent unconditionally, why do we need a
> separate message for connection stats?

Are you suggesting that connection statistics should be shoehorned into
some other statistics message?  That would reduce the number of UDP packets,
but it sounds ugly and confusing to me.

> Alternatively we could just not send an update to connection stats every 500ms
> for every active connection, but only do so at connection end. The database
> stats would only contain stats for disconnected sessions, while the stats for
> "live" connections are maintained via backend_status.c.  That'd give us *more*
> information for less costs, because we then could see idle/active times for
> individual connections.

That was my original take, but if I remember right, Magnus convinced me that
it would be more useful to have statistics for open sessions as well.
With a connection pool, connections can stay open for a very long time,
and the accuracy and usefulness of the statistics would become questionable.

> That'd likely be more change than what we would want to do at this point in
> the release cycle though. But I think we ought to do something about the
> increased overhead...

If you are talking about the extra GetCurrentTimestamp() call, and my first
suggestion is acceptable, that should be simple.

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andres Freund
Дата:
Hi,

On 2021-08-17 10:44:51 +0200, Laurenz Albe wrote:
> On Sun, 2021-08-01 at 13:55 -0700, Andres Freund wrote:
> > We maintain last_report as GetCurrentTransactionStopTimestamp(), but then use
> > a separate timestamp in pgstat_send_connstats() to compute the difference from
> > last_report, which is then set to GetCurrentTransactionStopTimestamp()'s
> > return value.
> 
> Makes sense to me.  How about passing "now", which was just initialized from
> GetCurrentTransactionStopTimestamp(), as additional parameter to
> pgstat_send_connstats() and use that value instead of taking the current time?

Yes.


> > I'm also not all that happy with sending yet another UDP packet for this. This
> > doubles the UDP packets to the stats collector in the common cases (unless
> > more than TABSTAT_QUANTUM tables have stats to report, or shared tables have
> > been accessed). We already send plenty of "summary" information via
> > PgStat_MsgTabstat, one of which is sent unconditionally, why do we need a
> > separate message for connection stats?
> 
> Are you suggesting that connection statistics should be shoehorned into
> some other statistics message?  That would reduce the number of UDP packets,
> but it sounds ugly and confusing to me.

That ship already has sailed. Look at struct PgStat_MsgTabstat

typedef struct PgStat_MsgTabstat
{
    PgStat_MsgHdr m_hdr;
    Oid            m_databaseid;
    int            m_nentries;
    int            m_xact_commit;
    int            m_xact_rollback;
    PgStat_Counter m_block_read_time;    /* times in microseconds */
    PgStat_Counter m_block_write_time;
    PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
} PgStat_MsgTabstat;

Given that we transport number of commits/commits, block read/write time
adding the time the connection was active/inactive doesn't really seem like it
makes things meaningfully worse?


> > Alternatively we could just not send an update to connection stats every 500ms
> > for every active connection, but only do so at connection end. The database
> > stats would only contain stats for disconnected sessions, while the stats for
> > "live" connections are maintained via backend_status.c.  That'd give us *more*
> > information for less costs, because we then could see idle/active times for
> > individual connections.
> 
> That was my original take, but if I remember right, Magnus convinced me that
> it would be more useful to have statistics for open sessions as well.
> With a connection pool, connections can stay open for a very long time,
> and the accuracy and usefulness of the statistics would become questionable.

That's not a contradiction to what I propose. Having the data available via
backend_status.c allows to sum up the data for active connections and the data
for past connections.

I think it's also just cleaner to not constantly report changing preliminary
data as pgstat_send_connstats() does.


> > That'd likely be more change than what we would want to do at this point in
> > the release cycle though. But I think we ought to do something about the
> > increased overhead...
> 
> If you are talking about the extra GetCurrentTimestamp() call, and my first
> suggestion is acceptable, that should be simple.

Doubling the number of UDP messages in common workloads seems also problematic
enough that it should be addressed for 14. It increases the likelihood of
dropping stats messages on busy systems, which can have downstream impacts.

Greetings,

Andres Freund



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Tue, 2021-08-17 at 02:14 -0700, Andres Freund wrote:
> On 2021-08-17 10:44:51 +0200, Laurenz Albe wrote:
> > On Sun, 2021-08-01 at 13:55 -0700, Andres Freund wrote:
> > > We maintain last_report as GetCurrentTransactionStopTimestamp(), but then use
> > > a separate timestamp in pgstat_send_connstats() to compute the difference from
> > > last_report, which is then set to GetCurrentTransactionStopTimestamp()'s
> > > return value.
> > 
> > Makes sense to me.  How about passing "now", which was just initialized from
> > GetCurrentTransactionStopTimestamp(), as additional parameter to
> > pgstat_send_connstats() and use that value instead of taking the current time?
> 
> Yes.

Here is a patch for that.

> > > I'm also not all that happy with sending yet another UDP packet for this.
> > 
> > Are you suggesting that connection statistics should be shoehorned into
> > some other statistics message?  That would reduce the number of UDP packets,
> > but it sounds ugly and confusing to me.
> 
> That ship already has sailed. Look at struct PgStat_MsgTabstat
> 
> Given that we transport number of commits/commits, block read/write time
> adding the time the connection was active/inactive doesn't really seem like it
> makes things meaningfully worse?

Point taken.

I looked at the other statistics sent in pgstat_report_stat(), and I see
none that are sent unconditionally.  Are you thinking of this:

    /*
     * Send partial messages.  Make sure that any pending xact commit/abort
     * gets counted, even if there are no table stats to send.
     */
    if (regular_msg.m_nentries > 0 ||
        pgStatXactCommit > 0 || pgStatXactRollback > 0)
        pgstat_send_tabstat(®ular_msg);
    if (shared_msg.m_nentries > 0)
        pgstat_send_tabstat(&shared_msg);

I can't think of a way to hack this up that wouldn't make my stomach turn.

> > > Alternatively we could just not send an update to connection stats every 500ms
> > > for every active connection, but only do so at connection end. The database
> > > stats would only contain stats for disconnected sessions, while the stats for
> > > "live" connections are maintained via backend_status.c.
> > 
> > That was my original take, but if I remember right, Magnus convinced me that
> > it would be more useful to have statistics for open sessions as well.
> > With a connection pool, connections can stay open for a very long time,
> > and the accuracy and usefulness of the statistics would become questionable.
> 
> That's not a contradiction to what I propose. Having the data available via
> backend_status.c allows to sum up the data for active connections and the data
> for past connections.
> 
> I think it's also just cleaner to not constantly report changing preliminary
> data as pgstat_send_connstats() does.

Currently, the data are kept in static variables in the backend process.
That would have to change for such an approach, right?

> Doubling the number of UDP messages in common workloads seems also problematic
> enough that it should be addressed for 14.

Ok, but I don't know how to go about it.

Yours,
Laurenz Albe

Вложения

Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Justin Pryzby
Дата:
On Tue, Aug 17, 2021 at 02:14:20AM -0700, Andres Freund wrote:
> Doubling the number of UDP messages in common workloads seems also problematic
> enough that it should be addressed for 14. It increases the likelihood of
> dropping stats messages on busy systems, which can have downstream impacts.

I think by "common workloads" you mean one with many, shortlived sessions.

That does sounds like a concern, and I added this as an Opened Item.

-- 
Justin



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Kyotaro Horiguchi
Дата:
At Wed, 18 Aug 2021 05:16:38 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in 
> On Tue, 2021-08-17 at 02:14 -0700, Andres Freund wrote:
> > On 2021-08-17 10:44:51 +0200, Laurenz Albe wrote:
> > > On Sun, 2021-08-01 at 13:55 -0700, Andres Freund wrote:
> > > > We maintain last_report as GetCurrentTransactionStopTimestamp(), but then use
> > > > a separate timestamp in pgstat_send_connstats() to compute the difference from
> > > > last_report, which is then set to GetCurrentTransactionStopTimestamp()'s
> > > > return value.
> > > 
> > > Makes sense to me.  How about passing "now", which was just initialized from
> > > GetCurrentTransactionStopTimestamp(), as additional parameter to
> > > pgstat_send_connstats() and use that value instead of taking the current time?
> > 
> > Yes.
> 
> Here is a patch for that.

FWIW, looks good to me.

> > > > I'm also not all that happy with sending yet another UDP packet for this.
> > > 
> > > Are you suggesting that connection statistics should be shoehorned into
> > > some other statistics message?  That would reduce the number of UDP packets,
> > > but it sounds ugly and confusing to me.
> > 
> > That ship already has sailed. Look at struct PgStat_MsgTabstat
> > 
> > Given that we transport number of commits/commits, block read/write time
> > adding the time the connection was active/inactive doesn't really seem like it
> > makes things meaningfully worse?
> 
> Point taken.
> 
> I looked at the other statistics sent in pgstat_report_stat(), and I see
> none that are sent unconditionally.  Are you thinking of this:

IIUC, that means that pg_stat_report sends at least one
PgStat_MsgTabstat struct for the database stats purpose if any stats
are sent.  So the connection stats can piggy-back on the packet.

>     /*
>      * Send partial messages.  Make sure that any pending xact commit/abort
>      * gets counted, even if there are no table stats to send.
>      */
>     if (regular_msg.m_nentries > 0 ||
>         pgStatXactCommit > 0 || pgStatXactRollback > 0)
>         pgstat_send_tabstat(®ular_msg);
>     if (shared_msg.m_nentries > 0)
>         pgstat_send_tabstat(&shared_msg);
> 
> I can't think of a way to hack this up that wouldn't make my stomach turn.

No need to change the condition. It's sufficient that the connection
stats are sent at the same time with transaction stats are sent.

> > > > Alternatively we could just not send an update to connection stats every 500ms
> > > > for every active connection, but only do so at connection end. The database
> > > > stats would only contain stats for disconnected sessions, while the stats for
> > > > "live" connections are maintained via backend_status.c.
> > > 
> > > That was my original take, but if I remember right, Magnus convinced me that
> > > it would be more useful to have statistics for open sessions as well.
> > > With a connection pool, connections can stay open for a very long time,
> > > and the accuracy and usefulness of the statistics would become questionable.
> > 
> > That's not a contradiction to what I propose. Having the data available via
> > backend_status.c allows to sum up the data for active connections and the data
> > for past connections.
> > 
> > I think it's also just cleaner to not constantly report changing preliminary
> > data as pgstat_send_connstats() does.
> 
> Currently, the data are kept in static variables in the backend process.
> That would have to change for such an approach, right?

Total session time can be summarized from beentry any time, but I'm
not sure how we can summarize active/idle time.. Anyway it's not
needed if the attached works.

> > Doubling the number of UDP messages in common workloads seems also problematic
> > enough that it should be addressed for 14.
> 
> Ok, but I don't know how to go about it.

The attached is a heavy-WIP on:

- remove redundant gettimeofday().
- avoid sending dedicate UCP packet for connection stats.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a3c35bdf60..f48ee80fc7 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -329,11 +329,11 @@ static bool pgstat_db_requested(Oid databaseid);
 static PgStat_StatReplSlotEntry *pgstat_get_replslot_entry(NameData name, bool create_it);
 static void pgstat_reset_replslot(PgStat_StatReplSlotEntry *slotstats, TimestampTz ts);
 
-static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
+static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, bool disconnect,
+                                TimestampTz last_report, TimestampTz now);
 static void pgstat_send_funcstats(void);
 static void pgstat_send_slru(void);
 static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
-static void pgstat_send_connstats(bool disconnect, TimestampTz last_report);
 
 static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
 
@@ -365,7 +365,6 @@ static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len);
 static void pgstat_recv_deadlock(PgStat_MsgDeadlock *msg, int len);
 static void pgstat_recv_checksum_failure(PgStat_MsgChecksumFailure *msg, int len);
-static void pgstat_recv_connstat(PgStat_MsgConn *msg, int len);
 static void pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len);
 static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
@@ -889,12 +888,6 @@ pgstat_report_stat(bool disconnect)
         !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
         return;
 
-    /* for backends, send connection statistics */
-    if (MyBackendType == B_BACKEND)
-        pgstat_send_connstats(disconnect, last_report);
-
-    last_report = now;
-
     /*
      * Destroy pgStatTabHash before we start invalidating PgStat_TableEntry
      * entries it points to.  (Should we fail partway through the loop below,
@@ -946,7 +939,7 @@ pgstat_report_stat(bool disconnect)
                    sizeof(PgStat_TableCounts));
             if (++this_msg->m_nentries >= PGSTAT_NUM_TABENTRIES)
             {
-                pgstat_send_tabstat(this_msg);
+                pgstat_send_tabstat(this_msg, disconnect, last_report, now);
                 this_msg->m_nentries = 0;
             }
         }
@@ -962,9 +955,11 @@ pgstat_report_stat(bool disconnect)
      */
     if (regular_msg.m_nentries > 0 ||
         pgStatXactCommit > 0 || pgStatXactRollback > 0)
-        pgstat_send_tabstat(®ular_msg);
+        pgstat_send_tabstat(®ular_msg, disconnect, last_report, now);
     if (shared_msg.m_nentries > 0)
-        pgstat_send_tabstat(&shared_msg);
+        pgstat_send_tabstat(&shared_msg, disconnect, last_report, now);
+
+    last_report = now;
 
     /* Now, send function statistics */
     pgstat_send_funcstats();
@@ -978,9 +973,15 @@ pgstat_report_stat(bool disconnect)
 
 /*
  * Subroutine for pgstat_report_stat: finish and send a tabstat message
+ *
+ *  The last three parameters are required only on normal backends.
+ *    The parameter "disconnect" will be true when the backend exits.
+ *    "last_report" is the last time we were called (0 if never).
+ *    "now" is the current time.
  */
 static void
-pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg)
+pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg, bool disconnect,
+                    TimestampTz last_report, TimestampTz now)
 {
     int            n;
     int            len;
@@ -989,6 +990,42 @@ pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg)
     if (pgStatSock == PGINVALID_SOCKET)
         return;
 
+    /*
+     * for backends, send connection statistics
+     *
+     * This should run no frequently than once per a transaction.
+     */
+    if (OidIsValid(tsmsg->m_databaseid) && MyBackendType == B_BACKEND &&
+        (pgStatXactCommit > 0 || pgStatXactRollback > 0))
+    {
+        long        secs;
+        int            usecs;
+
+        /* session time since the last report */
+        TimestampDifference(
+            ((last_report == 0) ? MyStartTimestamp : last_report), now,
+            &secs, &usecs);
+        tsmsg->m_session_time = secs * 1000000 + usecs;
+        tsmsg->m_disconnect =
+            disconnect ? pgStatSessionEndCause : DISCONNECT_NOT_YET;
+        tsmsg->m_active_time = pgStatActiveTime;
+        tsmsg->m_idle_in_xact_time = pgStatTransactionIdleTime;
+            
+        pgStatActiveTime = 0;
+        pgStatTransactionIdleTime = 0;
+
+        /* report a new session only the first time */
+        tsmsg->m_count = (last_report == 0) ? 1 : 0;
+    }
+    else
+    {
+        tsmsg->m_session_time = 0;
+        tsmsg->m_disconnect = 0;
+        tsmsg->m_active_time = 0;
+        tsmsg->m_idle_in_xact_time = 0;
+        tsmsg->m_count = 0;
+    }
+
     /*
      * Report and reset accumulated xact commit/rollback and I/O timings
      * whenever we send a normal tabstat message
@@ -1378,48 +1415,6 @@ pgstat_drop_relation(Oid relid)
 #endif                            /* NOT_USED */
 
 
-/* ----------
- * pgstat_send_connstats() -
- *
- *    Tell the collector about session statistics.
- *    The parameter "disconnect" will be true when the backend exits.
- *    "last_report" is the last time we were called (0 if never).
- * ----------
- */
-static void
-pgstat_send_connstats(bool disconnect, TimestampTz last_report)
-{
-    PgStat_MsgConn msg;
-    long        secs;
-    int            usecs;
-
-    if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
-        return;
-
-    pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CONNECTION);
-    msg.m_databaseid = MyDatabaseId;
-
-    /* session time since the last report */
-    TimestampDifference(((last_report == 0) ? MyStartTimestamp : last_report),
-                        GetCurrentTimestamp(),
-                        &secs, &usecs);
-    msg.m_session_time = secs * 1000000 + usecs;
-
-    msg.m_disconnect = disconnect ? pgStatSessionEndCause : DISCONNECT_NOT_YET;
-
-    msg.m_active_time = pgStatActiveTime;
-    pgStatActiveTime = 0;
-
-    msg.m_idle_in_xact_time = pgStatTransactionIdleTime;
-    pgStatTransactionIdleTime = 0;
-
-    /* report a new session only the first time */
-    msg.m_count = (last_report == 0) ? 1 : 0;
-
-    pgstat_send(&msg, sizeof(PgStat_MsgConn));
-}
-
-
 /* ----------
  * pgstat_reset_counters() -
  *
@@ -3459,10 +3454,6 @@ PgstatCollectorMain(int argc, char *argv[])
                     pgstat_recv_replslot(&msg.msg_replslot, len);
                     break;
 
-                case PGSTAT_MTYPE_CONNECTION:
-                    pgstat_recv_connstat(&msg.msg_conn, len);
-                    break;
-
                 default:
                     break;
             }
@@ -4898,6 +4889,27 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
     dbentry->n_block_read_time += msg->m_block_read_time;
     dbentry->n_block_write_time += msg->m_block_write_time;
 
+    dbentry->n_sessions += msg->m_count;
+    dbentry->total_session_time += msg->m_session_time;
+    dbentry->total_active_time += msg->m_active_time;
+    dbentry->total_idle_in_xact_time += msg->m_idle_in_xact_time;
+    switch (msg->m_disconnect)
+    {
+        case DISCONNECT_NOT_YET:
+        case DISCONNECT_NORMAL:
+            /* we don't collect these */
+            break;
+        case DISCONNECT_CLIENT_EOF:
+            dbentry->n_sessions_abandoned++;
+            break;
+        case DISCONNECT_FATAL:
+            dbentry->n_sessions_fatal++;
+            break;
+        case DISCONNECT_KILLED:
+            dbentry->n_sessions_killed++;
+            break;
+    }
+
     /*
      * Process all table entries in the message.
      */
@@ -5557,41 +5569,6 @@ pgstat_recv_replslot(PgStat_MsgReplSlot *msg, int len)
     }
 }
 
-/* ----------
- * pgstat_recv_connstat() -
- *
- *  Process connection information.
- * ----------
- */
-static void
-pgstat_recv_connstat(PgStat_MsgConn *msg, int len)
-{
-    PgStat_StatDBEntry *dbentry;
-
-    dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
-
-    dbentry->n_sessions += msg->m_count;
-    dbentry->total_session_time += msg->m_session_time;
-    dbentry->total_active_time += msg->m_active_time;
-    dbentry->total_idle_in_xact_time += msg->m_idle_in_xact_time;
-    switch (msg->m_disconnect)
-    {
-        case DISCONNECT_NOT_YET:
-        case DISCONNECT_NORMAL:
-            /* we don't collect these */
-            break;
-        case DISCONNECT_CLIENT_EOF:
-            dbentry->n_sessions_abandoned++;
-            break;
-        case DISCONNECT_FATAL:
-            dbentry->n_sessions_fatal++;
-            break;
-        case DISCONNECT_KILLED:
-            dbentry->n_sessions_killed++;
-            break;
-    }
-}
-
 /* ----------
  * pgstat_recv_tempfile() -
  *
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 509849c7ff..5364943b55 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -81,7 +81,6 @@ typedef enum StatMsgType
     PGSTAT_MTYPE_DEADLOCK,
     PGSTAT_MTYPE_CHECKSUMFAILURE,
     PGSTAT_MTYPE_REPLSLOT,
-    PGSTAT_MTYPE_CONNECTION,
 } StatMsgType;
 
 /* ----------
@@ -289,6 +288,11 @@ typedef struct PgStat_MsgTabstat
     int            m_nentries;
     int            m_xact_commit;
     int            m_xact_rollback;
+    PgStat_Counter m_count;
+    PgStat_Counter m_session_time;
+    PgStat_Counter m_active_time;
+    PgStat_Counter m_idle_in_xact_time;
+    SessionEndType m_disconnect;
     PgStat_Counter m_block_read_time;    /* times in microseconds */
     PgStat_Counter m_block_write_time;
     PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
@@ -652,22 +656,6 @@ typedef struct PgStat_MsgChecksumFailure
     TimestampTz m_failure_time;
 } PgStat_MsgChecksumFailure;
 
-/* ----------
- * PgStat_MsgConn            Sent by the backend to update connection statistics.
- * ----------
- */
-typedef struct PgStat_MsgConn
-{
-    PgStat_MsgHdr m_hdr;
-    Oid            m_databaseid;
-    PgStat_Counter m_count;
-    PgStat_Counter m_session_time;
-    PgStat_Counter m_active_time;
-    PgStat_Counter m_idle_in_xact_time;
-    SessionEndType m_disconnect;
-} PgStat_MsgConn;
-
-
 /* ----------
  * PgStat_Msg                    Union over all possible messages.
  * ----------
@@ -700,7 +688,6 @@ typedef union PgStat_Msg
     PgStat_MsgTempFile msg_tempfile;
     PgStat_MsgChecksumFailure msg_checksumfailure;
     PgStat_MsgReplSlot msg_replslot;
-    PgStat_MsgConn msg_conn;
 } PgStat_Msg;
 


Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Tue, 2021-08-24 at 15:12 +0900, Kyotaro Horiguchi wrote:
> At Wed, 18 Aug 2021 05:16:38 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in 
> > On Tue, 2021-08-17 at 02:14 -0700, Andres Freund wrote:
> > > > > I'm also not all that happy with sending yet another UDP packet for this.
> > > > 
> > > > Are you suggesting that connection statistics should be shoehorned into
> > > > some other statistics message?  That would reduce the number of UDP packets,
> > > > but it sounds ugly and confusing to me.
> > > 
> > > That ship already has sailed. Look at struct PgStat_MsgTabstat
> > > 
> > > Given that we transport number of commits/commits, block read/write time
> > > adding the time the connection was active/inactive doesn't really seem like it
> > > makes things meaningfully worse?
> > 
> > Point taken.
> > 
> > I looked at the other statistics sent in pgstat_report_stat(), and I see
> > none that are sent unconditionally.
> 
> IIUC, that means that pg_stat_report sends at least one
> PgStat_MsgTabstat struct for the database stats purpose if any stats
> are sent.  So the connection stats can piggy-back on the packet.
> 
> No need to change the condition. It's sufficient that the connection
> stats are sent at the same time with transaction stats are sent.

> > > Doubling the number of UDP messages in common workloads seems also problematicenough that it should be addressed
for14.
 
> > 
> > Ok, but I don't know how to go about it.
> 
> The attached is a heavy-WIP on:
> 
> - remove redundant gettimeofday().
> - avoid sending dedicate UCP packet for connection stats.

Thank you.

Perhaps I misread that, but doesn't that mean that the session statistics
could be sent several times?  "pgstat_send_tabstat()" could be called more than
once, right?

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Kyotaro Horiguchi
Дата:
At Tue, 24 Aug 2021 12:34:25 +0200, Laurenz Albe <laurenz.albe@cybertec.at> wrote in 
> On Tue, 2021-08-24 at 15:12 +0900, Kyotaro Horiguchi wrote:
> > 
> > - remove redundant gettimeofday().
> > - avoid sending dedicate UCP packet for connection stats.
> 
> Thank you.
> 
> Perhaps I misread that, but doesn't that mean that the session statistics
> could be sent several times?  "pgstat_send_tabstat()" could be called more than
> once, right?

Yes, it can be called two or more times for a call to
pgstat_report_stat, but the patch causes only the first of them convey
effective connection stats numbers. This is the same way as how
transaction stats are treated.  If no table activities have taken
place at all, pgStatXactCommit/Rollback are not consumed then the
following condition:

>    /*
>     * Send partial messages.  Make sure that any pending xact commit/abort
>     * gets counted, even if there are no table stats to send.
>     */
>    if (regular_msg.m_nentries > 0 ||
>        pgStatXactCommit > 0 || pgStatXactRollback > 0)

leads to a call to pgstat_send_tabstat() and it sends a tabstat
message without a table stats, which is a "partial message".

In this logic the condition term (pgStatXactCommit > 0 ||
pgStatXactRollback > 0) acts as a single-shot trigger.

So we can piggy-back on the condition to send something only once.

The patch sets "0" = (DISCONNECT_NOT_YET) to m_disconnect as the
"not-effective" number, but it would be better to add
DISCONNECT_something to express that state.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Kyotaro Horiguchi
Дата:
At Wed, 25 Aug 2021 12:51:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Wed, Aug 25, 2021 at 10:12:41AM +0900, Kyotaro Horiguchi wrote:
> > Yes, it can be called two or more times for a call to
> > pgstat_report_stat, but the patch causes only the first of them convey
> > effective connection stats numbers. This is the same way as how
> > transaction stats are treated.  If no table activities have taken
> > place at all, pgStatXactCommit/Rollback are not consumed then the
> > following condition:
> 
> I was looking at this WIP patch, and plugging in the connection
> statistics to the table-access statistics looks like the wrong
> abstraction to me.  I find much cleaner the approach of HEAD to use a
> separate API to report this information, as of
> pgstat_send_connstats().
> 
> As of the two problems discussed on this thread, aka the increased
> number of UDP packages and the extra timestamp computations, it seems
> to me that we had better combine the following ideas for HEAD and 14,
> for now:
> - Avoid the extra timestamp computation as proposed by Laurenz in [1]
> - Throttle the frequency where the connection stat packages are sent,
> as of [2].
> 
> Magnus, this open item is assigned to you as the committer of
> 960869d.  Could you comment on those issues?
> 
> [1]: https://www.postgresql.org/message-id/4095ceb328780d1bdba77ac6d9785fd7577ed047.camel@cybertec.at
> [2]: https://www.postgresql.org/message-id/20210801205501.nyxzxoelqoo4x2qc@alap3.anarazel.de

About [2], we need to maintain session active/idel times on additional
menbers in backend status.  Letting gpgstat_report_activity to
directly scribble on backend status array would work?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Kyotaro Horiguchi
Дата:
At Wed, 25 Aug 2021 13:21:52 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Wed, 25 Aug 2021 12:51:58 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> > - Throttle the frequency where the connection stat packages are sent,
> > as of [2].
> > 
> > Magnus, this open item is assigned to you as the committer of
> > 960869d.  Could you comment on those issues?
> > 
> > [1]: https://www.postgresql.org/message-id/4095ceb328780d1bdba77ac6d9785fd7577ed047.camel@cybertec.at
> > [2]: https://www.postgresql.org/message-id/20210801205501.nyxzxoelqoo4x2qc@alap3.anarazel.de
> 
> About [2], we need to maintain session active/idel times on additional
> menbers in backend status.  Letting gpgstat_report_activity to
> directly scribble on backend status array would work?

So the attached is roughly that (just a PoC, of course).

- accumulate active and idle time on backend status array.
  (pgstat_report_activity)

- pgstat_get_db_session_time() and friends read pgstat file then sum
  up relevant members in backend status array. (So it scans on the
  array for every number of every database X().

- The function pgstat_send_connstats is called only at the end of a
  connection.  It reads backend status element then send the numbers
  to stats collector.  pgstat_shutdown_hook needs to be moved from
  on_shmem_exit to before_shmem_exit to read MyBEEntry.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7fcc3f6ded..47973f1e30 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -244,8 +244,6 @@ static int    pgStatXactCommit = 0;
 static int    pgStatXactRollback = 0;
 PgStat_Counter pgStatBlockReadTime = 0;
 PgStat_Counter pgStatBlockWriteTime = 0;
-PgStat_Counter pgStatActiveTime = 0;
-PgStat_Counter pgStatTransactionIdleTime = 0;
 SessionEndType pgStatSessionEndCause = DISCONNECT_NORMAL;
 
 /* Record that's written to 2PC state file when pgstat state is persisted */
@@ -323,7 +321,7 @@ static void pgstat_send_tabstat(PgStat_MsgTabstat *tsmsg);
 static void pgstat_send_funcstats(void);
 static void pgstat_send_slru(void);
 static HTAB *pgstat_collect_oids(Oid catalogid, AttrNumber anum_oid);
-static void pgstat_send_connstats(bool disconnect, TimestampTz last_report);
+static void pgstat_send_connstats(TimestampTz now);
 
 static PgStat_TableStatus *get_tabstat_entry(Oid rel_id, bool isshared);
 
@@ -876,10 +874,8 @@ pgstat_report_stat(bool disconnect)
         return;
 
     /* for backends, send connection statistics */
-    if (MyBackendType == B_BACKEND)
-        pgstat_send_connstats(disconnect, last_report);
-
-    last_report = now;
+    if (MyBackendType == B_BACKEND && disconnect)
+        pgstat_send_connstats(now);
 
     /*
      * Destroy pgStatTabHash before we start invalidating PgStat_TableEntry
@@ -1368,39 +1364,41 @@ pgstat_drop_relation(Oid relid)
  * pgstat_send_connstats() -
  *
  *    Tell the collector about session statistics.
- *    The parameter "disconnect" will be true when the backend exits.
- *    "last_report" is the last time we were called (0 if never).
  * ----------
  */
 static void
-pgstat_send_connstats(bool disconnect, TimestampTz last_report)
+pgstat_send_connstats(TimestampTz now)
 {
     PgStat_MsgConn msg;
-    long        secs;
-    int            usecs;
+    volatile PgBackendStatus *beentry = MyBEEntry;
 
     if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
         return;
 
     pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CONNECTION);
     msg.m_databaseid = MyDatabaseId;
+    msg.m_disconnect = pgStatSessionEndCause;
 
-    /* session time since the last report */
-    TimestampDifference(((last_report == 0) ? MyStartTimestamp : last_report),
-                        GetCurrentTimestamp(),
-                        &secs, &usecs);
-    msg.m_session_time = secs * 1000000 + usecs;
+    msg.m_count = 1;
 
-    msg.m_disconnect = disconnect ? pgStatSessionEndCause : DISCONNECT_NOT_YET;
+    if (pgstat_track_activities && beentry)
+    {
+        msg.m_session_time =
+            (PgStat_Counter)(now - beentry->st_proc_start_timestamp) * 1000;
+        msg.m_active_time = beentry->st_session_active_time;
+        msg.m_idle_in_xact_time = beentry->st_session_idle_time;
 
-    msg.m_active_time = pgStatActiveTime;
-    pgStatActiveTime = 0;
-
-    msg.m_idle_in_xact_time = pgStatTransactionIdleTime;
-    pgStatTransactionIdleTime = 0;
-
-    /* report a new session only the first time */
-    msg.m_count = (last_report == 0) ? 1 : 0;
+        PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+        beentry->st_session_active_time = 0;
+        beentry->st_session_idle_time = 0;
+        PGSTAT_END_WRITE_ACTIVITY(beentry);
+    }
+    else
+    {
+        msg.m_session_time = 0;
+        msg.m_active_time = 0;
+        msg.m_idle_in_xact_time = 0;
+    }
 
     pgstat_send(&msg, sizeof(PgStat_MsgConn));
 }
@@ -2877,7 +2875,7 @@ pgstat_initialize(void)
     prevWalUsage = pgWalUsage;
 
     /* Set up a process-exit hook to clean up */
-    on_shmem_exit(pgstat_shutdown_hook, 0);
+    before_shmem_exit(pgstat_shutdown_hook, 0);
 }
 
 /* ------------------------------------------------------------
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index e19c4506ef..0f0eecc3b4 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -336,6 +336,8 @@ pgstat_bestart(void)
     lbeentry.st_activity_start_timestamp = 0;
     lbeentry.st_state_start_timestamp = 0;
     lbeentry.st_xact_start_timestamp = 0;
+    lbeentry.st_session_active_time = 0;
+    lbeentry.st_session_idle_time = 0;
     lbeentry.st_databaseid = MyDatabaseId;
 
     /* We have userid for client-backends, wal-sender and bgworker processes */
@@ -590,9 +592,9 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
 
         if (beentry->st_state == STATE_RUNNING ||
             beentry->st_state == STATE_FASTPATH)
-            pgstat_count_conn_active_time(secs * 1000000 + usecs);
+            beentry->st_session_active_time += secs * 1000000 + usecs;
         else
-            pgstat_count_conn_txn_idle_time(secs * 1000000 + usecs);
+            beentry->st_session_idle_time += secs * 1000000 + usecs;
     }
 
     /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index f0e09eae4d..62b60077c1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1635,6 +1635,56 @@ pg_stat_get_db_blk_write_time(PG_FUNCTION_ARGS)
     PG_RETURN_FLOAT8(result);
 }
 
+/* XXXXX */
+static double
+_pg_stat_active_session_stats(Oid dbid, int type)
+{
+    int            num_backends = pgstat_fetch_stat_numbackends();
+    int            curr_backend;
+    TimestampTz    now = GetCurrentTimestamp();
+    double        result = 0.0;
+    int            nsessions = 0;
+
+
+    /* Add session time of active backends */
+    /* 1-based index */
+    for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+    {
+        LocalPgBackendStatus *local_beentry;
+        PgBackendStatus *beentry;
+
+        local_beentry = pgstat_fetch_stat_local_beentry(curr_backend);
+        if (!local_beentry)
+            continue;
+
+        beentry = &local_beentry->backendStatus;
+
+        if (beentry->st_databaseid != dbid)
+            continue;
+
+        switch (type) /* define enum !*/
+        {
+            case 0: /* SESSION TIME */
+                result +=
+                    (double)(now - beentry->st_proc_start_timestamp) / 1000;
+                break;
+            case 1: /* SESSION ACTIVE TIME */
+                result += (double)beentry->st_session_active_time / 1000;
+                break;
+            case 2: /* SESSION IDLE TIME */
+                result += (double)beentry->st_session_idle_time / 1000;
+                break;
+            case 3: /* SESSION NUMBER */
+                nsessions++;
+        }
+    }
+
+    if (type < 3)
+        return result;
+
+    return (double)nsessions;
+}
+
 Datum
 pg_stat_get_db_session_time(PG_FUNCTION_ARGS)
 {
@@ -1646,6 +1696,8 @@ pg_stat_get_db_session_time(PG_FUNCTION_ARGS)
     if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) != NULL)
         result = ((double) dbentry->total_session_time) / 1000.0;
 
+    result += _pg_stat_active_session_stats(dbid, 0);
+
     PG_RETURN_FLOAT8(result);
 }
 
@@ -1660,6 +1712,8 @@ pg_stat_get_db_active_time(PG_FUNCTION_ARGS)
     if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) != NULL)
         result = ((double) dbentry->total_active_time) / 1000.0;
 
+    result += _pg_stat_active_session_stats(dbid, 1);
+
     PG_RETURN_FLOAT8(result);
 }
 
@@ -1674,6 +1728,8 @@ pg_stat_get_db_idle_in_transaction_time(PG_FUNCTION_ARGS)
     if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) != NULL)
         result = ((double) dbentry->total_idle_in_xact_time) / 1000.0;
 
+    result += _pg_stat_active_session_stats(dbid, 2);
+
     PG_RETURN_FLOAT8(result);
 }
 
@@ -1687,6 +1743,7 @@ pg_stat_get_db_sessions(PG_FUNCTION_ARGS)
     if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) != NULL)
         result = (int64) (dbentry->n_sessions);
 
+    result += (int64)_pg_stat_active_session_stats(dbid, 3);
     PG_RETURN_INT64(result);
 }
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f779b48b8c..9311111690 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -931,13 +931,6 @@ extern PgStat_MsgWal WalStats;
 extern PgStat_Counter pgStatBlockReadTime;
 extern PgStat_Counter pgStatBlockWriteTime;
 
-/*
- * Updated by pgstat_count_conn_*_time macros, called by
- * pgstat_report_activity().
- */
-extern PgStat_Counter pgStatActiveTime;
-extern PgStat_Counter pgStatTransactionIdleTime;
-
 
 /*
  * Updated by the traffic cop and in errfinish()
@@ -1039,10 +1032,6 @@ extern void pgstat_initstats(Relation rel);
     (pgStatBlockReadTime += (n))
 #define pgstat_count_buffer_write_time(n)                            \
     (pgStatBlockWriteTime += (n))
-#define pgstat_count_conn_active_time(n)                            \
-    (pgStatActiveTime += (n))
-#define pgstat_count_conn_txn_idle_time(n)                            \
-    (pgStatTransactionIdleTime += (n))
 
 extern void pgstat_count_heap_insert(Relation rel, PgStat_Counter n);
 extern void pgstat_count_heap_update(Relation rel, bool hot);
diff --git a/src/include/utils/backend_status.h b/src/include/utils/backend_status.h
index 8042b817df..ebd5755247 100644
--- a/src/include/utils/backend_status.h
+++ b/src/include/utils/backend_status.h
@@ -124,6 +124,10 @@ typedef struct PgBackendStatus
     TimestampTz st_activity_start_timestamp;
     TimestampTz st_state_start_timestamp;
 
+    /* Session active/idle time in microsecnods */
+    int64        st_session_active_time;
+    int64        st_session_idle_time;
+
     /* Database OID, owning user's OID, connection client address */
     Oid            st_databaseid;
     Oid            st_userid;

Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andres Freund
Дата:
Hi,

On 2021-08-20 14:27:20 -0500, Justin Pryzby wrote:
> On Tue, Aug 17, 2021 at 02:14:20AM -0700, Andres Freund wrote:
> > Doubling the number of UDP messages in common workloads seems also problematic
> > enough that it should be addressed for 14. It increases the likelihood of
> > dropping stats messages on busy systems, which can have downstream impacts.
> 
> I think by "common workloads" you mean one with many, shortlived sessions.

You don't need short-lived sessions. You just need sessions that don't
process queries all the time (so that there's only one or a few queries
within each PGSTAT_STAT_INTERVAL). The connection stats aren't sent once
per session, they're sent once per PGSTAT_STAT_INTERVAL.

Greetings,

Andres Freund



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andres Freund
Дата:
Hi,

On 2021-08-25 12:51:58 +0900, Michael Paquier wrote:
> I was looking at this WIP patch, and plugging in the connection
> statistics to the table-access statistics looks like the wrong
> abstraction to me.  I find much cleaner the approach of HEAD to use a
> separate API to report this information, as of
> pgstat_send_connstats().

As I said before, this ship has long sailed:

typedef struct PgStat_MsgTabstat
{
    PgStat_MsgHdr m_hdr;
    Oid            m_databaseid;
    int            m_nentries;
    int            m_xact_commit;
    int            m_xact_rollback;
    PgStat_Counter m_block_read_time;    /* times in microseconds */
    PgStat_Counter m_block_write_time;
    PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
} PgStat_MsgTabstat;


> As of the two problems discussed on this thread, aka the increased
> number of UDP packages and the extra timestamp computations, it seems
> to me that we had better combine the following ideas for HEAD and 14,
> for now:
> - Avoid the extra timestamp computation as proposed by Laurenz in [1]
> - Throttle the frequency where the connection stat packages are sent,
> as of [2].

I think in that case we'd have to do the bigger redesign and move "live"
connection stats to backend_status.c...

Greetings,

Andres Freund



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Michael Paquier
Дата:
On Wed, Aug 25, 2021 at 01:20:03AM -0700, Andres Freund wrote:
> On 2021-08-25 12:51:58 +0900, Michael Paquier wrote:
> As I said before, this ship has long sailed:
>
> typedef struct PgStat_MsgTabstat
> {
>     PgStat_MsgHdr m_hdr;
>     Oid            m_databaseid;
>     int            m_nentries;
>     int            m_xact_commit;
>     int            m_xact_rollback;
>     PgStat_Counter m_block_read_time;    /* times in microseconds */
>     PgStat_Counter m_block_write_time;
>     PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
> } PgStat_MsgTabstat;

Well, I kind of misread what you meant upthread then.
PgStat_MsgTabstat has a name a bit misleading, especially if you
assign connection stats to it.

>> As of the two problems discussed on this thread, aka the increased
>> number of UDP packages and the extra timestamp computations, it seems
>> to me that we had better combine the following ideas for HEAD and 14,
>> for now:
>> - Avoid the extra timestamp computation as proposed by Laurenz in [1]
>> - Throttle the frequency where the connection stat packages are sent,
>> as of [2].
>
> I think in that case we'd have to do the bigger redesign and move "live"
> connection stats to backend_status.c...

Hmm.  A redesign is not really an option for 14 at this stage.  And I
am not really comfortable with the latest proposal from upthread to
plug in that to pgstat_send_tabstat to report things once per
transaction, either.  It really looks like this needs more thoughts,
and it would mean that a revert may be the most appropriate choice
for the moment.  That's the last-resort option, surely, but we are
post-beta3 so there is no much margin left.
--
Michael

Вложения

Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andres Freund
Дата:
Hi,

On August 31, 2021 6:33:15 PM PDT, Michael Paquier <michael@paquier.xyz> wrote:
>On Tue, Aug 31, 2021 at 04:55:35AM +0200, Laurenz Albe wrote:
>> In the view of that, how about doubling PGSTAT_STAT_INTERVAL to 1000
>> milliseconds?  That would mean slightly less up-to-date statistics, but
>> I doubt that that will be a problem.  And it should even out the increase
>> in statistics messages, except in the case of lots of short-lived
>> sessions.  But in that scenario you cannot have session statistics
>> without lots of extra messages, and such a workload has enough performance
>> problems as it is, so I don't think we have to specifically worry about it.
>
>Perhaps we could do that.  Now, increasing an interval for the sake of
>balancing the extra load created by a feature while impacting the
>whole set of stats is not really appealing.

I think it's not helpful. Still increases the number of messages substantially in workloads with a lot of connections
doingoccasional queries. Which is common. 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Tue, 2021-08-31 at 18:55 -0700, Andres Freund wrote:
> > > On Tue, Aug 31, 2021 at 04:55:35AM +0200, Laurenz Albe wrote:In the view of that, how about doubling
PGSTAT_STAT_INTERVALto 1000
 
> > 
> > > milliseconds?  That would mean slightly less up-to-date statistics, but
> > > I doubt that that will be a problem.
>
> I think it's not helpful. Still increases the number of messages substantially in workloads
> with a lot of connections doing occasional queries. Which is common.

How come?  If originally you send table statistics every 500ms, and now you send
table statistics and session statistics every second, that should amount to the
same thing.  Where is my misunderstanding?

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Wed, 2021-09-01 at 10:33 +0900, Michael Paquier wrote:
> On Tue, Aug 31, 2021 at 04:55:35AM +0200, Laurenz Albe wrote:
> > In the view of that, how about doubling PGSTAT_STAT_INTERVAL to 1000
> > milliseconds?
> 
> Perhaps we could do that.  Now, increasing an interval for the sake of
> balancing the extra load created by a feature while impacting the
> whole set of stats is not really appealing.

I agree.  But if the best fix is too invasive at this point, the
alternatives are reverting the patch or choosing a less appealing
solution.

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andres Freund
Дата:
On 2021-09-01 05:39:14 +0200, Laurenz Albe wrote:
> On Tue, 2021-08-31 at 18:55 -0700, Andres Freund wrote:
> > > > On Tue, Aug 31, 2021 at 04:55:35AM +0200, Laurenz Albe wrote:In the view of that, how about doubling
PGSTAT_STAT_INTERVALto 1000
 
> > > 
> > > > milliseconds?  That would mean slightly less up-to-date statistics, but
> > > > I doubt that that will be a problem.
> >
> > I think it's not helpful. Still increases the number of messages substantially in workloads
> > with a lot of connections doing occasional queries. Which is common.
> 
> How come?  If originally you send table statistics every 500ms, and now you send
> table statistics and session statistics every second, that should amount to the
> same thing.  Where is my misunderstanding?

Consider the case of one query a second.



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andres Freund
Дата:
On 2021-08-27 13:57:45 +0900, Michael Paquier wrote:
> On Wed, Aug 25, 2021 at 01:20:03AM -0700, Andres Freund wrote:
> > On 2021-08-25 12:51:58 +0900, Michael Paquier wrote:
> > As I said before, this ship has long sailed:
> > 
> > typedef struct PgStat_MsgTabstat
> > {
> >     PgStat_MsgHdr m_hdr;
> >     Oid            m_databaseid;
> >     int            m_nentries;
> >     int            m_xact_commit;
> >     int            m_xact_rollback;
> >     PgStat_Counter m_block_read_time;    /* times in microseconds */
> >     PgStat_Counter m_block_write_time;
> >     PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
> > } PgStat_MsgTabstat;
> 
> Well, I kind of misread what you meant upthread then.
> PgStat_MsgTabstat has a name a bit misleading, especially if you
> assign connection stats to it.

ISTM we should just do this fairly obvious change. Given that we already
transport commit / rollback / IO stats, I don't see why the connection stats
change anything to a meaningful degree. I'm fairly baffled why that's not the
obvious thing to do for v14.



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Tue, 2021-08-31 at 21:16 -0700, Andres Freund wrote:
> On 2021-09-01 05:39:14 +0200, Laurenz Albe wrote:
> > On Tue, 2021-08-31 at 18:55 -0700, Andres Freund wrote:
> > > > > On Tue, Aug 31, 2021 at 04:55:35AM +0200, Laurenz Albe wrote:In the
> > > > > view of that, how about doubling PGSTAT_STAT_INTERVAL to 1000
> > > > > milliseconds?  That would mean slightly less up-to-date statistics, but
> > > > > I doubt that that will be a problem.
> > > 
> > > I think it's not helpful. Still increases the number of messages substantially in workloads
> > > with a lot of connections doing occasional queries. Which is common.
> > 
> > How come?  If originally you send table statistics every 500ms, and now you send
> > table statistics and session statistics every second, that should amount to the
> > same thing.  Where is my misunderstanding?
> 
> Consider the case of one query a second.

I guess I am too stupid.  I don't see it.

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Wed, 2021-09-01 at 10:56 +0200, Laurenz Albe wrote:
> On Tue, 2021-08-31 at 21:16 -0700, Andres Freund wrote:
> > On 2021-09-01 05:39:14 +0200, Laurenz Albe wrote:
> > > On Tue, 2021-08-31 at 18:55 -0700, Andres Freund wrote:
> > > > > > On Tue, Aug 31, 2021 at 04:55:35AM +0200, Laurenz Albe wrote:In the
> > > > > > view of that, how about doubling PGSTAT_STAT_INTERVAL to 1000
> > > > > > milliseconds?  That would mean slightly less up-to-date statistics, but
> > > > > > I doubt that that will be a problem.
> > > > 
> > > > I think it's not helpful. Still increases the number of messages substantially in workloads
> > > > with a lot of connections doing occasional queries. Which is common.
> > > 
> > > How come?  If originally you send table statistics every 500ms, and now you send
> > > table statistics and session statistics every second, that should amount to the
> > > same thing.  Where is my misunderstanding?
> > 
> > Consider the case of one query a second.
> 
> I guess I am too stupid.  I don't see it.

Finally got it.  That would send a message every second, and with connection statistics,
twice as many.

Here is my next suggestion for a band-aid to mitigate this problem:
Introduce a second, much longer interval for reporting session statistics.

--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -77,6 +77,8 @@
 #define PGSTAT_STAT_INTERVAL    500 /* Minimum time between stats file
                                      * updates; in milliseconds. */
 
+#define PGSTAT_CONSTAT_INTERVAL    60000    /* interval to report connection statistics */
+
 #define PGSTAT_RETRY_DELAY        10    /* How long to wait between checks for a
                                      * new file; in milliseconds. */
 
@@ -889,8 +891,13 @@ pgstat_report_stat(bool disconnect)
         !TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
         return;
 
-    /* for backends, send connection statistics */
-    if (MyBackendType == B_BACKEND)
+    /*
+     * For backends, send connection statistics, but only every
+     * PGSTAT_CONSTAT_INTERVAL or when the backend terminates.
+     */
+    if (MyBackendType == B_BACKEND &&
+        (TimestampDifferenceExceeds(last_report, now, PGSTAT_CONSTAT_INTERVAL) ||
+         disconnect))
         pgstat_send_connstats(disconnect, last_report, now);
 
     last_report = now;

That should keep the extra load moderate, except for workloads with lots of tiny connections
(for which this may be the least of their problems).

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Fri, 2021-09-03 at 17:04 -0700, Andres Freund wrote:
> Hi,
> 
> On 2021-08-31 21:56:50 -0700, Andres Freund wrote:
> > On 2021-08-27 13:57:45 +0900, Michael Paquier wrote:
> > > On Wed, Aug 25, 2021 at 01:20:03AM -0700, Andres Freund wrote:
> > > > On 2021-08-25 12:51:58 +0900, Michael Paquier wrote:
> > > > As I said before, this ship has long sailed:
> > > > 
> > > > typedef struct PgStat_MsgTabstat
> > > > {
> > > >         PgStat_MsgHdr m_hdr;
> > > >         Oid                     m_databaseid;
> > > >         int                     m_nentries;
> > > >         int                     m_xact_commit;
> > > >         int                     m_xact_rollback;
> > > >         PgStat_Counter m_block_read_time;       /* times in microseconds */
> > > >         PgStat_Counter m_block_write_time;
> > > >         PgStat_TableEntry m_entry[PGSTAT_NUM_TABENTRIES];
> > > > } PgStat_MsgTabstat;
> > > 
> > > Well, I kind of misread what you meant upthread then.
> > > PgStat_MsgTabstat has a name a bit misleading, especially if you
> > > assign connection stats to it.
> > 
> > ISTM we should just do this fairly obvious change. Given that we already
> > transport commit / rollback / IO stats, I don't see why the connection stats
> > change anything to a meaningful degree. I'm fairly baffled why that's not the
> > obvious thing to do for v14.
> 
> Here's how I think that would look like.

Thank you!

> While writing up this draft, I found
> two more issues:
> 
> - On windows / 32 bit systems, the session time would overflow if idle for
>   longer than ~4300s. long is only 32 bit. Easy to fix obviously.

Oops, yes.  Thanks for spotting that.

> - Right now walsenders, including database connected walsenders, are not
>   reported in connection stats. That doesn't seem quite right to me.

I think that walsenders not only use a different protocol, but often have
session characteristics quite different from normal backends.
For example, they are always "active", even when they are doing nothing.
Therefore, I think it is confusing to report them together with normal
database sessions.

If at all, walsender statistics should be reported separately.

> In the patch I made the message for connecting an explicitly reported message,
> that seems cleaner, because it then happens at a clearly defined point. I
> didn't do the same for disconnecting, but perhaps that would be better? Then
> we could get rid of the whole pgStatSessionEndCause variable.

I see your point, but I am not certain if it is worth adding yet another message
for a small thing like that.  I have no strong opinion on that though.


Reading your patch, I am still confused about the following:
There are potentially several calls to "pgstat_send_tabstat" in "pgstat_report_stat".
It seems to me that if it were called more than once, session statistics would
be reported and counted several times, which would be wrong.

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Mon, 2021-09-06 at 00:22 -0700, Andres Freund wrote:
> On 2021-09-06 09:12:58 +0200, Laurenz Albe wrote:
> > Reading your patch, I am still confused about the following:
> > There are potentially several calls to "pgstat_send_tabstat" in "pgstat_report_stat".
> > It seems to me that if it were called more than once, session statistics would
> > be reported and counted several times, which would be wrong.
> 
> The accumulator variables are set to zero after being sent. So no data should
> be sent multiple times as far as I can tell.

Ah, I missed that.  Then I see no problem with that approach.

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Fri, 2021-09-03 at 17:04 -0700, Andres Freund wrote:
> Here's how I think that would look like. While writing up this draft, I found
> two more issues:
> 
> - On windows / 32 bit systems, the session time would overflow if idle for
>   longer than ~4300s. long is only 32 bit. Easy to fix obviously.
> - Right now walsenders, including database connected walsenders, are not
>   reported in connection stats. That doesn't seem quite right to me.
> 
> In the patch I made the message for connecting an explicitly reported message,
> that seems cleaner, because it then happens at a clearly defined point. I
> didn't do the same for disconnecting, but perhaps that would be better? Then
> we could get rid of the whole pgStatSessionEndCause variable.

I have gone over your patch and made the following changes:

- cache the last report time in a static variable pgLastSessionReportTime
- add a comment to explain why we only track normal backends
- pgindent
- an attempt at a commit message

Yours,
Laurenz Albe

Вложения

Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andrew Dunstan
Дата:
On 9/8/21 12:11 AM, Laurenz Albe wrote:
> On Fri, 2021-09-03 at 17:04 -0700, Andres Freund wrote:
>> Here's how I think that would look like. While writing up this draft, I found
>> two more issues:
>>
>> - On windows / 32 bit systems, the session time would overflow if idle for
>>   longer than ~4300s. long is only 32 bit. Easy to fix obviously.
>> - Right now walsenders, including database connected walsenders, are not
>>   reported in connection stats. That doesn't seem quite right to me.
>>
>> In the patch I made the message for connecting an explicitly reported message,
>> that seems cleaner, because it then happens at a clearly defined point. I
>> didn't do the same for disconnecting, but perhaps that would be better? Then
>> we could get rid of the whole pgStatSessionEndCause variable.
> I have gone over your patch and made the following changes:
>
> - cache the last report time in a static variable pgLastSessionReportTime
> - add a comment to explain why we only track normal backends
> - pgindent
> - an attempt at a commit message
>

Hi,


this is an open item for release 14. Is someone going to commit?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Laurenz Albe
Дата:
On Thu, 2021-09-16 at 02:22 -0700, Andres Freund wrote:
> I pushed this. The only substantive change I made is that I moved the
> MyBackendType == B_BACKEND check into a new pgstat_should_report_connstat(),
> and called that from pgstat_report_connect() and
> pgstat_report_disconnect(). Otherwise we'd report connect/disconnect for
> walsenders, but not the session time, which seems even weirder.

Oh, right, I didn't think of that.

Thanks for all the work you put into this.

> I'm not convinced that ignorign walsenders is a good policy decision, but I
> think that's a fair judgement call, and we can evolve that in future
> releases...

The use case that is on my mind is to figure out the percentage of time
when transactions are idle to be able to estimate the connection pool size.
walsenders, for which you typically don't want to use a connection pool,
would skew the statistics.

Yours,
Laurenz Albe




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Magnus Hagander
Дата:
On Wed, Sep 15, 2021 at 10:48 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2021-09-15 10:47:33 -0400, Andrew Dunstan wrote:
> > this is an open item for release 14. Is someone going to commit?
>
> Will do. Although I do wish the original committer would have chimed in at
> some point...

Crap. My apologies for this -- I had somehow managed to press the
"mute" button in gmail on tihs thread when I cleaned things up after
returning from my vacation, so I hadn't spotted anything in it until I
saw your commit happening a short wihle ago. I have no idea how I
managed to do that, but clearly I did. Sorry!

AIUI the commit has taken care of the urgent things for 14, correct? I
see there are some discussions points left and I'll review the thread
and see if I have any input on those as well a bit later, of course,
but just so I'm not missing something else that's urgent?

Again, my apologies for completely missing this!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

От
Andres Freund
Дата:
Hi,

On 2021-09-16 16:38:04 +0200, Magnus Hagander wrote:
> On Wed, Sep 15, 2021 at 10:48 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2021-09-15 10:47:33 -0400, Andrew Dunstan wrote:
> > > this is an open item for release 14. Is someone going to commit?
> >
> > Will do. Although I do wish the original committer would have chimed in at
> > some point...
> 
> Crap. My apologies for this -- I had somehow managed to press the
> "mute" button in gmail on tihs thread when I cleaned things up after
> returning from my vacation, so I hadn't spotted anything in it until I
> saw your commit happening a short wihle ago. I have no idea how I
> managed to do that, but clearly I did. Sorry!

Hah, that explains that ;)


> AIUI the commit has taken care of the urgent things for 14, correct? I
> see there are some discussions points left and I'll review the thread
> and see if I have any input on those as well a bit later, of course,
> but just so I'm not missing something else that's urgent?

I think so.

Greetings,

Andres Freund