Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Дата
Msg-id 9765.1421615369@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Noah Misch <noah@leadboat.com>)
Ответы Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Noah Misch <noah@leadboat.com>)
Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
> On Sat, Dec 27, 2014 at 07:55:05PM -0500, Tom Lane wrote:
>> To get back to that original complaint about buildfarm runs failing,
>> I notice that essentially all of those failures are coming from "wait
>> timeout" warnings reported by manual VACUUM commands.  Now, VACUUM itself
>> has no need to read the stats files.  What's actually causing these
>> messages is failure to get a timely response in pgstat_vacuum_stat().
>> So let me propose a drastic solution: let's dike out this bit in vacuum.c:
>>
>> /*
>> * Send info about dead objects to the statistics collector, unless we are
>> * in autovacuum --- autovacuum.c does this for itself.
>> */
>> if ((vacstmt->options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
>>     pgstat_vacuum_stat();
>>
>> This would have the effect of transferring all responsibility for
>> dead-stats-entry cleanup to autovacuum.  For ordinary users, I think
>> that'd be just fine.  It might be less fine though for people who
>> disable autovacuum, if there still are any.

> Like Robert, I don't care for that.

I obviously failed to make it clear enough that that was meant as a straw
man, lets-think-outside-the-box proposal.

>> Or, much more simply, we could conclude that it's not that important
>> if pgstat_vacuum_stat() is unable to get up-to-date stats, and rejigger
>> the code so that we don't bleat when the file-reading request comes from
>> that source.  This should be a back-patchable fix, whereas #2 above might
>> be a bit too complicated for that.

> This, however, feels like a clear improvement.  When every VACUUM does
> pgstat_vacuum_stat(), it doesn't matter if any given VACUUM misses removing
> entries that became obsolete in the preceding second or so.  In fact, rather
> than merely not bleating when the wait times out, let's not wait at all.  I
> don't favor VACUUM of a small table taking 12s because it spent 2s on the
> table and 10s waiting to garbage collect a piping-hot stats file.

I think that would be going too far: if an autovac wakes up in the dead of
night, it should not use an ancient stats file merely because no one else
has asked for stats recently.  But if it never waits at all, it'll be
at the mercy of whatever is already on disk.

After looking at the code, the minimum-change alternative would be more or
less as attached: first, get rid of the long-obsolete proposition that
autovacuum workers need fresher-than-usual stats; second, allow
pgstat_vacuum_stat to accept stats that are moderately stale (the number
given below allows them to be up to 50 seconds old); and third, suppress
wait-timeout warnings when the call is from pgstat_vacuum_stat.  The third
point is what we need to avoid unnecessary buildfarm failures.  The second
point addresses the idea that we don't need to stress the stats collector
too much for this.

            regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 0cf4988..b030e1c 100644
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** static void pgstat_write_statsfiles(bool
*** 259,265 ****
  static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
  static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
  static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
! static void backend_read_statsfile(void);
  static void pgstat_read_current_status(void);

  static bool pgstat_write_statsfile_needed(void);
--- 259,265 ----
  static void pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent);
  static HTAB *pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep);
  static void pgstat_read_db_statsfile(Oid databaseid, HTAB *tabhash, HTAB *funchash, bool permanent);
! static void backend_read_statsfile(bool for_vacuum_stat);
  static void pgstat_read_current_status(void);

  static bool pgstat_write_statsfile_needed(void);
*************** pgstat_vacuum_stat(void)
*** 951,959 ****

      /*
       * If not done for this transaction, read the statistics collector stats
!      * file into some hash tables.
       */
!     backend_read_statsfile();

      /*
       * Read pg_database and make a list of OIDs of all existing databases
--- 951,962 ----

      /*
       * If not done for this transaction, read the statistics collector stats
!      * file into some hash tables.  We are willing to accept stats that are
!      * more stale than usual here.  (Since VACUUM never runs in a transaction
!      * block, this cannot result in accepting stale stats that would be used
!      * for other purposes later in the transaction.)
       */
!     backend_read_statsfile(true);

      /*
       * Read pg_database and make a list of OIDs of all existing databases
*************** pgstat_fetch_stat_dbentry(Oid dbid)
*** 2182,2188 ****
       * If not done for this transaction, read the statistics collector stats
       * file into some hash tables.
       */
!     backend_read_statsfile();

      /*
       * Lookup the requested database; return NULL if not found
--- 2185,2191 ----
       * If not done for this transaction, read the statistics collector stats
       * file into some hash tables.
       */
!     backend_read_statsfile(false);

      /*
       * Lookup the requested database; return NULL if not found
*************** pgstat_fetch_stat_tabentry(Oid relid)
*** 2213,2219 ****
       * If not done for this transaction, read the statistics collector stats
       * file into some hash tables.
       */
!     backend_read_statsfile();

      /*
       * Lookup our database, then look in its table hash table.
--- 2216,2222 ----
       * If not done for this transaction, read the statistics collector stats
       * file into some hash tables.
       */
!     backend_read_statsfile(false);

      /*
       * Lookup our database, then look in its table hash table.
*************** pgstat_fetch_stat_funcentry(Oid func_id)
*** 2265,2271 ****
      PgStat_StatFuncEntry *funcentry = NULL;

      /* load the stats file if needed */
!     backend_read_statsfile();

      /* Lookup our database, then find the requested function.  */
      dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
--- 2268,2274 ----
      PgStat_StatFuncEntry *funcentry = NULL;

      /* load the stats file if needed */
!     backend_read_statsfile(false);

      /* Lookup our database, then find the requested function.  */
      dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
*************** pgstat_fetch_stat_numbackends(void)
*** 2350,2356 ****
  PgStat_ArchiverStats *
  pgstat_fetch_stat_archiver(void)
  {
!     backend_read_statsfile();

      return &archiverStats;
  }
--- 2353,2359 ----
  PgStat_ArchiverStats *
  pgstat_fetch_stat_archiver(void)
  {
!     backend_read_statsfile(false);

      return &archiverStats;
  }
*************** pgstat_fetch_stat_archiver(void)
*** 2367,2373 ****
  PgStat_GlobalStats *
  pgstat_fetch_global(void)
  {
!     backend_read_statsfile();

      return &globalStats;
  }
--- 2370,2376 ----
  PgStat_GlobalStats *
  pgstat_fetch_global(void)
  {
!     backend_read_statsfile(false);

      return &globalStats;
  }
*************** done:
*** 4341,4349 ****
   * If not already done, read the statistics collector stats file into
   * some hash tables.  The results will be kept until pgstat_clear_snapshot()
   * is called (typically, at end of transaction).
   */
  static void
! backend_read_statsfile(void)
  {
      TimestampTz min_ts = 0;
      TimestampTz ref_ts = 0;
--- 4344,4357 ----
   * If not already done, read the statistics collector stats file into
   * some hash tables.  The results will be kept until pgstat_clear_snapshot()
   * is called (typically, at end of transaction).
+  *
+  * If for_vacuum_stat is true, we're doing this for pgstat_vacuum_stat,
+  * which does not really need up-to-date stats: it's only trying to flush
+  * obsolete stats entries.  So we accept older stats than we normally would,
+  * and don't bleat about failure to get fresh stats either.
   */
  static void
! backend_read_statsfile(bool for_vacuum_stat)
  {
      TimestampTz min_ts = 0;
      TimestampTz ref_ts = 0;
*************** backend_read_statsfile(void)
*** 4376,4385 ****
              /*
               * We set the minimum acceptable timestamp to PGSTAT_STAT_INTERVAL
               * msec before now.  This indirectly ensures that the collector
!              * needn't write the file more often than PGSTAT_STAT_INTERVAL. In
!              * an autovacuum worker, however, we want a lower delay to avoid
!              * using stale data, so we use PGSTAT_RETRY_DELAY (since the
!              * number of workers is low, this shouldn't be a problem).
               *
               * We don't recompute min_ts after sleeping, except in the
               * unlikely case that cur_ts went backwards.  So we might end up
--- 4384,4392 ----
              /*
               * We set the minimum acceptable timestamp to PGSTAT_STAT_INTERVAL
               * msec before now.  This indirectly ensures that the collector
!              * needn't write the file more often than PGSTAT_STAT_INTERVAL.
!              * For vacuum_stat purposes, though, up to 100 times the normal
!              * age is considered acceptable.
               *
               * We don't recompute min_ts after sleeping, except in the
               * unlikely case that cur_ts went backwards.  So we might end up
*************** backend_read_statsfile(void)
*** 4390,4398 ****
               * actually accept.
               */
              ref_ts = cur_ts;
!             if (IsAutoVacuumWorkerProcess())
                  min_ts = TimestampTzPlusMilliseconds(ref_ts,
!                                                      -PGSTAT_RETRY_DELAY);
              else
                  min_ts = TimestampTzPlusMilliseconds(ref_ts,
                                                       -PGSTAT_STAT_INTERVAL);
--- 4397,4405 ----
               * actually accept.
               */
              ref_ts = cur_ts;
!             if (for_vacuum_stat)
                  min_ts = TimestampTzPlusMilliseconds(ref_ts,
!                                                 -100 * PGSTAT_STAT_INTERVAL);
              else
                  min_ts = TimestampTzPlusMilliseconds(ref_ts,
                                                       -PGSTAT_STAT_INTERVAL);
*************** backend_read_statsfile(void)
*** 4441,4447 ****
          pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
      }

!     if (count >= PGSTAT_POLL_LOOP_COUNT)
          elog(WARNING, "pgstat wait timeout");

      /*
--- 4448,4454 ----
          pg_usleep(PGSTAT_RETRY_DELAY * 1000L);
      }

!     if (count >= PGSTAT_POLL_LOOP_COUNT && !for_vacuum_stat)
          elog(WARNING, "pgstat wait timeout");

      /*

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: proposal: row_to_array function
Следующее
От: Tom Lane
Дата:
Сообщение: Reducing buildfarm disk usage: remove temp installs when done