Re: Adding CI to our tree

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Adding CI to our tree
Дата
Msg-id 647439.1642622744@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Adding CI to our tree  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Adding CI to our tree  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
I wrote:
> This test attempt revealed another problem too: the standby never
> shut down, and thus the calling "make" never quit, until I intervened
> manually.  I'm not sure why.  I see that Cluster::promote uses
> system_or_bail() to run "pg_ctl promote" ... could it be that
> BAIL_OUT causes the normal script END hooks to not get run?
> But it seems like we'd have noticed that long ago.

I failed to reproduce any failure in the promote step, and I now
think I was mistaken and it happened during the standby's initial
start.  I can reproduce that very easily by setting PGCTLTIMEOUT
to a second or two; with fsync enabled, it takes the standby more
than that to reach a consistent state.  And the cause of that
is obvious: Cluster::start thinks that if "pg_ctl start" failed,
there couldn't possibly be a postmaster running.  So it doesn't
bother to update self->_pid, and then the END hook thinks there
is nothing to do.

Now, leaving an idle postmaster hanging around isn't a mortal sin,
since it'll go away by itself shortly after the next cycle of
testing does an "rm -rf" on its data directory.  But it's ugly,
and conceivably it could cause problems for later testing on
machines with limited shmem or semaphore space.

The attached simple fix gets rid of this problem.  Any objections?

            regards, tom lane

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 7af0f8db13..fd0738d12d 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -845,6 +845,11 @@ sub start
     {
         print "# pg_ctl start failed; logfile:\n";
         print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+
+        # pg_ctl could have timed out, so check to see if there's a pid file;
+        # without this, we fail to shut down the new postmaster later.
+        $self->_update_pid(-1);
+
         BAIL_OUT("pg_ctl start failed") unless $params{fail_ok};
         return 0;
     }
@@ -1123,7 +1128,10 @@ archive_command = '$copy_command'
     return;
 }

-# Internal method
+# Internal method to update $self->{_pid}
+# $is_running = 1: pid file should be there
+# $is_running = 0: pid file should NOT be there
+# $is_running = -1: we aren't sure
 sub _update_pid
 {
     my ($self, $is_running) = @_;
@@ -1138,7 +1146,7 @@ sub _update_pid
         close $pidfile;

         # If we found a pidfile when there shouldn't be one, complain.
-        BAIL_OUT("postmaster.pid unexpectedly present") unless $is_running;
+        BAIL_OUT("postmaster.pid unexpectedly present") if $is_running == 0;
         return;
     }

@@ -1146,7 +1154,7 @@ sub _update_pid
     print "# No postmaster PID for node \"$name\"\n";

     # Complain if we expected to find a pidfile.
-    BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running;
+    BAIL_OUT("postmaster.pid unexpectedly not present") if $is_running == 1;
     return;
 }


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Bossart, Nathan"
Дата:
Сообщение: Re: do only critical work during single-user vacuum?
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: Schema variables - new implementation for Postgres 15