Discussion: Cluster::restart dumping logs when stop fails
Hi,

I recently started to be bothered by regress_* logs that, after some kinds of test failures, contain the entire server log of the failed test. E.g. in

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-04-06%2016%3A28%3A38

...
### Restarting node "standby"
# Running: pg_ctl -w -D /home/bf/bf-build/serinus/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata -l /home/bf/bf-build/serinus/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log restart
waiting for server to shut down........................................................................................................................................................................................................................................................................................................................................................................... failed
pg_ctl: server does not shut down
# pg_ctl restart failed; logfile:
2024-04-06 16:33:37.496 UTC [2628363][postmaster][:0] LOG:  starting PostgreSQL 17devel on x86_64-linux, compiled by gcc-14.0.1, 64-bit
2024-04-06 16:33:37.503 UTC [2628363][postmaster][:0] LOG:  listening on Unix socket "/tmp/55kikMaTyW/.s.PGSQL.63274"
<many lines>

Looks like the printing of the entire log was added in:

commit 33774978c78175095da9e6c276e8bcdb177725f8
Author: Daniel Gustafsson <dgustafsson@postgresql.org>
Date:   2023-09-22 13:35:37 +0200

    Avoid using internal test methods in SSL tests

It might be useful to print a few lines, but the whole log files can be several megabytes worth of output. In the buildfarm that leads to the same information being collected multiple times, and locally it makes it hard to see where the "normal" contents of regress_log* continue.

Greetings,

Andres Freund
> On 6 Apr 2024, at 23:44, Andres Freund <andres@anarazel.de> wrote:
>
> It might be useful to print a few lines, but the whole log files can be
> several megabytes worth of output.

The non-context-aware fix would be to just print the last 1024 (or something) bytes from the logfile:

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 54e1008ae5..53d4751ffc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -951,8 +951,8 @@ sub start
 
 	if ($ret != 0)
 	{
-		print "# pg_ctl start failed; logfile:\n";
-		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+		print "# pg_ctl start failed; logfile excerpt:\n";
+		print substr PostgreSQL::Test::Utils::slurp_file($self->logfile), -1024;
 
 		# pg_ctl could have timed out, so check to see if there's a pid file;
 		# otherwise our END block will fail to shut down the new postmaster.

Would that be a reasonable fix?

--
Daniel Gustafsson
Hi,

On 2024-04-07 00:19:35 +0200, Daniel Gustafsson wrote:
> > On 6 Apr 2024, at 23:44, Andres Freund <andres@anarazel.de> wrote:
> >
> > It might be useful to print a few lines, but the whole log files can be
> > several megabytes worth of output.
>
> The non-context-aware fix would be to just print the last 1024 (or something)
> bytes from the logfile:

That'd be better, yes. I'd mainly log the path to the logfile though; that's probably at least as helpful for actually investigating the issue?

> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 54e1008ae5..53d4751ffc 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -951,8 +951,8 @@ sub start
>
>  	if ($ret != 0)
>  	{
> -		print "# pg_ctl start failed; logfile:\n";
> -		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
> +		print "# pg_ctl start failed; logfile excerpt:\n";
> +		print substr PostgreSQL::Test::Utils::slurp_file($self->logfile), -1024;
>
>  		# pg_ctl could have timed out, so check to see if there's a pid file;
>  		# otherwise our END block will fail to shut down the new postmaster.

That's probably an unnecessary optimization, but it seems a tad silly to read an entire, potentially sizable, file just to use the last 1k. Not sure whether the way slurp_file() uses seek supports negative offsets; the docs read to me like that may only be supported with SEEK_END.

Greetings,

Andres Freund
> On 7 Apr 2024, at 02:49, Andres Freund <andres@anarazel.de> wrote:
>
> On 2024-04-07 00:19:35 +0200, Daniel Gustafsson wrote:
> >> On 6 Apr 2024, at 23:44, Andres Freund <andres@anarazel.de> wrote:
> >> The non-context-aware fix would be to just print the last 1024 (or something)
> >> bytes from the logfile:
>
> That'd be better, yes. I'd mainly log the path to the logfile though; that's
> probably at least as helpful for actually investigating the issue?

IIRC this was modelled on the earlier behavior: in v14, the pre-refactored version printed the full logfile. How about doing the below, backpatched down to v15?

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 54e1008ae5..a2f9409842 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -951,8 +951,7 @@ sub start
 
 	if ($ret != 0)
 	{
-		print "# pg_ctl start failed; logfile:\n";
-		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+		print "# pg_ctl start failed; see logfile for details: " . $self->logfile . "\n";
 
 		# pg_ctl could have timed out, so check to see if there's a pid file;
 		# otherwise our END block will fail to shut down the new postmaster.
@@ -1090,8 +1089,7 @@ sub restart
 
 	if ($ret != 0)
 	{
-		print "# pg_ctl restart failed; logfile:\n";
-		print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+		print "# pg_ctl restart failed; see logfile for details: " . $self->logfile . "\n";
 
 		# pg_ctl could have timed out, so check to see if there's a pid file;
 		# otherwise our END block will fail to shut down the new postmaster.

--
Daniel Gustafsson
On 2024-04-06 Sa 20:49, Andres Freund wrote:
> That's probably unnecessary optimization, but it seems a tad silly to read an
> entire, potentially sizable, file just to use the last 1k. Not sure whether the way
> slurp_file() uses seek supports negative offsets; the docs read to me like that
> may only be supported with SEEK_END.

We should enhance slurp_file() so it uses SEEK_END if the offset is negative. It would be a trivial patch:
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 42d5a50dc8..8256573957 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -524,7 +524,7 @@ sub slurp_file
 
 	if (defined($offset))
 	{
-		seek($fh, $offset, SEEK_SET)
+		seek($fh, $offset, ($offset < 0 ? SEEK_END : SEEK_SET))
 		  or croak "could not seek \"$filename\": $!";
 	}
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
> On 7 Apr 2024, at 14:51, Andrew Dunstan <andrew@dunslane.net> wrote:
>
> On 2024-04-06 Sa 20:49, Andres Freund wrote:
> >> That's probably unnecessary optimization, but it seems a tad silly to read an
> >> entire, potentially sizable, file just to use the last 1k. Not sure whether the way
> >> slurp_file() uses seek supports negative offsets; the docs read to me like that
> >> may only be supported with SEEK_END.
>
> We should enhance slurp_file() so it uses SEEK_END if the offset is negative.

Absolutely agree. Reading the thread, I think Andres argues for not printing anything at all in this case, but we should support negative offsets anyway; it will for sure come in handy.

--
Daniel Gustafsson
On 2024-04-07 16:52:05 +0200, Daniel Gustafsson wrote:
> > On 7 Apr 2024, at 14:51, Andrew Dunstan <andrew@dunslane.net> wrote:
> > On 2024-04-06 Sa 20:49, Andres Freund wrote:
>
> >> That's probably unnecessary optimization, but it seems a tad silly to read an
> >> entire, potentially sizable, file just to use the last 1k. Not sure whether the way
> >> slurp_file() uses seek supports negative offsets; the docs read to me like that
> >> may only be supported with SEEK_END.
> >
> > We should enhance slurp_file() so it uses SEEK_END if the offset is negative.
>
> Absolutely agree. Reading the thread, I think Andres argues for not printing
> anything at all in this case, but we should support negative offsets anyway; it
> will for sure come in handy.

I'm ok with printing path + some content or just the path.

- Andres
> On 7 Apr 2024, at 18:28, Andres Freund <andres@anarazel.de> wrote:
>
> On 2024-04-07 16:52:05 +0200, Daniel Gustafsson wrote:
> >>> On 7 Apr 2024, at 14:51, Andrew Dunstan <andrew@dunslane.net> wrote:
> >>> On 2024-04-06 Sa 20:49, Andres Freund wrote:
> >>
> >>>> That's probably unnecessary optimization, but it seems a tad silly to read an
> >>>> entire, potentially sizable, file just to use the last 1k. Not sure whether the way
> >>>> slurp_file() uses seek supports negative offsets; the docs read to me like that
> >>>> may only be supported with SEEK_END.
> >>>
> >>> We should enhance slurp_file() so it uses SEEK_END if the offset is negative.
> >>
> >> Absolutely agree. Reading the thread, I think Andres argues for not printing
> >> anything at all in this case, but we should support negative offsets anyway; it
> >> will for sure come in handy.
>
> I'm ok with printing path + some content or just the path.

I think printing the last 512 bytes or so would be a good approach; I'll take care of it later tonight. That would be a backpatchable change IMHO.

--
Daniel Gustafsson
On 2024-04-07 18:51:40 +0200, Daniel Gustafsson wrote:
> > On 7 Apr 2024, at 18:28, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2024-04-07 16:52:05 +0200, Daniel Gustafsson wrote:
> >>> On 7 Apr 2024, at 14:51, Andrew Dunstan <andrew@dunslane.net> wrote:
> >>> On 2024-04-06 Sa 20:49, Andres Freund wrote:
> >>
> >>>> That's probably unnecessary optimization, but it seems a tad silly to read an
> >>>> entire, potentially sizable, file just to use the last 1k. Not sure whether the way
> >>>> slurp_file() uses seek supports negative offsets; the docs read to me like that
> >>>> may only be supported with SEEK_END.
> >>>
> >>> We should enhance slurp_file() so it uses SEEK_END if the offset is negative.
> >>
> >> Absolutely agree. Reading the thread, I think Andres argues for not printing
> >> anything at all in this case, but we should support negative offsets anyway; it
> >> will for sure come in handy.
> >
> > I'm ok with printing path + some content or just the path.
>
> I think printing the last 512 bytes or so would be a good approach; I'll take
> care of it later tonight. That would be a backpatchable change IMHO.

+1 - thanks for quickly improving this.
> On 7 Apr 2024, at 18:51, Daniel Gustafsson <daniel@yesql.se> wrote:
> >> On 7 Apr 2024, at 18:28, Andres Freund <andres@anarazel.de> wrote:
> >> I'm ok with printing path + some content or just the path.
>
> I think printing the last 512 bytes or so would be a good approach; I'll take
> care of it later tonight. That would be a backpatchable change IMHO.

In the end I opted for just printing the path, to avoid introducing logic (at this point in the cycle) for ensuring that a negative offset doesn't exceed the file size. Since it changes behavior I haven't backpatched it, but I can do that if we think it's more of a fix than a change (I think it can be argued either way; I have no strong feelings).

--
Daniel Gustafsson