Обсуждение: [MASSMAIL]Cluster::restart dumping logs when stop fails

Поиск
Список
Период
Сортировка

[MASSMAIL]Cluster::restart dumping logs when stop fails

От
Andres Freund
Дата:
Hi,

I recently started to be bothered by regress_* logs after some kinds of test
failures containing the whole log of a test failure. E.g. in
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus&dt=2024-04-06%2016%3A28%3A38

...
### Restarting node "standby"
# Running: pg_ctl -w -D
/home/bf/bf-build/serinus/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata
-l
/home/bf/bf-build/serinus/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log
restart
waiting for server to shut
down...........................................................................................................................................................................................................................................................................................................................................................................
failed
pg_ctl: server does not shut down
# pg_ctl restart failed; logfile:
2024-04-06 16:33:37.496 UTC [2628363][postmaster][:0] LOG:  starting PostgreSQL 17devel on x86_64-linux, compiled by
gcc-14.0.1,64-bit
 
2024-04-06 16:33:37.503 UTC [2628363][postmaster][:0] LOG:  listening on Unix socket "/tmp/55kikMaTyW/.s.PGSQL.63274"
<many lines>


Looks like the printing of the entire log was added in:

commit 33774978c78175095da9e6c276e8bcdb177725f8
Author: Daniel Gustafsson <dgustafsson@postgresql.org>
Date:   2023-09-22 13:35:37 +0200

    Avoid using internal test methods in SSL tests


It might be useful to print a few lines, but the whole log files can be
several megabytes worth of output.  In the buildfarm that leads to the same
information being collected multiple times, and locally it makes it hard to
see where the "normal" contents of regress_log* continue.

Greetings,

Andres Freund



Re: Cluster::restart dumping logs when stop fails

От
Daniel Gustafsson
Дата:
> On 6 Apr 2024, at 23:44, Andres Freund <andres@anarazel.de> wrote:

> It might be useful to print a few lines, but the whole log files can be
> several megabytes worth of output.

The non-context aware fix would be to just print the last 1024 (or something)
bytes from the logfile:

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 54e1008ae5..53d4751ffc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -951,8 +951,8 @@ sub start

     if ($ret != 0)
     {
-        print "# pg_ctl start failed; logfile:\n";
-        print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+        print "# pg_ctl start failed; logfile excerpt:\n";
+        print substr PostgreSQL::Test::Utils::slurp_file($self->logfile), -1024;

         # pg_ctl could have timed out, so check to see if there's a pid file;
         # otherwise our END block will fail to shut down the new postmaster.


Would that be a reasonable fix?

--
Daniel Gustafsson




Re: Cluster::restart dumping logs when stop fails

От
Andres Freund
Дата:
Hi,

On 2024-04-07 00:19:35 +0200, Daniel Gustafsson wrote:
> > On 6 Apr 2024, at 23:44, Andres Freund <andres@anarazel.de> wrote:
> 
> > It might be useful to print a few lines, but the whole log files can be
> > several megabytes worth of output.
> 
> The non-context aware fix would be to just print the last 1024 (or something)
> bytes from the logfile:

That'd be better, yes. I'd mainly log the path to the logfile though, that's
probably at least as helpful for actually investigating the issue?

> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
> index 54e1008ae5..53d4751ffc 100644
> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm
> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
> @@ -951,8 +951,8 @@ sub start
>  
>      if ($ret != 0)
>      {
> -        print "# pg_ctl start failed; logfile:\n";
> -        print PostgreSQL::Test::Utils::slurp_file($self->logfile);
> +        print "# pg_ctl start failed; logfile excerpt:\n";
> +        print substr PostgreSQL::Test::Utils::slurp_file($self->logfile), -1024;
>  
>          # pg_ctl could have timed out, so check to see if there's a pid file;
>          # otherwise our END block will fail to shut down the new postmaster.

That's probably unnecessary optimization, but it seems a tad silly to read an
entire, potentially sizable, file to just use the last 1k. Not sure if the way
slurp_file() uses seek supports negative ofsets, the docs read to me like that
may only be supported with SEEK_END.

Greetings,

Andres Freund



Re: Cluster::restart dumping logs when stop fails

От
Daniel Gustafsson
Дата:
> On 7 Apr 2024, at 02:49, Andres Freund <andres@anarazel.de> wrote:
> On 2024-04-07 00:19:35 +0200, Daniel Gustafsson wrote:
>>> On 6 Apr 2024, at 23:44, Andres Freund <andres@anarazel.de> wrote:

>> The non-context aware fix would be to just print the last 1024 (or something)
>> bytes from the logfile:
>
> That'd be better, yes. I'd mainly log the path to the logfile though, that's
> probably at least as helpful for actually investigating the issue?

IIRC this was modelled around how it used to be earlier, in v14 in the
pre-refactored version we print the full logfile.

How about doing the below backpatched down to 15?

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 54e1008ae5..a2f9409842 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -951,8 +951,7 @@ sub start

     if ($ret != 0)
     {
-        print "# pg_ctl start failed; logfile:\n";
-        print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+        print "# pg_ctl start failed; see logfile for details: " . $self->logfile . "\n";

         # pg_ctl could have timed out, so check to see if there's a pid file;
         # otherwise our END block will fail to shut down the new postmaster.
@@ -1090,8 +1089,7 @@ sub restart

     if ($ret != 0)
     {
-        print "# pg_ctl restart failed; logfile:\n";
-        print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+        print "# pg_ctl restart failed; see logfile for details: " . $self->logfile . "\n";

         # pg_ctl could have timed out, so check to see if there's a pid file;
         # otherwise our END block will fail to shut down the new postmaster.

--
Daniel Gustafsson




Re: Cluster::restart dumping logs when stop fails

От
Andrew Dunstan
Дата:


On 2024-04-06 Sa 20:49, Andres Freund wrote:

That's probably unnecessary optimization, but it seems a tad silly to read an
entire, potentially sizable, file to just use the last 1k. Not sure if the way
slurp_file() uses seek supports negative ofsets, the docs read to me like that
may only be supported with SEEK_END.


We should enhance slurp_file() so it uses SEEK_END if the offset is negative. It would be a trivial patch:


diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 42d5a50dc8..8256573957 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -524,7 +524,7 @@ sub slurp_file
 
    if (defined($offset))
    {
-       seek($fh, $offset, SEEK_SET)
+       seek($fh, $offset, ($offset < 0 ? SEEK_END : SEEK_SET))
          or croak "could not seek \"$filename\": $!";
    }

cheers


andrew


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Cluster::restart dumping logs when stop fails

От
Daniel Gustafsson
Дата:
> On 7 Apr 2024, at 14:51, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 2024-04-06 Sa 20:49, Andres Freund wrote:

>> That's probably unnecessary optimization, but it seems a tad silly to read an
>> entire, potentially sizable, file to just use the last 1k. Not sure if the way
>> slurp_file() uses seek supports negative ofsets, the docs read to me like that
>> may only be supported with SEEK_END.
>
> We should enhance slurp_file() so it uses SEEK_END if the offset is negative.

Absolutely agree.  Reading the thread I think Andres argues for not printing
anything at all in this case but we should support negative offsets anyways, it
will fort sure come in handy.

--
Daniel Gustafsson




Re: Cluster::restart dumping logs when stop fails

От
Andres Freund
Дата:
On 2024-04-07 16:52:05 +0200, Daniel Gustafsson wrote:
> > On 7 Apr 2024, at 14:51, Andrew Dunstan <andrew@dunslane.net> wrote:
> > On 2024-04-06 Sa 20:49, Andres Freund wrote:
> 
> >> That's probably unnecessary optimization, but it seems a tad silly to read an
> >> entire, potentially sizable, file to just use the last 1k. Not sure if the way
> >> slurp_file() uses seek supports negative ofsets, the docs read to me like that
> >> may only be supported with SEEK_END.
> > 
> > We should enhance slurp_file() so it uses SEEK_END if the offset is negative.
> 
> Absolutely agree.  Reading the thread I think Andres argues for not printing
> anything at all in this case but we should support negative offsets anyways, it
> will fort sure come in handy.

I'm ok with printing path + some content or just the path.

- Andres



Re: Cluster::restart dumping logs when stop fails

От
Daniel Gustafsson
Дата:
> On 7 Apr 2024, at 18:28, Andres Freund <andres@anarazel.de> wrote:
>
> On 2024-04-07 16:52:05 +0200, Daniel Gustafsson wrote:
>>> On 7 Apr 2024, at 14:51, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 2024-04-06 Sa 20:49, Andres Freund wrote:
>>
>>>> That's probably unnecessary optimization, but it seems a tad silly to read an
>>>> entire, potentially sizable, file to just use the last 1k. Not sure if the way
>>>> slurp_file() uses seek supports negative ofsets, the docs read to me like that
>>>> may only be supported with SEEK_END.
>>>
>>> We should enhance slurp_file() so it uses SEEK_END if the offset is negative.
>>
>> Absolutely agree.  Reading the thread I think Andres argues for not printing
>> anything at all in this case but we should support negative offsets anyways, it
>> will fort sure come in handy.
>
> I'm ok with printing path + some content or just the path.

I think printing the last 512 bytes or so would be a good approach, I'll take
care of it later tonight. That would be a backpatchable change IMHO.

--
Daniel Gustafsson




Re: Cluster::restart dumping logs when stop fails

От
Andres Freund
Дата:
On 2024-04-07 18:51:40 +0200, Daniel Gustafsson wrote:
> > On 7 Apr 2024, at 18:28, Andres Freund <andres@anarazel.de> wrote:
> > 
> > On 2024-04-07 16:52:05 +0200, Daniel Gustafsson wrote:
> >>> On 7 Apr 2024, at 14:51, Andrew Dunstan <andrew@dunslane.net> wrote:
> >>> On 2024-04-06 Sa 20:49, Andres Freund wrote:
> >> 
> >>>> That's probably unnecessary optimization, but it seems a tad silly to read an
> >>>> entire, potentially sizable, file to just use the last 1k. Not sure if the way
> >>>> slurp_file() uses seek supports negative ofsets, the docs read to me like that
> >>>> may only be supported with SEEK_END.
> >>> 
> >>> We should enhance slurp_file() so it uses SEEK_END if the offset is negative.
> >> 
> >> Absolutely agree.  Reading the thread I think Andres argues for not printing
> >> anything at all in this case but we should support negative offsets anyways, it
> >> will fort sure come in handy.
> > 
> > I'm ok with printing path + some content or just the path.
> 
> I think printing the last 512 bytes or so would be a good approach, I'll take
> care of it later tonight. That would be a backpatchable change IMHO.

+1 - thanks for quickly improving this.



Re: Cluster::restart dumping logs when stop fails

От
Daniel Gustafsson
Дата:
> On 7 Apr 2024, at 18:51, Daniel Gustafsson <daniel@yesql.se> wrote:
>> On 7 Apr 2024, at 18:28, Andres Freund <andres@anarazel.de> wrote:

>> I'm ok with printing path + some content or just the path.
>
> I think printing the last 512 bytes or so would be a good approach, I'll take
> care of it later tonight. That would be a backpatchable change IMHO.

In the end I opted for just printing the path to avoid introducing logic (at
this point in the cycle) for ensuring that a negative offset doesn't exceed the
filesize.  Since it changes behavior I haven't backpatched it, but can do that
if we think it's more of a fix than a change (I think it can be argued either
way, I have no strong feelings).

--
Daniel Gustafsson