Re: Improve error reporting in 027_stream_regress test
От | Nazir Bilal Yavuz |
---|---|
Тема | Re: Improve error reporting in 027_stream_regress test |
Дата | |
Msg-id | CAN55FZ3oQmX0NLPg=VK7qT08+hP5gYo-rAYP9WZR1rCMSovdjw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Improve error reporting in 027_stream_regress test (Michael Paquier <michael@paquier.xyz>) |
Ответы |
Re: Improve error reporting in 027_stream_regress test
|
Список | pgsql-hackers |
Hi, Thanks for looking into this! On Wed, 16 Jul 2025 at 04:39, Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Jul 01, 2025 at 10:57:11AM +0300, Nazir Bilal Yavuz wrote: > > On Mon, 30 Jun 2025 at 18:01, Andres Freund <andres@anarazel.de> wrote: > >> One thing I don't yet like is that I think we should report if the primary is > >> alive *before* reporting the diff and skipping reporting it if the primary > >> crashed. It's not interesting to report a long diff if the server crashed IMO. > > Right. That's just more noise one needs to dig into to find out > what's wrong. > > > I agree with you. So, the current logic is: > > > > If primary is not alive: Do not report anything. > > Okay. > > > If only primary is alive: Report the entire diff file. > > Hmm. Is that actually useful as we know that the standby has been > stalen down when running the test? Even if we report something, we > could always trim the output, like the standby case. I guess you are right. Also, I tried killing standby while the 027_stream_regress test was running and the test did not fail; instead it continued until timeout. If that is always the case, then it makes sense to report if and only if both primary and standby are alive. > > If both primary and standby are alive: Report entire diff file and add > > head+tail of diff to the failure message. > > +=item $node->is_alive() > + > +Check if the node is alive. > + > +=cut > + > +sub is_alive { > + my ($self) = @_; > + > + my $host = $self->host; > + my $port = $self->port; > + my $null = File::Spec->devnull; > + > + my $cmd = "pg_isready -h $host -p $port"; > + return !system("$cmd >$null 2>&1"); > +} > > It's strange to re-add a dependency to system() while we have wrappers > around it, like system_log() in Utils.pm. system_log() logs which command is run. I actually do not want to log the command because it needs to be run after the regression tests and before the 'regression tests pass'. If I log the command it looks like this: # All 228 tests passed. # Running: pg_isready -h /tmp/EXCcSldGZE -p 10493 >/dev/null 2>&1 # Running: pg_isready -h /tmp/EXCcSldGZE -p 10494 >/dev/null 2>&1 (21.939s) ok 2 - regression tests pass (0.000s) ok 3 - primary is alive after the regression tests (0.000s) ok 4 - standby is alive after the regression tests I think it looks like these pg_isready commands run randomly. > You may want to add a "local %ENV = $self->_get_env();" to make sure > that we use a command related to the build of the node initialized. You are right, I will apply this. > +sub regression_log_helper > +{ > + my ($diff_file, $lines_to_dump) = @_; > + my @lines; > + > + open my $fh, '<', $diff_file or die "couldn't open file: $diff_file\n"; > + > + # Read all lines to process them below > + while (my $line = <$fh>) > + { > + push @lines, $line; > + } > + close $fh; > > We could use that as a routine of its own in Utils.pm? It's not > really something only for diff files, more something that we can use > to trim the contents of a file, which could be the tail of a server > log file as well.. It's always difficult to measure what a "good" > number of lines is, but perhaps we could use an environment variable > to make that configurable rather than enforce a policy of 50 lines > because we think it's good enough? I think this is a good idea. How about a function called file_trim() with three arguments: the file name, a mode ('head' or 'tail'), and the number of lines to trim from that end. This approach may require reading the file more than once, but it is easy to use. > The fact that we expect the primary and the standby to be alive once > the tests are run and the addition of two tests related to them seems > unrelated to the diff report improvements, so I'd suggest to do both > things separately. Correct, I will separate them. -- Regards, Nazir Bilal Yavuz Microsoft
В списке pgsql-hackers по дате отправления: