Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical() at walsender.c:2762

From Kyotaro Horiguchi
Subject Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical() at walsender.c:2762
Date
Msg-id 20200528.170436.1361384430172307883.horikyota.ntt@gmail.com
In reply to SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical () at walsender.c:2762  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
Responses Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical ()at walsender.c:2762  (Vladimir Sitnikov <sitnikov.vladimir@gmail.com>)
List pgsql-hackers
At Thu, 28 May 2020 09:07:04 +0300, Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote in 
> Pgjdbc test suite identified a SIGSEGV in the recent HEAD builds of
> PostgreSQL, Ubuntu 14.04.5 LTS
> 
> Here's a call stack:
> https://travis-ci.org/github/pgjdbc/pgjdbc/jobs/691794110#L7484
> The crash is consistent, and it reproduces 100% of the cases so far.
> 
> The CI history shows that HEAD was good at 11 May 13:27 UTC, and it became
> bad by 19 May 14:00 UTC,
> so the regression was introduced somewhere in-between.
> 
> Does that ring any bells?

Thanks for the report.  It is certainly a bug, since the server crashes;
on the other hand, Pgjdbc seems to be doing something wrong, too.

It seems to me that the crash means Pgjdbc is opening a logical
replication connection and then trying to start physical replication
on it.

> In case you wonder:
> 
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  XLogSendPhysical () at walsender.c:2762
> 2762 if (!WALRead(xlogreader,
> (gdb) #0  XLogSendPhysical () at walsender.c:2762
>         SendRqstPtr = 133473640
>         startptr = 133473240
>         endptr = 133473640
>         nbytes = 400
>         segno = 1
>         errinfo = {wre_errno = 988942240, wre_off = 2, wre_req = -1,
>           wre_read = -1, wre_seg = {ws_file = 4714224,
>             ws_segno = 140729887364688, ws_tli = 0}}
>         __func__ = "XLogSendPhysical"

I see what is probably the same symptom with the following steps on
the current HEAD.

psql 'host=/tmp replication=database'
=# START_REPLICATION 0/1;
<server crashes>

Physical replication is not supposed to be started on a logical
replication connection. The attached patch would fix that.  It adds
two tests: one for this case and another for the reverse.
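
For illustration only, the rule being enforced (the new check plus the
existing one for the reverse case) can be modelled roughly as below.
This is a standalone sketch, not the walsender code: ConnKind,
StartKind, conn_kind_from_param() and check_start_replication() are
hypothetical names, and the mapping of the "replication" parameter is
deliberately simplified.  Only the two error strings are taken from
the patch and its tests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two kinds of replication connection. */
typedef enum { CONN_PHYSICAL, CONN_DATABASE } ConnKind;

/* Hypothetical model of the two flavours of START_REPLICATION. */
typedef enum { START_PHYSICAL, START_LOGICAL } StartKind;

/*
 * Map the "replication" connection parameter to a connection kind.
 * Simplification (assumption of this sketch): "database" selects a
 * database (logical) connection, any other value is treated as a
 * physical replication connection.
 */
ConnKind
conn_kind_from_param(const char *value)
{
    assert(value != NULL);
    return strcmp(value, "database") == 0 ? CONN_DATABASE : CONN_PHYSICAL;
}

/*
 * Return NULL if the command is acceptable on this connection kind,
 * otherwise the error message the client should receive.
 */
const char *
check_start_replication(ConnKind conn, StartKind cmd)
{
    if (cmd == START_PHYSICAL && conn == CONN_DATABASE)
        return "cannot initiate physical replication on a logical replication connection";
    if (cmd == START_LOGICAL && conn == CONN_PHYSICAL)
        return "logical decoding requires a database connection";
    return NULL;
}
```

In this model the reproduction above corresponds to
check_start_replication(conn_kind_from_param("database"), START_PHYSICAL),
which returns the new error message instead of letting the command
proceed.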

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 1f94c98f7459ca8a4942246325815a3e0a91caa4 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Thu, 28 May 2020 15:55:30 +0900
Subject: [PATCH v1] Fix crash when starting physical replication on logical
 connection

It is an illegal operation to try starting physical replication on a
logical replication session.  We should properly warn the client
instead of crashing.
---
 src/backend/replication/walsender.c         |  5 +++++
 src/test/recovery/t/001_stream_rep.pl       | 14 +++++++++++---
 src/test/recovery/t/006_logical_decoding.pl | 10 +++++++++-
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 86847cbb54..7b79c75311 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -589,6 +589,11 @@ StartReplication(StartReplicationCmd *cmd)
     StringInfoData buf;
     XLogRecPtr    FlushPtr;
 
+    if (am_db_walsender)
+        ereport(ERROR,
+                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                 errmsg("cannot initiate physical replication on a logical replication connection")));
+
     if (ThisTimeLineID == 0)
         ereport(ERROR,
                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 0c316c1808..0b69b7d8d1 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 35;
+use Test::More tests => 36;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -18,6 +18,14 @@ my $backup_name = 'my_backup';
 # Take backup
 $node_master->backup($backup_name);
 
+# Check if logical-rep session properly refuses to start physical-rep
+my ($ret, $stdout, $stderr) =
+  $node_master->psql('template1',
+                     qq[START_REPLICATION PHYSICAL 0/1],
+                     replication=>'database');
+ok($stderr =~ /ERROR:  cannot initiate physical replication on a logical replication connection/,
+   "check if physical replication is rejected on logical-rep session");
+
 # Create streaming standby linking to master
 my $node_standby_1 = get_new_node('standby_1');
 $node_standby_1->init_from_backup($node_master, $backup_name,
@@ -94,7 +102,7 @@ sub test_target_session_attrs
 
     # The client used for the connection does not matter, only the backend
     # point does.
-    my ($ret, $stdout, $stderr) =
+    ($ret, $stdout, $stderr) =
       $node1->psql('postgres', 'SHOW port;',
         extra_params => [ '-d', $connstr ]);
     is( $status == $ret && $stdout eq $target_node->port,
@@ -136,7 +144,7 @@ my $connstr_rep    = "$connstr_common replication=1";
 my $connstr_db     = "$connstr_common replication=database dbname=postgres";
 
 # Test SHOW ALL
-my ($ret, $stdout, $stderr) = $node_master->psql(
+($ret, $stdout, $stderr) = $node_master->psql(
     'postgres', 'SHOW ALL;',
     on_error_die => 1,
     extra_params => [ '-d', $connstr_rep ]);
diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index ee05535b1c..eefde8c3f1 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -7,7 +7,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 13;
+use Test::More tests => 14;
 use Config;
 
 # Initialize master node
@@ -36,6 +36,14 @@ ok( $stderr =~
       m/replication slot "test_slot" was not created in this database/,
     "Logical decoding correctly fails to start");
 
+# Check if physical-rep session properly refuses to start logical-decoding
+($result, $stdout, $stderr) =
+  $node_master->psql('template1',
+                     qq[START_REPLICATION SLOT s1 LOGICAL 0/1],
+                     replication=>'true');
+ok($stderr =~ /ERROR:  logical decoding requires a database connection/,
+   "check if logical decoding is refused on physical-rep connection");
+
 $node_master->safe_psql('postgres',
     qq[INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(1,10) s;]
 );
-- 
2.18.2

