Обсуждение: Server crash in pg_replication_slot_advance function
Hi,
While playing around with newly implemented function
'pg_replication_slot_advance' , found a server crash .
Please refer the testcase scenario -
centos@centos-cpula bin]$ ./psql postgres
psql (11devel)
Type "help" for help.
postgres=# SELECT * FROM
pg_create_logical_replication_slot('regression_slot1', 'test_decoding',
true);
slot_name | lsn
------------------+-----------
regression_slot1 | 0/1636AE8
(1 row)
postgres=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database |
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn |
confirmed_flush_lsn
------------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
regression_slot1 | test_decoding | logical | 13220 | postgres |
t | t | 25910 | | 557 | 0/1636AB0 |
0/1636AE8
(1 row)
postgres=# select pg_switch_wal();
pg_switch_wal
---------------
0/1636B38
(1 row)
postgres=# select pg_switch_wal();
pg_switch_wal
---------------
0/2000078
(1 row)
postgres=# SELECT end_lsn FROM
pg_replication_slot_advance('regression_slot1', '0/2000078');
end_lsn
-----------
0/2000078
(1 row)
postgres=# select * from pg_replication_slots;
2018-02-16 08:09:57.989 GMT [25910] LOG: statement: select * from
pg_replication_slots;
slot_name | plugin | slot_type | datoid | database |
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn |
confirmed_flush_lsn
------------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
regression_slot1 | test_decoding | logical | 13220 | postgres |
t | t | 25910 | | 557 | 0/2000028 |
0/2000078
(1 row)
postgres=# SELECT end_lsn FROM
pg_replication_slot_advance('regression_slot1', '0/2000071');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
!>
Stack trace -
Core was generated by `postgres: centos postgres [local]
SELECT '.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000008519a8 in pg_replication_slot_advance
(fcinfo=0x7fff5c9c4630) at slotfuncs.c:479
479 ereport(ERROR,
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.192.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64
krb5-libs-1.10.3-57.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64
libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-48.el6_8.4.x86_64
zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000008519a8 in pg_replication_slot_advance
(fcinfo=0x7fff5c9c4630) at slotfuncs.c:479
#1 0x00000000006defdc in ExecMakeTableFunctionResult
(setexpr=0x20f7db0, econtext=0x20f7aa8, argContext=0x20fd690,
expectedDesc=0x20f92d0, randomAccess=0 '\000') at execSRF.c:231
#2 0x00000000006f1782 in FunctionNext (node=0x20f7990) at
nodeFunctionscan.c:95
#3 0x00000000006de80d in ExecScanFetch (node=0x20f7990,
accessMtd=0x6f16cb <FunctionNext>, recheckMtd=0x6f1abd
<FunctionRecheck>) at execScan.c:95
#4 0x00000000006de8ad in ExecScan (node=0x20f7990, accessMtd=0x6f16cb
<FunctionNext>, recheckMtd=0x6f1abd <FunctionRecheck>) at execScan.c:162
#5 0x00000000006f1b0a in ExecFunctionScan (pstate=0x20f7990) at
nodeFunctionscan.c:270
#6 0x00000000006dcd66 in ExecProcNodeFirst (node=0x20f7990) at
execProcnode.c:446
#7 0x00000000006d3c05 in ExecProcNode (node=0x20f7990) at
../../../src/include/executor/executor.h:246
#8 0x00000000006d6559 in ExecutePlan (estate=0x20f7778,
planstate=0x20f7990, use_parallel_mode=0 '\000', operation=CMD_SELECT,
sendTuples=1 '\001', numberTuples=0,
direction=ForwardScanDirection, dest=0x20fd288, execute_once=1
'\001') at execMain.c:1721
#9 0x00000000006d41d7 in standard_ExecutorRun (queryDesc=0x2065878,
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
execMain.c:361
#10 0x00000000006d3ff3 in ExecutorRun (queryDesc=0x2065878,
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at
execMain.c:304
#11 0x00000000008afeeb in PortalRunSelect (portal=0x20a5838, forward=1
'\001', count=0, dest=0x20fd288) at pquery.c:932
#12 0x00000000008afb79 in PortalRun (portal=0x20a5838,
count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001',
dest=0x20fd288, altdest=0x20fd288,
completionTag=0x7fff5c9c4f00 "") at pquery.c:773
#13 0x00000000008a9ba0 in exec_simple_query (query_string=0x2040438
"SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1',
'0/2000071');") at postgres.c:1120
#14 0x00000000008ade34 in PostgresMain (argc=1, argv=0x206bec0,
dbname=0x206bd20 "postgres", username=0x203cf58 "centos") at postgres.c:4144
#15 0x000000000080d11a in BackendRun (port=0x2063d00) at postmaster.c:4412
#16 0x000000000080c88e in BackendStartup (port=0x2063d00) at
postmaster.c:4084
#17 0x0000000000808cbb in ServerLoop () at postmaster.c:1757
#18 0x0000000000808379 in PostmasterMain (argc=3, argv=0x203aeb0) at
postmaster.c:1365
#19 0x000000000073d3d0 in main (argc=3, argv=0x203aeb0) at main.c:228
(gdb)
--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
On Fri, Feb 16, 2018 at 1:44 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
> Hi,
[....]
> postgres=# SELECT end_lsn FROM
> pg_replication_slot_advance('regression_slot1', '0/2000071');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> !>
>
I am able to reproduce this on the latest master head, the problem is in the
following hunk of pg_replication_slot_advance() where oldest LSN value
is accessed after releasing replication slot:
476 if (moveto < startlsn)
477 {
478 ReplicationSlotRelease();
479 ereport(ERROR,
480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
481 errmsg("cannot move slot to %X/%X, minimum is %X/%X",
482 (uint32) (moveto >> 32), (uint32) moveto,
483 (uint32)
(MyReplicationSlot->data.confirmed_flush >> 32),
484 (uint32)
(MyReplicationSlot->data.confirmed_flush))));
485 }
486
Regards,
Amul
On Fri, Feb 16, 2018 at 3:06 PM, amul sul <sulamul@gmail.com> wrote:
> On Fri, Feb 16, 2018 at 1:44 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
>> Hi,
> [....]
>> postgres=# SELECT end_lsn FROM
>> pg_replication_slot_advance('regression_slot1', '0/2000071');
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> !>
>>
>
> I am able to reproduce this on the latest master head, the problem is in the
> following hunk of pg_replication_slot_advance() where oldest LSN value
> is accessed after releasing replication slot:
>
>
> 476 if (moveto < startlsn)
> 477 {
> 478 ReplicationSlotRelease();
> 479 ereport(ERROR,
> 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> 481 errmsg("cannot move slot to %X/%X, minimum is %X/%X",
> 482 (uint32) (moveto >> 32), (uint32) moveto,
> 483 (uint32)
> (MyReplicationSlot->data.confirmed_flush >> 32),
> 484 (uint32)
> (MyReplicationSlot->data.confirmed_flush))));
> 485 }
> 486
>
Attached patch proposes a required fix.
Regards,
Amul
Вложения
On 02/16/2018 03:25 PM, amul sul wrote: > Attached patch proposes a required fix. Thanks, Issue seems to be fixed with this patch , now getting an expected error -ERROR: cannot move slot to 0/2000090, minimum is 0/2000098 -- regards,tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
On 16/02/18 10:55, amul sul wrote:
> On Fri, Feb 16, 2018 at 3:06 PM, amul sul <sulamul@gmail.com> wrote:
>> On Fri, Feb 16, 2018 at 1:44 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
>>> Hi,
>> [....]
>>> postgres=# SELECT end_lsn FROM
>>> pg_replication_slot_advance('regression_slot1', '0/2000071');
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> !>
>>>
>>
>> I am able to reproduce this on the latest master head, the problem is in the
>> following hunk of pg_replication_slot_advance() where oldest LSN value
>> is accessed after releasing replication slot:
>>
>>
>> 476 if (moveto < startlsn)
>> 477 {
>> 478 ReplicationSlotRelease();
>> 479 ereport(ERROR,
>> 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> 481 errmsg("cannot move slot to %X/%X, minimum is %X/%X",
>> 482 (uint32) (moveto >> 32), (uint32) moveto,
>> 483 (uint32)
>> (MyReplicationSlot->data.confirmed_flush >> 32),
>> 484 (uint32)
>> (MyReplicationSlot->data.confirmed_flush))));
>> 485 }
>> 486
>>
Yeah, of course we can't use MyReplicationSlot after calling
ReplicationSlotRelease().
>
> Attached patch proposes a required fix.
>
Looks correct to me.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Feb 16, 2018 at 6:55 PM, amul sul <sulamul@gmail.com> wrote:
> On Fri, Feb 16, 2018 at 3:06 PM, amul sul <sulamul@gmail.com> wrote:
>> On Fri, Feb 16, 2018 at 1:44 PM, tushar <tushar.ahuja@enterprisedb.com> wrote:
>>> Hi,
>> [....]
>>> postgres=# SELECT end_lsn FROM
>>> pg_replication_slot_advance('regression_slot1', '0/2000071');
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> !>
>>>
>>
>> I am able to reproduce this on the latest master head, the problem is in the
>> following hunk of pg_replication_slot_advance() where oldest LSN value
>> is accessed after releasing replication slot:
>>
>>
>> 476 if (moveto < startlsn)
>> 477 {
>> 478 ReplicationSlotRelease();
>> 479 ereport(ERROR,
>> 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> 481 errmsg("cannot move slot to %X/%X, minimum is %X/%X",
>> 482 (uint32) (moveto >> 32), (uint32) moveto,
>> 483 (uint32)
>> (MyReplicationSlot->data.confirmed_flush >> 32),
>> 484 (uint32)
>> (MyReplicationSlot->data.confirmed_flush))));
>> 485 }
>> 486
>>
>
> Attached patch proposes a required fix.
>
The change looks good to me. Thank you.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Petr Jelinek wrote: > > Attached patch proposes a required fix. > > > > Looks correct to me. Masahiko Sawada wrote: > The change looks good to me. Thank you. Thanks, pushed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services