Re: [HACKERS] SUBSCRIPTION command hangs up, segfault occurs in theserver process and to cancel the execution.

Поиск
Список
Период
Сортировка
От Fujii Masao
Тема Re: [HACKERS] SUBSCRIPTION command hangs up, segfault occurs in theserver process and to cancel the execution.
Дата
Msg-id CAHGQGwH=DaJXXd7gQkFA9XTzdY9jLhs3xt2-aYf=jHMhFZsL7w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] SUBSCRIPTION command hangs up, segfault occurs in theserver process and to cancel the execution.  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
On Wed, Feb 22, 2017 at 1:54 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Tue, Feb 21, 2017 at 7:11 AM, nuko yokohama <nuko.yokohama@gmail.com> wrote:
>> Hi.
>> While verifying the logical replication of PostgreSQL 10-devel, I found the
>> following problem.

Thanks for the problem report! This problem was also reported before.
https://www.postgresql.org/message-id/A1E9CB90-1FAC-4CAD-8DBA-9AA62A6E97C5@postgrespro.ru

>> * When you run the SUBSCRIPTION command against a table in the same
>> PostgreSQL server, hang up.
>> * Canceling the hung SUBSCRIPTION command with CTRL + C causes a server
>> process occurs Segfault, and the PostgreSQL server to restart.
>>
>> --------
>> [Steps to Reproduce]
>>
>> $ createdb db1 -U postgres
>> $ createdb db2 -U postgres
>> $ psql -U postgres db1 -c "CREATE TABLE test (id int primary key, data
>> text)"
>> $ psql -U postgres db2 -c "CREATE TABLE test (id int primary key, data
>> text)"
>> $ psql -U postgres db1 -c "CREATE PUBLICATION pub_db1_test FOR TABLE test"
>> $ psql -U postgres db2 -c "CREATE SUBSCRIPTION sub_db2_test CONNECTION
>> 'dbname=db1 port=5432 user=postgres' PUBLICATION pub_db1_test"
>>
>> The SUBSCRIPTION command does not end, it hangs up.
>> At this time, the following logs are output in the server log.
>>
>> 2017-02-21 06:58:05.082 JST [22060] LOG:  logical decoding found initial
>> starting point at 0/1C06660
>> 2017-02-21 06:58:05.082 JST [22060] DETAIL:  1 transaction needs to finish.
>>
>> Suspending psql (running SUBSCRIPTION) with CTRL + C causes a Segfault in
>> the server process.
>> At this time, the following message is output to the server log.
>>
>> 2017-02-21 07:01:00.246 JST [22059] ERROR:  canceling statement due to user
>> request
>> 2017-02-21 07:01:00.246 JST [22059] STATEMENT:  CREATE SUBSCRIPTION
>> sub_db2_test CONNECTION 'dbname=db1 port=5432 user=postgres' PUBLICATION
>> pub_db1_test
>> 2017-02-21 07:01:01.006 JST [21445] LOG:  server process (PID 22060) was
>> terminated by signal 11: Segmentation fault
>> 2017-02-21 07:01:01.007 JST [21445] LOG:  terminating any other active
>> server processes
>>
>> --------
>> [Environment]
>>
>> postgres=# SELECT version();
>>                                                   version
>> ------------------------------------------------------------------------------------------------------------
>>  PostgreSQL 10devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.2
>> 20140120 (Red Hat 4.8.2-16), 64-bit
>> (1 row)
>>
>> postgres=# SHOW wal_level;
>>  wal_level
>> -----------
>>  logical
>> (1 row)
>>
>> postgres=# SHOW max_wal_senders;;
>>  max_wal_senders
>> -----------------
>>  10
>> (1 row)
>>
>> postgres=# SHOW max_replication_slots;
>>  max_replication_slots
>> -----------------------
>>  10
>> (1 row)
>>
>> postgres=# TABLE pg_hba_file_rules ;
>>  line_number | type  |   database    | user_name  |  address  |
>> netmask                 | auth_method | options | error
>>
-------------+-------+---------------+------------+-----------+-----------------------------------------+-------------+---------+-------
>>            2 | local | {all}         | {all}      |           |
>> | trust       |         |
>>            4 | host  | {all}         | {all}      | 127.0.0.1 |
>> 255.255.255.255                         | trust       |         |
>>            6 | host  | {all}         | {all}      | ::1       |
>> ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         |
>>            9 | local | {replication} | {postgres} |           |
>> | trust       |         |
>> (4 rows)
>>
>>
>
> As log message says, the reason why CREATE SUBSCRIPTION hangs up is
> that it's waiting for another transaction finish.
>
> Regarding SEGV, it occurs by wal sender process creating replication
> slot. When SEV occurs the backtrace is as follows.
>
> (gdb) bt
> #0  0x00000000006e3d47 in resetStringInfo (str=0xea1c60) at stringinfo.c:96
> #1  0x00000000007e4738 in ProcessRepliesIfAny () at walsender.c:1347
> #2  0x00000000007e42c5 in WalSndWaitForWal (loc=23400264) at walsender.c:1142
> #3  0x00000000007e38f6 in logical_read_xlog_page (state=0x2732008,
> targetPagePtr=23396352, reqLen=3912, targetRecPtr=23400240,
> cur_page=0x2733d58 "\225\320\005", pageTLI=0x27328b4) at
> walsender.c:717
> #4  0x00000000005458c8 in ReadPageInternal (state=0x2732008,
> pageptr=23396352, reqLen=3912) at xlogreader.c:556
> #5  0x0000000000545068 in XLogReadRecord (state=0x2732008,
> RecPtr=23400240, errormsg=0x7fff7cf34cf8) at xlogreader.c:255
> #6  0x00000000007d09db in DecodingContextFindStartpoint
> (ctx=0x2731d48) at logical.c:432
> #7  0x00000000007e3a87 in CreateReplicationSlot (cmd=0x26edfa0) at
> walsender.c:796
> #8  0x00000000007e45ae in exec_replication_command
> (cmd_string=0x268b5f8 "CREATE_REPLICATION_SLOT \"sub_db2_test\"
> LOGICAL pgoutput") at walsender.c:1272
> #9  0x0000000000846783 in PostgresMain (argc=1, argv=0x2697f38,
> dbname=0x2697e68 "db1", username=0x2697e40 "masahiko") at
> postgres.c:4064
> #10 0x00000000007b34e0 in BackendRun (port=0x268f4b0) at postmaster.c:4310
> #11 0x00000000007b2c60 in BackendStartup (port=0x268f4b0) at postmaster.c:3982
> #12 0x00000000007af2e0 in ServerLoop () at postmaster.c:1722
> #13 0x00000000007ae9a4 in PostmasterMain (argc=5, argv=0x266c020) at
> postmaster.c:1330
> #14 0x00000000006f487a in main (argc=5, argv=0x266c020) at main.c:228
>
> I guess that the cause of SEGV is that reply_message is reset without
> initialize by initStringInfo. The reply_message and other buffers are
> initialized at beggning of WalSndLoop but resetStringInfo can be
> called by WalSndWaitForWal->ProcessRepliesIfAny without calling
> WalSndLoop when creating replication slot. To fix it, we can
> initialize reply_message in exec_replication_command like follows.

Yes. I pushed the patch which makes walsender always initialize
all three buffers.

Regards,

-- 
Fujii Masao



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: Re: [HACKERS] Walsender crash
Следующее
От: Fujii Masao
Дата:
Сообщение: Re: [HACKERS] Resolved typo in a comment