Обсуждение: Repeated requests for feedback in logical standby

Поиск
Список
Период
Сортировка

Repeated requests for feedback in logical standby

От
Magnus Hagander
Дата:
I've been running some tests, and come across a weird behavior on shutdown
with logical replication, 9.5.2. It seems that on shutdown, a lot of extra
requests for feedback are sent from the walsender.

On my laptop I get about 3-15 requests (different each time), and on a
server with a bit more power but still very low activity, the number is
simply huge.

Basically what happens is:

* Replication works fine
* I have *zero* activity on the master since it got started, not even a
single psql
* I issue a fast shutdown on the master
* Nothing happens at all for a while
* Once pg_recvlogical times out and sends a status message things wake up
* The server sends a whole bunch of requests for status, which
pg_recvlogical responds to
* Eventually things shut down (on the proper server, I've had to kill
things because it takes too long - I'm unsure if it ever shuts down)


Log output from pg_recvlogical:

$ ./pg_recvlogical -v -p 5495 -S trivialslot -o include_transaction=on -v
-d postgres --start -f - -P test_decoding 2>&1 | ts '[%Y-%m-%d %H:%M:%S]'
[2016-04-12 17:14:31] pg_recvlogical: starting log streaming at 0/0 (slot
trivialslot)
[2016-04-12 17:14:31] pg_recvlogical: streaming initiated
[2016-04-12 17:14:31] pg_recvlogical: confirming write up to 0/0, flush to
0/0 (slot trivialslot)
[2016-04-12 17:14:41] pg_recvlogical: confirming write up to 0/18F9DE88,
flush to 0/18F9DE88 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DEF8,
flush to 0/18F9DEF8 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: confirming write up to 0/18F9DFA0,
flush to 0/18F9DFA0 (slot trivialslot)
[2016-04-12 17:14:51] pg_recvlogical: disconnected; waiting 5 seconds to
try again


In this example, the server has:
2016-04-12 17:14:45 CEST LOG:  received fast shutdown request
2016-04-12 17:14:45 CEST LOG:  aborting any active transactions
2016-04-12 17:14:45 CEST LOG:  autovacuum launcher shutting down
2016-04-12 17:14:45 CEST LOG:  shutting down
2016-04-12 17:14:45 CEST LOG:  database system is shut down


Note how it took a while (6 seconds) for pg_recvlogical to pick up that
anything was happening - looks like something might not have been flushed?

When running on the server where there's more data it was very clear though
- if I break the walsender with gdb, pg_recvlogical stops spitting out
debugging messages, and as soon as I "cont" in gdb it starts spitting them
out again, so there's a clear connection between the two. And it takes a
*lot* more than a few seconds.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Repeated requests for feedback in logical standby

От
Nick Cleaton
Дата:
On 12 April 2016 at 16:18, Magnus Hagander <magnus@hagander.net> wrote:
>
> I've been running some tests, and come across a weird behavior on shutdown with logical replication, 9.5.2. It seems
thaton shutdown, a lot of extra requests for feedback are sent from the walsender. 
>
> On my laptop I get about 3-15 requests (different each time), and on a server with a bit more power but still very
lowactivity, the number is simply huge. 
>
> Basically what happens is:
>
> * Replication works fine
> * I have *zero* activity on the master since it got started, not even a single psql
> * I issue a fast shutdown on the master
> * Nothing happens at all for a while
> * Once pg_recvlogical times out and sends a status message things wake up
> * The server sends a whole bunch of requests for status, which pg_recvlogical responds to
> * Eventually things shut down (on the proper server, I've had to kill things because it takes too long - I'm unsure
ifit ever shuts down) 


That sounds similar to something I've seen with wal-based replication:

http://www.postgresql.org/message-id/flat/CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com#CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com

Nick

Re: Repeated requests for feedback in logical standby

От
Magnus Hagander
Дата:
On Thu, Apr 14, 2016 at 7:34 AM, Nick Cleaton <nick@cleaton.net> wrote:

> On 12 April 2016 at 16:18, Magnus Hagander <magnus@hagander.net> wrote:
> >
> > I've been running some tests, and come across a weird behavior on
> shutdown with logical replication, 9.5.2. It seems that on shutdown, a lot
> of extra requests for feedback are sent from the walsender.
> >
> > On my laptop I get about 3-15 requests (different each time), and on a
> server with a bit more power but still very low activity, the number is
> simply huge.
> >
> > Basically what happens is:
> >
> > * Replication works fine
> > * I have *zero* activity on the master since it got started, not even a
> single psql
> > * I issue a fast shutdown on the master
> > * Nothing happens at all for a while
> > * Once pg_recvlogical times out and sends a status message things wake up
> > * The server sends a whole bunch of requests for status, which
> pg_recvlogical responds to
> > * Eventually things shut down (on the proper server, I've had to kill
> things because it takes too long - I'm unsure if it ever shuts down)
>
>
> That sounds similar to something I've seen with wal-based replication:
>
>
http://www.postgresql.org/message-id/flat/CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com#CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com


Agreed, that sounds very similar.

I don't know the code enough to say whether that's a proper fix though, or
if it might have other sideeffects.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Repeated requests for feedback in logical standby

От
Magnus Hagander
Дата:
On Thu, Apr 14, 2016 at 12:30 PM, Magnus Hagander <magnus@hagander.net>
wrote:

>
>
> On Thu, Apr 14, 2016 at 7:34 AM, Nick Cleaton <nick@cleaton.net> wrote:
>
>> On 12 April 2016 at 16:18, Magnus Hagander <magnus@hagander.net> wrote:
>> >
>> > I've been running some tests, and come across a weird behavior on
>> shutdown with logical replication, 9.5.2. It seems that on shutdown, a lot
>> of extra requests for feedback are sent from the walsender.
>> >
>> > On my laptop I get about 3-15 requests (different each time), and on a
>> server with a bit more power but still very low activity, the number is
>> simply huge.
>> >
>> > Basically what happens is:
>> >
>> > * Replication works fine
>> > * I have *zero* activity on the master since it got started, not even a
>> single psql
>> > * I issue a fast shutdown on the master
>> > * Nothing happens at all for a while
>> > * Once pg_recvlogical times out and sends a status message things wake
>> up
>> > * The server sends a whole bunch of requests for status, which
>> pg_recvlogical responds to
>> > * Eventually things shut down (on the proper server, I've had to kill
>> things because it takes too long - I'm unsure if it ever shuts down)
>>
>>
>> That sounds similar to something I've seen with wal-based replication:
>>
>>
http://www.postgresql.org/message-id/flat/CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com#CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com
>
>
> Agreed, that sounds very similar.
>
> I don't know the code enough to say whether that's a proper fix though, or
> if it might have other sideeffects.
>
>

I should add that yes, this patch does seem to remove the problem for me as
well. But I can't confirm if it breaks something else. Hopefully someone
who has worked more on that code can?

I also still see the problem that "pg_ctrl stop -m fast" on the master does
*nothing* until the client sends one message. I tried increasing the
feedback interval to 2 minutes, and that increased the shutdown time to 2
minutes as well. Which can't be right - clearly the shutdown is failing to
actually make the walsender send the request and shut down. This happens
both with and without the suggested patch.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Repeated requests for feedback in logical standby

От
Marko Tiikkaja
Дата:
On 14/04/16 12:51, Magnus Hagander wrote:
> I should add that yes, this patch does seem to remove the problem for me as
> well. But I can't confirm if it breaks something else.

Did you also see whether it fixes the problem on the bigger box which
seemed to not shut down at all?



.m

Re: Repeated requests for feedback in logical standby

От
Magnus Hagander
Дата:
On Thu, Apr 14, 2016 at 1:14 PM, Marko Tiikkaja <marko@joh.to> wrote:

> On 14/04/16 12:51, Magnus Hagander wrote:
>
>> I should add that yes, this patch does seem to remove the problem for me
>> as
>> well. But I can't confirm if it breaks something else.
>>
>
> Did you also see whether it fixes the problem on the bigger box which
> seemed to not shut down at all?


Not yet, but those symptoms sound even more like those of Nick. I have it
fixing two different installations that I have locally here though.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Repeated requests for feedback in logical standby

От
Nick Cleaton
Дата:
On 14 April 2016 at 11:30, Magnus Hagander <magnus@hagander.net> wrote:
>>
http://www.postgresql.org/message-id/flat/CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com#CAFgz3kus=rC_avEgBV=+hRK5HYJ8vXskJRh8yEAbahJGTzF2VQ@mail.gmail.com
>
>
> Agreed, that sounds very similar.
>
> I don't know the code enough to say whether that's a proper fix though, or
> if it might have other sideeffects.

Certainly. My "this fixes it for me" was intended more as a piece of
diagnostic information than as a proposed change to the code.

Re: Repeated requests for feedback in logical standby

От
Michael Paquier
Дата:
On Thu, Apr 14, 2016 at 8:54 PM, Nick Cleaton <nick@cleaton.net> wrote:
> Certainly. My "this fixes it for me" was intended more as a piece of
> diagnostic information than as a proposed change to the code.

As I am just bumping on it. For the archive's sake, a fix has been
pushed as 5945501 (commit log references this thread as well).
--
Michael

Re: Repeated requests for feedback in logical standby

От
Magnus Hagander
Дата:
On Thu, Apr 14, 2016 at 1:21 PM, Magnus Hagander <magnus@hagander.net>
wrote:

>
>
> On Thu, Apr 14, 2016 at 1:14 PM, Marko Tiikkaja <marko@joh.to> wrote:
>
>> On 14/04/16 12:51, Magnus Hagander wrote:
>>
>>> I should add that yes, this patch does seem to remove the problem for me
>>> as
>>> well. But I can't confirm if it breaks something else.
>>>
>>
>> Did you also see whether it fixes the problem on the bigger box which
>> seemed to not shut down at all?
>
>
> Not yet, but those symptoms sound even more like those of Nick. I have it
> fixing two different installations that I have locally here though.
>
>
Finally got around to test this properly. And no, it does not work - 9.5.3
with logical replication. I captured a couple of stack traces, and they all
look like this:

#0  0x00007f8fa933e6b0 in __read_nocancel () at
../sysdeps/unix/syscall-template.S:81
#1  0x00005641a49aded3 in read (__nbytes=8192, __buf=0x5641a53d9090,
__fd=6) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
#2  XLogRead (buf=buf@entry=0x5641a53d9090 "\207\320\002",
startptr=startptr@entry=1056964608, count=count@entry=8192)
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/replication/walsender.c:2124
#3  0x00005641a49af0db in logical_read_xlog_page (state=<optimized out>,
targetPagePtr=1056964608, reqLen=<optimized out>, targetRecPtr=<optimized
out>,
    cur_page=0x5641a53d9090 "\207\320\002", pageTLI=<optimized out>)
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/replication/walsender.c:772
#4  0x00005641a480f4d2 in ReadPageInternal (state=state@entry=0x5641a53d6b08,
pageptr=pageptr@entry=1062813696, reqLen=reqLen@entry=360)
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/access/transam/xlogreader.c:524
#5  0x00005641a480fed1 in XLogReadRecord (state=0x5641a53d6b08,
RecPtr=1062814032, errormsg=errormsg@entry=0x7ffe3bd8ee48)
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/access/transam/xlogreader.c:243
#6  0x00005641a49ae17b in XLogSendLogical () at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/replication/walsender.c:2437
#7  0x00005641a49afc63 in WalSndLoop (send_data=send_data@entry=0x5641a49ae150
<XLogSendLogical>)
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/replication/walsender.c:1852
#8  0x00005641a49b0aa2 in StartLogicalReplication (cmd=<optimized out>)
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/replication/walsender.c:1015
#9  exec_replication_command (cmd_string=cmd_string@entry=0x5641a53a3c50
"START_REPLICATION SLOT \"nineone\" LOGICAL 0/3F5244F8
(\"include_transaction\" 'on')")
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/replication/walsender.c:1344
#10 0x00005641a49ed8b4 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x5641a5335118, dbname=0x5641a5335030 "db",
username=<optimized out>)
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/tcop/postgres.c:4028
#11 0x00005641a47925b8 in BackendRun (port=0x5641a5354bb0) at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/postmaster/postmaster.c:4239
#12 BackendStartup (port=0x5641a5354bb0) at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/postmaster/postmaster.c:3913
#13 ServerLoop () at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/postmaster/postmaster.c:1684
#14 0x00005641a4993e7b in PostmasterMain (argc=5, argv=<optimized out>)
    at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/postmaster/postmaster.c:1292
#15 0x00005641a47932b2 in main (argc=5, argv=0x5641a53341f0) at
/build/postgresql-9.5-h9JXst/postgresql-9.5-9.5.3/build/../src/backend/main/main.c:228


They all come out at approximately the same place there.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/