Re: [GENERAL] wal_retrieve_retry_interval

Поиск
Список
Период
Сортировка
От Ludovic Vaugeois-Pepin
Тема Re: [GENERAL] wal_retrieve_retry_interval
Дата
Msg-id CAAJDx8Oi0tD+sriCf-7tTorOvRy0syPoEke34phXLfFLFMeNHg@mail.gmail.com
обсуждение исходный текст
Ответ на [GENERAL] wal_retrieve_retry_interval  (Ludovic Vaugeois-Pepin <ludovicvp@gmail.com>)
Ответы Re: [GENERAL] wal_retrieve_retry_interval  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-general
On Sun, May 28, 2017 at 9:49 AM, Ludovic Vaugeois-Pepin
<ludovicvp@gmail.com> wrote:
>
>
> On Sun, May 28, 2017 at 6:54 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
>>
>> On Sat, May 27, 2017 at 2:40 PM, Ludovic Vaugeois-Pepin
>> <ludovicvp@gmail.com> wrote:
>> > Say, with 9.6.2,  a hot_standby fails to connect to a replication slot:
>> >     FATAL:  could not start WAL streaming: ERROR:  replication slot "test3"
>> > does not exist
>> > or
>> >     FATAL:  could not connect to the primary server: FATAL:  the database
>> > system is starting up
>> >
>> > Is there a way to reduce the time it takes until the next attempt? I
>> > assumed, wrongly I think, that this would be wal_retrieve_retry_interval,
>> > but it seems that it won't make a difference. I tried setting it to 3s, but
>> > it seems to take 15s still. Here are two log samples:
>>
>> Could you double-check your configuration? If I set
>> wal_retrieve_retry_interval to 1s on a standby, I am able to see a
>> connection attempt from a WAL receiver happening with this interval of
>> time in the case of repetitive failures.
>
>
> It really is set at 3s on all servers (master and standbies) earlier in the "deployment" process at the same time
"listen_addresses","hot_standby", and others are set. This doesn't seem to happen every time I run tests. I increased
loggingto DEBUG1. This is what I got when the problem occurred again. Note that the empty line is there in the log
file.
>
> < 2017-05-28 09:29:36.127 CEST > LOG:  database system was shut down in recovery at 2017-05-28 09:29:03 CEST
> < 2017-05-28 09:29:36.127 CEST > LOG:  entering standby mode
> < 2017-05-28 09:29:36.127 CEST > DEBUG:  checkpoint record is at 0/80002B8
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  redo record is at 0/80002B8; shutdown TRUE
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  next transaction ID: 0:2535; next OID: 18660
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  oldest unfrozen transaction ID: 1750, in database 1
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  oldest MultiXactId: 1, in database 1
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  commit timestamp Xid oldest/newest: 0/0
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  transaction ID wrap limit is 2147485397, limited by database with OID 1
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  starting up replication slots
> < 2017-05-28 09:29:36.128 CEST > DEBUG:  resetting unlogged relations: cleanup 1 init 0
> < 2017-05-28 09:29:36.129 CEST > DEBUG:  initializing for hot standby
> < 2017-05-28 09:29:36.129 CEST > DEBUG:  recovery snapshots are now enabled
> < 2017-05-28 09:29:36.129 CEST > LOG:  consistent recovery state reached at 0/8000328
> < 2017-05-28 09:29:36.129 CEST > LOG:  invalid record length at 0/8000328: wanted 24, got 0
> < 2017-05-28 09:29:36.129 CEST > LOG:  database system is ready to accept read only connections
> < 2017-05-28 09:29:36.135 CEST > FATAL:  could not connect to the primary server: FATAL:  the database system is
startingup 
>
> < 2017-05-28 09:29:36.135 CEST > DEBUG:  invalid record length at 0/8000328: wanted 24, got 0
> < 2017-05-28 09:29:51.153 CEST > DEBUG:  invalid record length at 0/8000328: wanted 24, got 0
> < 2017-05-28 09:29:51.158 CEST > LOG:  fetching timeline history file for timeline 4 from primary server
> < 2017-05-28 09:29:51.160 CEST > LOG:  started streaming WAL from primary at 0/8000000 on timeline 3
> < 2017-05-28 09:29:51.160 CEST > LOG:  replication terminated by primary server
> < 2017-05-28 09:29:51.160 CEST > DETAIL:  End of WAL reached on timeline 3 at 0/8000328.
> < 2017-05-28 09:29:51.161 CEST > DEBUG:  walreceiver ended streaming and awaits new instructions
> < 2017-05-28 09:29:51.161 CEST > LOG:  new target timeline is 4
> < 2017-05-28 09:29:51.161 CEST > DEBUG:  invalid record length at 0/8000328: wanted 24, got 0
> < 2017-05-28 09:29:51.161 CEST > LOG:  restarted WAL streaming at 0/8000000 on timeline 4
> < 2017-05-28 09:29:51.191 CEST > LOG:  redo starts at 0/8000328


And this is the log with log_min_messages DEBUG5:


< 2017-05-28 14:48:10.108 CEST > LOG:  invalid record length at
0/8000398: wanted 24, got 0
< 2017-05-28 14:48:10.108 CEST > DEBUG:  switched WAL source from
archive to stream after failure
< 2017-05-28 14:48:10.110 CEST > DEBUG:  checkpointer updated shared
memory configuration values
< 2017-05-28 14:48:10.110 CEST > LOG:  database system is ready to
accept read only connections
< 2017-05-28 14:48:10.112 CEST > DEBUG:  find_in_dynamic_libpath:
trying "/usr/pgsql-9.6/lib/libpqwalreceiver"
< 2017-05-28 14:48:10.112 CEST > DEBUG:  find_in_dynamic_libpath:
trying "/usr/pgsql-9.6/lib/libpqwalreceiver.so"
< 2017-05-28 14:48:10.126 CEST > FATAL:  could not connect to the
primary server: FATAL:  the database system is starting up

< 2017-05-28 14:48:10.126 CEST > DEBUG:  shmem_exit(1): 1
before_shmem_exit callbacks to make
< 2017-05-28 14:48:10.126 CEST > DEBUG:  shmem_exit(1): 4
on_shmem_exit callbacks to make
< 2017-05-28 14:48:10.127 CEST > DEBUG:  proc_exit(1): 2 callbacks to make
< 2017-05-28 14:48:10.127 CEST > DEBUG:  exit(1)
< 2017-05-28 14:48:10.127 CEST > DEBUG:  shmem_exit(-1): 0
before_shmem_exit callbacks to make
< 2017-05-28 14:48:10.127 CEST > DEBUG:  shmem_exit(-1): 0
on_shmem_exit callbacks to make
< 2017-05-28 14:48:10.127 CEST > DEBUG:  proc_exit(-1): 0 callbacks to make
< 2017-05-28 14:48:10.127 CEST > DEBUG:  switched WAL source from
stream to archive after failure
< 2017-05-28 14:48:10.127 CEST > DEBUG:  invalid record length at
0/8000398: wanted 24, got 0
< 2017-05-28 14:48:10.127 CEST > DEBUG:  switched WAL source from
archive to stream after failure
< 2017-05-28 14:48:10.130 CEST > DEBUG:  reaping dead processes
< 2017-05-28 14:48:11.062 CEST > DEBUG:  forked new backend, pid=3184 socket=12
< 2017-05-28 14:48:11.063 CEST > DEBUG:  postgres child[3184]: starting with (
< 2017-05-28 14:48:11.063 CEST > DEBUG:         postgres
< 2017-05-28 14:48:11.063 CEST > DEBUG:  )
< 2017-05-28 14:48:11.063 CEST > DEBUG:  InitPostgres
< 2017-05-28 14:48:11.063 CEST > DEBUG:  my backend ID is 2
< 2017-05-28 14:48:11.064 CEST > DEBUG:  StartTransaction
< 2017-05-28 14:48:11.064 CEST > DEBUG:  name: unnamed; blockState:
   DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1,
children:
< 2017-05-28 14:48:11.075 CEST > DEBUG:  CommitTransaction
< 2017-05-28 14:48:11.075 CEST > DEBUG:  name: unnamed; blockState:
   STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1,
children:
< 2017-05-28 14:48:11.077 CEST > DEBUG:  shmem_exit(0): 1
before_shmem_exit callbacks to make
< 2017-05-28 14:48:11.077 CEST > DEBUG:  shmem_exit(0): 6
on_shmem_exit callbacks to make
< 2017-05-28 14:48:11.077 CEST > DEBUG:  proc_exit(0): 3 callbacks to make
< 2017-05-28 14:48:11.077 CEST > DEBUG:  exit(0)
< 2017-05-28 14:48:11.077 CEST > DEBUG:  shmem_exit(-1): 0
before_shmem_exit callbacks to make
< 2017-05-28 14:48:11.077 CEST > DEBUG:  shmem_exit(-1): 0
on_shmem_exit callbacks to make
< 2017-05-28 14:48:11.077 CEST > DEBUG:  proc_exit(-1): 0 callbacks to make
< 2017-05-28 14:48:11.081 CEST > DEBUG:  reaping dead processes
< 2017-05-28 14:48:11.081 CEST > DEBUG:  server process (PID 3184)
exited with exit code 0
< 2017-05-28 14:48:11.095 CEST > DEBUG:  forked new backend, pid=3186 socket=12
< 2017-05-28 14:48:11.096 CEST > DEBUG:  postgres child[3186]: starting with (
< 2017-05-28 14:48:11.096 CEST > DEBUG:         postgres
< 2017-05-28 14:48:11.096 CEST > DEBUG:  )

...

< 2017-05-28 14:48:24.985 CEST > DEBUG:  postgres child[3430]: starting with (
< 2017-05-28 14:48:24.985 CEST > DEBUG:         postgres
< 2017-05-28 14:48:24.985 CEST > DEBUG:  )
< 2017-05-28 14:48:24.985 CEST > DEBUG:  InitPostgres
< 2017-05-28 14:48:24.985 CEST > DEBUG:  my backend ID is 2
< 2017-05-28 14:48:24.986 CEST > DEBUG:  StartTransaction
< 2017-05-28 14:48:24.986 CEST > DEBUG:  name: unnamed; blockState:
   DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1,
children:
< 2017-05-28 14:48:24.989 CEST > DEBUG:  CommitTransaction
< 2017-05-28 14:48:24.989 CEST > DEBUG:  name: unnamed; blockState:
   STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1,
children:
< 2017-05-28 14:48:24.989 CEST > DEBUG:  StartTransactionCommand
< 2017-05-28 14:48:24.989 CEST > DEBUG:  StartTransaction
< 2017-05-28 14:48:24.989 CEST > DEBUG:  name: unnamed; blockState:
   DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1,
children:
< 2017-05-28 14:48:24.992 CEST > DEBUG:  CommitTransactionCommand
< 2017-05-28 14:48:24.992 CEST > DEBUG:  CommitTransaction
< 2017-05-28 14:48:24.992 CEST > DEBUG:  name: unnamed; blockState:
   STARTED; state: INPROGR, xid/subid/cid: 0/1/0, nestlvl: 1,
children:
< 2017-05-28 14:48:24.993 CEST > DEBUG:  shmem_exit(0): 1
before_shmem_exit callbacks to make
< 2017-05-28 14:48:24.993 CEST > DEBUG:  shmem_exit(0): 6
on_shmem_exit callbacks to make
< 2017-05-28 14:48:24.994 CEST > DEBUG:  proc_exit(0): 3 callbacks to make
< 2017-05-28 14:48:24.994 CEST > DEBUG:  exit(0)
< 2017-05-28 14:48:24.994 CEST > DEBUG:  shmem_exit(-1): 0
before_shmem_exit callbacks to make
< 2017-05-28 14:48:24.994 CEST > DEBUG:  shmem_exit(-1): 0
on_shmem_exit callbacks to make
< 2017-05-28 14:48:24.994 CEST > DEBUG:  proc_exit(-1): 0 callbacks to make
< 2017-05-28 14:48:24.996 CEST > DEBUG:  reaping dead processes
< 2017-05-28 14:48:24.996 CEST > DEBUG:  server process (PID 3430)
exited with exit code 0
< 2017-05-28 14:48:25.144 CEST > DEBUG:  switched WAL source from
stream to archive after failure
< 2017-05-28 14:48:25.144 CEST > DEBUG:  invalid record length at
0/8000398: wanted 24, got 0
< 2017-05-28 14:48:25.144 CEST > DEBUG:  switched WAL source from
archive to stream after failure
< 2017-05-28 14:48:25.147 CEST > DEBUG:  find_in_dynamic_libpath:
trying "/usr/pgsql-9.6/lib/libpqwalreceiver"
< 2017-05-28 14:48:25.147 CEST > DEBUG:  find_in_dynamic_libpath:
trying "/usr/pgsql-9.6/lib/libpqwalreceiver.so"
< 2017-05-28 14:48:25.161 CEST > LOG:  fetching timeline history file
for timeline 4 from primary server
< 2017-05-28 14:48:25.166 CEST > LOG:  started streaming WAL from
primary at 0/8000000 on timeline 3
< 2017-05-28 14:48:25.166 CEST > DEBUG:  sendtime 2017-05-28
14:48:25.16509+02 receipttime 2017-05-28 14:48:25.166539+02
replication apply delay (N/A) transfer latency 1 ms
< 2017-05-28 14:48:25.166 CEST > LOG:  replication terminated by primary server
< 2017-05-28 14:48:25.166 CEST > DETAIL:  End of WAL reached on
timeline 3 at 0/8000398.
< 2017-05-28 14:48:25.166 CEST > DEBUG:  sending write 0/8000398 flush
0/8000398 apply 0/8000398
< 2017-05-28 14:48:25.167 CEST > DEBUG:  walreceiver ended streaming
and awaits new instructions
< 2017-05-28 14:48:25.168 CEST > LOG:  new target timeline is 4
< 2017-05-28 14:48:25.168 CEST > DEBUG:  switched WAL source from
stream to archive after failure
< 2017-05-28 14:48:25.168 CEST > DEBUG:  invalid record length at
0/8000398: wanted 24, got 0
< 2017-05-28 14:48:25.168 CEST > DEBUG:  switched WAL source from
archive to stream after failure
< 2017-05-28 14:48:25.169 CEST > LOG:  restarted WAL streaming at
0/8000000 on timeline 4
< 2017-05-28 14:48:25.170 CEST > DEBUG:  sendtime 2017-05-28
14:48:25.168651+02 receipttime 2017-05-28 14:48:25.170063+02
replication apply delay (N/A) transfer latency 1 ms
< 2017-05-28 14:48:25.170 CEST > DEBUG:  creating and filling new WAL file
< 2017-05-28 14:48:25.225 CEST > DEBUG:  done creating and filling new WAL file
< 2017-05-28 14:48:25.225 CEST > DEBUG:  sending write 0/80006C8 flush
0/8000398 apply 0/8000398
< 2017-05-28 14:48:25.226 CEST > DEBUG:  sending write 0/80006C8 flush
0/80006C8 apply 0/8000398
< 2017-05-28 14:48:25.227 CEST > DEBUG:  sending hot standby feedback
xmin 2535 epoch 0


В списке pgsql-general по дате отправления:

Предыдущее
От: Neil Anderson
Дата:
Сообщение: [GENERAL] Help with terminology to describe what my software does please?
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: [GENERAL] Help with terminology to describe what my software doesplease?