Fwd: [GENERAL] wal_retrieve_retry_interval

Поиск
Список
Период
Сортировка
От Ludovic Vaugeois-Pepin
Тема Fwd: [GENERAL] wal_retrieve_retry_interval
Дата
Msg-id CAAJDx8NVFvdgucum_pEmmBYSc-togzx_E_d2nughMeTTV9zSAw@mail.gmail.com
обсуждение исходный текст
Ответ на [GENERAL] wal_retrieve_retry_interval  (Ludovic Vaugeois-Pepin <ludovicvp@gmail.com>)
Список pgsql-general

On Sun, May 28, 2017 at 6:54 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sat, May 27, 2017 at 2:40 PM, Ludovic Vaugeois-Pepin
<ludovicvp@gmail.com> wrote:
> Say, with 9.6.2,  a hot_standby fails to connect to a replication slot:
>     FATAL:  could not start WAL streaming: ERROR:  replication slot "test3"
> does not exist
> or
>     FATAL:  could not connect to the primary server: FATAL:  the database
> system is starting up
>
> Is there a way to reduce the time it takes until the next attempt? I
> assumed, wrongly I think, that this would be wal_retrieve_retry_interval,
> but it seems that it won't make a difference. I tried setting it to 3s, but
> it seems to take 15s still. Here are two log samples:

Could you double-check your configuration? If I set
wal_retrieve_retry_interval to 1s on a standby, I am able to see a
connection attempt from a WAL receiver happening with this interval of
time in the case of repetitive failures.

It really is set at 3s on all servers (master and standbies) earlier in the "deployment" process at the same time "listen_addresses", "hot_standby", and others are set. This doesn't seem to happen every time I run tests. I increased logging to DEBUG1. This is what I got when the problem occurred again. Note that the empty line is there in the log file.

< 2017-05-28 09:29:36.127 CEST > LOG:  database system was shut down in recovery at 2017-05-28 09:29:03 CEST
< 2017-05-28 09:29:36.127 CEST > LOG:  entering standby mode
< 2017-05-28 09:29:36.127 CEST > DEBUG:  checkpoint record is at 0/80002B8
< 2017-05-28 09:29:36.128 CEST > DEBUG:  redo record is at 0/80002B8; shutdown TRUE
< 2017-05-28 09:29:36.128 CEST > DEBUG:  next transaction ID: 0:2535; next OID: 18660
< 2017-05-28 09:29:36.128 CEST > DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
< 2017-05-28 09:29:36.128 CEST > DEBUG:  oldest unfrozen transaction ID: 1750, in database 1
< 2017-05-28 09:29:36.128 CEST > DEBUG:  oldest MultiXactId: 1, in database 1
< 2017-05-28 09:29:36.128 CEST > DEBUG:  commit timestamp Xid oldest/newest: 0/0
< 2017-05-28 09:29:36.128 CEST > DEBUG:  transaction ID wrap limit is 2147485397, limited by database with OID 1
< 2017-05-28 09:29:36.128 CEST > DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
< 2017-05-28 09:29:36.128 CEST > DEBUG:  starting up replication slots
< 2017-05-28 09:29:36.128 CEST > DEBUG:  resetting unlogged relations: cleanup 1 init 0
< 2017-05-28 09:29:36.129 CEST > DEBUG:  initializing for hot standby
< 2017-05-28 09:29:36.129 CEST > DEBUG:  recovery snapshots are now enabled
< 2017-05-28 09:29:36.129 CEST > LOG:  consistent recovery state reached at 0/8000328
< 2017-05-28 09:29:36.129 CEST > LOG:  invalid record length at 0/8000328: wanted 24, got 0
< 2017-05-28 09:29:36.129 CEST > LOG:  database system is ready to accept read only connections
< 2017-05-28 09:29:36.135 CEST > FATAL:  could not connect to the primary server: FATAL:  the database system is starting up

< 2017-05-28 09:29:36.135 CEST > DEBUG:  invalid record length at 0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.153 CEST > DEBUG:  invalid record length at 0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.158 CEST > LOG:  fetching timeline history file for timeline 4 from primary server
< 2017-05-28 09:29:51.160 CEST > LOG:  started streaming WAL from primary at 0/8000000 on timeline 3
< 2017-05-28 09:29:51.160 CEST > LOG:  replication terminated by primary server
< 2017-05-28 09:29:51.160 CEST > DETAIL:  End of WAL reached on timeline 3 at 0/8000328.
< 2017-05-28 09:29:51.161 CEST > DEBUG:  walreceiver ended streaming and awaits new instructions
< 2017-05-28 09:29:51.161 CEST > LOG:  new target timeline is 4
< 2017-05-28 09:29:51.161 CEST > DEBUG:  invalid record length at 0/8000328: wanted 24, got 0
< 2017-05-28 09:29:51.161 CEST > LOG:  restarted WAL streaming at 0/8000000 on timeline 4
< 2017-05-28 09:29:51.191 CEST > LOG:  redo starts at 0/8000328


 
--
Michael



--
Ludovic Vaugeois-Pepin



--
Ludovic Vaugeois-Pepin

В списке pgsql-general по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [GENERAL] wal_retrieve_retry_interval
Следующее
От: Neil Anderson
Дата:
Сообщение: [GENERAL] Help with terminology to describe what my software does please?