Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC in pg_wal "No space left on device"

From: Achilleas Mantzios
Subject: Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC in pg_wal "No space left on device"
Msg-id: 4bdf3996-31c0-d9ec-274c-225b78683f6f@matrix.gatewaynet.com
In response to: Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC in pg_wal "No space left on device"  (Achilleas Mantzios <achill@matrix.gatewaynet.com>)
List: pgsql-admin

On 13/11/18 2:00 PM, Achilleas Mantzios wrote:
> On 12/11/18 12:41 PM, Achilleas Mantzios wrote:
>> Hello,
>> while setting up logical replication since August, we saw early on the need to increase wal_receiver_timeout and wal_sender_timeout from 60 sec to 5 mins; otherwise the synchronization would never take place.
>> This Sunday (yesterday) we had an incident caused by the wal sender terminating (on Friday) after reaching the timeout (5 mins). This left the replication slot retaining WAL until our production primary server ran out of space. (This is not connected with the WAL fill-up of the previous Sunday, nor does it explain why that happened; we are still in the dark about that one.)
>>
>> I got the following messages on the publisher (primary) host:
>>
>> 10.9.0.77(48650) [65253] 5be2ca1d.fee5 2018-11-09 15:06:11.052 EET data_for_testsmadb_pub repmgr@dynacom line:9 LOG: terminating walsender process due to replication timeout
>> 10.9.0.77(48650) [65253] 5be2ca1d.fee5 2018-11-09 15:06:11.052 EET data_for_testsmadb_pub repmgr@dynacom line:10 CONTEXT: slot "data_for_testsmadb_pub", output plugin "pgoutput", in the change callback, associated LSN 13DF/393BF7F0
>> 10.9.0.77(48650) [65253] 5be2ca1d.fee5 2018-11-09 15:06:11.066 EET data_for_testsmadb_pub repmgr@dynacom line:11 LOG: disconnection: session time: 49:47:17.937 user=repmgr database=dynacom host=10.9.0.77 port=48650
>>
>> By querying pg_stat_subscription all 3 timestamps were about 5 mins prior to the above walsender termination.
>>
>> I didn't get *any* ERROR/FATAL message on the subscriber side. We have built a monitoring system that notifies us about problems, conflicts such as unique key violations, etc., but we had no alerts coming from the subscriber logs, so this went unnoticed. The scripts that watch disk size growth also fired, but by then it was already too late.
>
>
> Just a thought:
> could it be related to https://groups.google.com/a/2ndquadrant.com/forum/#!topic/bdr-list/YDG1_MuvfVM , i.e.: https://commitfest.postgresql.org/14/1151/

I just saw that the above patch was already committed in 10.2, so it must be something else.

> ?
> Is there a way for the WAL receiver to not have detected the termination of the replication stream?
> Shouldn't the WAL receiver normally detect this and try again after wal_retrieve_retry_interval?
>
> We checked with our cloud provider, they replied that there were no outages at all.
>
>
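
Since the disk-growth alert came too late, a check on the WAL held back by the slot itself might give earlier warning. Below is a minimal sketch of the arithmetic only, assuming the two LSNs are obtained separately on the publisher (for example from pg_current_wal_lsn() and the restart_lsn column of pg_replication_slots). The function names and the threshold are hypothetical, not part of PostgreSQL or of our monitoring system.

```python
# Sketch: estimate how much WAL a replication slot is holding back.
# Assumption: the caller has already fetched the LSN strings from the
# publisher; nothing here talks to the database.


def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '13DF/393BF7F0' to a byte position.

    The text form is two hexadecimal numbers: the high 32 bits and the
    low 32 bits of a 64-bit WAL position.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between the slot's restart_lsn and the current position."""
    return max(0, lsn_to_int(current_lsn) - lsn_to_int(restart_lsn))


def slot_over_limit(current_lsn: str, restart_lsn: str, limit_bytes: int) -> bool:
    """True when the slot retains more WAL than the (hypothetical) limit."""
    return retained_wal_bytes(current_lsn, restart_lsn) > limit_bytes


if __name__ == "__main__":
    # '13DF/393BF7F0' is the LSN from the walsender log above; the current
    # LSN used here is made up purely for illustration.
    retained = retained_wal_bytes("13E4/00000000", "13DF/393BF7F0")
    print(f"retained WAL: {retained / 1024**3:.1f} GiB")
```

On the server side the same difference can be computed with pg_wal_lsn_diff(); the Python version is only useful when the LSNs are already in hand, e.g. scraped from logs.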


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt


