Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device"

From	Rui DeSousa
Subject	Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device"
Date	16 November 2018, 15:29:07
Msg-id	30938331-D245-4B50-AF3F-51D3EFB71A67@crazybean.net
In response to	Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device" (Achilleas Mantzios <achill@matrix.gatewaynet.com>)
Responses	Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device"
List	pgsql-admin

On Nov 16, 2018, at 3:18 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:

net.inet.tcp.always_keepalive=1

This setting is from FreeBSD. I tested changing it on PostgreSQL 11.1 running on FreeBSD 11.2-RELEASE-p3, and it had no effect at all on the PostgreSQL settings; all three of them remained at zero. This is completely irrelevant to my problem, but anyway.

That is what I stated; you don’t need it. The point is that on Linux the application has to enable keepalive itself, and I don’t know of a Linux kernel setting like the one in FreeBSD.

A quick Google search suggests that Linux does not enable keepalive by default, whereas FreeBSD enables it by default and globally, regardless of what the application requests. On Linux, Postgres will need to request it. You will need to set up the keepalive parameters in the Postgres configuration and restart the server.
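To illustrate what "the application has to request it" means on Linux, here is a minimal sketch of a program turning keepalive on for one of its own sockets. The helper name `enable_keepalive` and the values 60/10/5 are assumptions chosen for illustration; they are not what Postgres uses. The `TCP_KEEP*` option names are the Linux ones and are skipped if the platform does not expose them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Request TCP keepalive on one socket (illustrative values only)."""
    # SO_KEEPALIVE is the per-socket switch the application must set itself.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optional per-socket overrides of the net.ipv4.tcp_keepalive_* defaults.
    overrides = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in overrides:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Reading the option back with `getsockopt` only confirms that the socket asked for keepalive; whether probes actually cross the network still has to be checked with a packet capture.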

http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
So according to the official Linux docs, three parameters govern TCP keepalive in Linux, and on both of the systems in question they are set as follows:
root@TEST-smadb:/var/lib/pgsql# sysctl -a | grep keep
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
root@TEST-smadb:/var/lib/pgsql#

That does not mean the connection has TCP keepalive enabled; it just means that if an application requests keepalive without providing its own values, those are the defaults it gets. Those settings would be too large anyway: with them, a dead peer is only detected after 7200 s of idle time plus 9 probes at 75 s intervals, a little over two hours, and you want to detect a broken connection much more quickly than that.
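As a rough worked calculation of how long detection takes with the sysctl values shown above, assuming the connection is completely idle and every probe goes unanswered (the function name is made up for this sketch):

```python
def keepalive_detection_seconds(time_s, intvl_s, probes):
    # Idle time before the first probe, then `probes` unanswered probes
    # sent intvl_s seconds apart before the kernel drops the connection.
    return time_s + probes * intvl_s


# Values from the sysctl output: tcp_keepalive_time, _intvl, _probes.
default_s = keepalive_detection_seconds(7200, 75, 9)
hours, remainder = divmod(default_s, 3600)
minutes, seconds = divmod(remainder, 60)
print(f"{default_s} s = {hours} h {minutes} min {seconds} s")
```

The probes are spaced by tcp_keepalive_intvl, not by tcp_keepalive_time, so the total is 7200 + 9 × 75 = 7875 seconds.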

The keepalive setup will allow the WAL receiver to detect the broken connection, terminate it, and attempt to establish a new one.
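On the receiving side, the keepalive request can also be made through the libpq connection string, using the libpq parameters `keepalives`, `keepalives_idle`, `keepalives_interval` and `keepalives_count`. The sketch below only builds such a string; the host, database, user and the 60/10/5 values are hypothetical, and whether this string belongs in a subscription's CONNECTION clause or in primary_conninfo depends on the setup in question.

```python
def conninfo_with_keepalives(base, idle=60, interval=10, count=5):
    """Append libpq keepalive parameters to a conninfo string (illustrative values)."""
    params = {
        "keepalives": 1,                  # turn client-side TCP keepalive on
        "keepalives_idle": idle,          # idle seconds before the first probe
        "keepalives_interval": interval,  # seconds between probes
        "keepalives_count": count,        # lost probes before the connection is dropped
    }
    extra = " ".join(f"{key}={value}" for key, value in params.items())
    return f"{base} {extra}"


# Hypothetical publisher address and credentials.
conninfo = conninfo_with_keepalives(
    "host=publisher.example.com dbname=appdb user=repuser"
)
print(conninfo)
```

With these example values, a silent peer would be noticed after roughly 60 + 5 × 10 = 110 seconds of idle time.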

So from the looks of this, keepalive is enabled. (Also, don't confuse the WAL receiver with the logical replication worker; they are different programs, albeit similar.)

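I don’t believe it’s enabled; have you checked whether you are actually getting keepalive packets? If it were enabled with those defaults, the connection would have been terminated after a little over two hours (7200 s idle plus 9 probes at 75 s intervals).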

Is there any way (by network means?) to simulate this behavior, so that the replication worker is fooled into thinking the sender is still there?

Put a firewall between the servers and have it drop the packets silently, without sending resets.

Have a read here:

Section 4.2

http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/

The RFC states TCP keep alive should be off by default; FreeBSD changed that back in 1999 and I believe Linux still follows the RFC:

https://serverfault.com/questions/671710/why-does-freebsd-net-inet-tcp-always-keepalive-violate-rfc1122#671749
