Re: PostgreSQL 10.5 : Strange pg_wal fill-up, solved with the shutdown checkpoint

From: Achilleas Mantzios
Subject: Re: PostgreSQL 10.5 : Strange pg_wal fill-up, solved with the shutdown checkpoint
Msg-id: 6031b670-52d1-b823-6b41-35d2fa1873c6@matrix.gatewaynet.com
In reply to: Re: PostgreSQL 10.5 : Strange pg_wal fill-up, solved with the shutdown checkpoint (Rui DeSousa <rui@crazybean.net>)
Replies: Re: PostgreSQL 10.5 : Strange pg_wal fill-up, solved with the shutdown checkpoint (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-admin
On 5/11/18 6:48 p.m., Rui DeSousa wrote:

> On Nov 5, 2018, at 6:24 AM, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:
>
>> Our current settings are:
>>
>> wal_keep_segments = 512
>> max_wal_size = 2GB
>> min_wal_size = 1GB
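As a rough check of the sizing question raised in this thread, the sketch below compares how much WAL wal_keep_segments alone holds back with roughly how much WAL is written before a WAL-triggered checkpoint. It assumes the default 16 MB segment size and the default checkpoint_completion_target of 0.5, and it uses the divisor 2 + checkpoint_completion_target that PostgreSQL 10's CalculateCheckpointSegments() applies to max_wal_size. The helper names are hypothetical.

```python
# Rough sizing of the WAL settings quoted above.
# Assumptions (not stated in the thread): 16 MB segments and the default
# checkpoint_completion_target = 0.5.
SEGMENT_MB = 16


def keep_size_mb(wal_keep_segments: int, segment_mb: int = SEGMENT_MB) -> int:
    """WAL volume (MB) that wal_keep_segments alone retains in pg_wal."""
    return wal_keep_segments * segment_mb


def checkpoint_distance_segments(max_wal_size_mb: int,
                                 completion_target: float = 0.5,
                                 segment_mb: int = SEGMENT_MB) -> int:
    """Approximate number of segments written before a WAL-triggered
    checkpoint, using PostgreSQL 10's divisor of 2 + completion_target."""
    target = (max_wal_size_mb // segment_mb) / (2.0 + completion_target)
    return max(1, int(target))  # the server never goes below one segment


if __name__ == "__main__":
    keep = keep_size_mb(512)
    dist = checkpoint_distance_segments(2 * 1024)
    print(f"wal_keep_segments = 512 retains {keep} MB")
    print(f"max_wal_size = 2GB triggers a checkpoint after ~{dist} segments "
          f"(~{dist * SEGMENT_MB} MB)")
```

With the settings from the thread this reports 8192 MB retained by wal_keep_segments and a WAL-triggered checkpoint roughly every 51 segments (about 816 MB).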

>> Our setup is as follows:
> The settings seem counterintuitive;
This email was not sent to discuss WAL GUC settings but rather the actual problem at hand.
We have been using PostgreSQL since 2001, and this is the first time we have faced such a (rather serious) issue.
> if you're using standard 16MB WAL files, then the keep parameter is at 8GB but max_wal_size is at 2GB, which seems counterproductive to me and would cause more checkpoints than needed.
wal_keep_segments covers the case where one has to provide some safety (by keeping at least this number of WAL files) for replication clients when replication slots are not in use.
checkpoint_timeout / max_wal_size control checkpoints. All of them, plus other conditions, are used in the algorithm that decides how many files to keep in pg_wal. That's what I am trying to figure out here.
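The retention rule being discussed can be approximated with a small model. This is a simplified sketch that loosely follows PostgreSQL 10's checkpoint-time cleanup (KeepLogSeg() and RemoveOldXlogFiles() in src/backend/access/transam/xlog.c). It is not the actual implementation: it works on plain integer segment numbers, ignores timelines, and does not distinguish recycling from deletion (which min_wal_size and max_wal_size govern). WalState and its fields are hypothetical names. One property it does share with the server is that cleanup only happens when a checkpoint (or, on a standby, a restartpoint) runs, so nothing is removed between checkpoints.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class WalState:
    """Hypothetical snapshot of the inputs considered at checkpoint time.
    Segment numbers are plain absolute integers."""
    current_seg: int        # segment holding the checkpoint's insert position
    redo_seg: int           # oldest segment the checkpoint itself needs
                            # (in PostgreSQL 10: the previous checkpoint's redo)
    wal_keep_segments: int = 0
    slot_min_seg: Optional[int] = None  # oldest segment any replication slot needs
    not_archived: Set[int] = field(default_factory=set)  # still marked .ready


def cutoff_segment(s: WalState) -> int:
    """Segments strictly older than the returned number may be removed."""
    cutoff = s.redo_seg
    if s.wal_keep_segments > 0:
        # Keep wal_keep_segments behind the current segment (never below 1).
        cutoff = min(cutoff, max(1, s.current_seg - s.wal_keep_segments))
    if s.slot_min_seg is not None:
        # A replication slot can hold WAL back even further.
        cutoff = min(cutoff, s.slot_min_seg)
    return cutoff


def removable(s: WalState, existing: Iterable[int]) -> List[int]:
    """Segments a checkpoint would remove or recycle under this model;
    segments not yet archived are skipped."""
    cutoff = cutoff_segment(s)
    return sorted(seg for seg in existing
                  if seg < cutoff and seg not in s.not_archived)


if __name__ == "__main__":
    state = WalState(current_seg=6000, redo_seg=5990, wal_keep_segments=512)
    print(len(removable(state, range(1, 6001))))
```

In this model, wal_keep_segments, any replication slot and any unarchived file can each hold segments back, but only the next checkpoint actually acts on the result.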

> How often are your checkpoints occurring, and why (time or WAL volume)? What's your checkpoint_timeout set to?
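One way to answer the "time or WAL volume" question is to count checkpoint start messages in the server log. The sketch below assumes log_checkpoints = on (it is off by default in PostgreSQL 10) and lines containing "checkpoint starting:" followed by the trigger flags. The keyword for WAL-triggered checkpoints differs between versions (xlog in older releases, wal in newer ones), so both are accepted. Cumulative counters for the same question are also available from SQL as checkpoints_timed and checkpoints_req in pg_stat_bgwriter.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the message logged when a checkpoint starts (log_checkpoints = on),
# e.g. "LOG:  checkpoint starting: time". Standby restartpoints log
# "restartpoint starting:" instead and are not counted here.
CHECKPOINT_START = re.compile(r"\bcheckpoint starting:((?: [a-z-]+)*)")


def checkpoint_causes(lines: Iterable[str]) -> Counter:
    """Count checkpoints by trigger: 'time' (checkpoint_timeout),
    'wal' (WAL volume, logged as 'xlog' or 'wal' depending on version),
    or 'other' (shutdown, immediate, forced, ...)."""
    counts: Counter = Counter()
    for line in lines:
        m = CHECKPOINT_START.search(line)
        if m is None:
            continue
        flags = m.group(1).split()
        if "time" in flags:
            counts["time"] += 1
        elif "xlog" in flags or "wal" in flags:
            counts["wal"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        "LOG:  checkpoint starting: time",
        "LOG:  checkpoint complete: wrote 1200 buffers",
        "LOG:  checkpoint starting: xlog",
    ]
    print(checkpoint_causes(sample))
```

Pointing the function at an open log file instead of the sample list works the same way, since a file object iterates over lines.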


>> primary (smadb) <--> (no replication slot) physical hot standby (smadb2) (managed via repmgr) <--> (replication slot) barman
>>                ^--> (replication slot) logical subscriber (testsmadb)
>>                ^--> wal archiving to host (sma) (via /usr/bin/rsync -a --delay-updates %p sma:/smadb/pgsql/pitr/%f )

> Did you check the status of both the replication slots and the archiving?
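The archiver tracks each WAL file with a marker in pg_wal/archive_status: the file is marked .ready until archive_command succeeds and .done afterwards. A large or growing number of .ready markers means an archiving backlog, and those files stay in pg_wal. The sketch below only lists them and assumes read access to the data directory. From SQL, pg_stat_archiver (failed_count, last_failed_wal) and pg_replication_slots (active, restart_lsn) give a similar picture for archiving and slots respectively.

```python
import sys
from pathlib import Path
from typing import List


def pending_archive(pg_wal: Path) -> List[str]:
    """Names of WAL files that still carry a .ready marker in
    pg_wal/archive_status, i.e. archive_command has not yet succeeded."""
    status_dir = pg_wal / "archive_status"
    return sorted(p.name[: -len(".ready")] for p in status_dir.glob("*.ready"))


if __name__ == "__main__":
    # Hypothetical usage: python pending_archive.py /path/to/data/pg_wal
    if len(sys.argv) != 2:
        sys.exit("usage: pending_archive.py PG_WAL_DIR")
    pending = pending_archive(Path(sys.argv[1]))
    print(f"{len(pending)} file(s) waiting for archive_command")
    for name in pending[:10]:
        print(name)
```

An empty result together with a pg_wal that keeps growing would point away from archiving and toward slots or checkpoint timing.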

>> No ERRORs indicating any problem with the archive command in the logs,

> Postgres is not going to log an error if the archive command fails; I believe it is up to your archive command to log the error.
No, PostgreSQL will complain. Normally (in 10) you get something like: LOG:  archive command failed with exit code .....
In previous versions the log level was even more severe, IIRC.
> I would suspect it might have been your archive command. Could you verify that you have all the WAL files? I've seen a case in a 9.2 environment where the startup removed files that were not yet archived, thus losing WAL files and breaking the backup.

> It would be great if you could double-check that you have all the WAL files (no gaps) and report back.
Absolute continuity.
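A gap check like the one Rui asks for can be scripted against the archive directory. The sketch below parses WAL segment names (timeline, log and segment, 8 hex digits each) and reports runs of missing segment numbers per timeline. It assumes the default 16 MB segment size and uncompressed file names exactly as written by the rsync archive_command shown above. It does not follow timeline switches, so each timeline is checked on its own, and .history, .backup and .partial files are ignored.

```python
import os
import re
import sys
from typing import Dict, Iterable, List, Tuple

# Regular WAL segment names: 8 hex digits each for timeline, log id and segment.
SEGMENT_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")
# Assumption: default 16 MB segments, i.e. 256 segments per 4 GB log id.
SEGMENTS_PER_LOG = 0x100000000 // (16 * 1024 * 1024)


def find_gaps(names: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (timeline, first_missing, last_missing) absolute segment
    numbers for every gap, checking each timeline separately."""
    by_timeline: Dict[int, List[int]] = {}
    for name in names:
        m = SEGMENT_NAME.match(name)
        if m is None:
            continue  # not a plain segment file
        tli, log, seg = (int(g, 16) for g in m.groups())
        by_timeline.setdefault(tli, []).append(log * SEGMENTS_PER_LOG + seg)

    gaps: List[Tuple[int, int, int]] = []
    for tli, segnos in sorted(by_timeline.items()):
        ordered = sorted(set(segnos))
        for prev, nxt in zip(ordered, ordered[1:]):
            if nxt != prev + 1:
                gaps.append((tli, prev + 1, nxt - 1))
    return gaps


if __name__ == "__main__":
    # Hypothetical usage on the archive host: python find_gaps.py /smadb/pgsql/pitr
    if len(sys.argv) != 2:
        sys.exit("usage: find_gaps.py ARCHIVE_DIR")
    for tli, first, last in find_gaps(os.listdir(sys.argv[1])):
        print(f"timeline {tli}: segments {first}..{last} missing")
```

Segment numbers are reported as absolute integers (log id × 256 + segment), which makes the arithmetic across a log-id boundary such as ...000000FF to ...00000100 straightforward.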

Remember: the PostgreSQL checkpointer decided to remove 5000+ files before shutdown. If any conditions were keeping those files afloat, they should also have held at that point, right?
The question is why PostgreSQL didn't remove them earlier.





-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt