Re: Logical replication monitoring

Поиск
Список
Период
Сортировка
От AYahorau@ibagroup.eu
Тема Re: Logical replication monitoring
Дата
Msg-id OF4FA86A2F.76D55182-ON4325834E.004D8E2E-4325834E.004FC51F@iba.by
обсуждение исходный текст
Ответ на Logical replication monitoring  (AYahorau@ibagroup.eu)
Ответы Re: Logical replication monitoring  (Achilleas Mantzios <achill@matrix.gatewaynet.com>)
Список pgsql-admin
Hello,
Thank you again for the suggestion. I configured tail_n_mail on my SLES12 machine as follows as everything was ok:

LOG_LINE_PREFIX: '%t %e '
EMAIL: someone@example.com
MAILSUBJECT: Acme HOST Postgres errors UNIQUE : NUMBER
INCLUDE: ERROR:  terminating logical replication worker due to timeout
INCLUDE: LOG:  worker process: logical replication worker for subscription [0-9]+ \({1}PID [0-9]+\){1} exited with exit code [0-9]+
FILE: /var/lib/pgsql/pg_log/LATEST

So I configured LOG_LINE_PREFIX  equal to log_prefix_line from postgresql.conf.
So tail_n_mail was able to catch such entries as this one:
2018-11-21 12:41:32 FET 00000 LOG:  worker process: logical replication worker for subscription 16386 (PID 31777) exited with exit code 1

But after I have updated my SLES12 to SP3 tail_n_mail stopped noticing these entry.
2018-11-21 12:41:32 +03 00000 LOG:  worker process: logical replication worker for subscription 16386 (PID 31777) exited with exit code 1
The reason is that  in different output of  timezone: FET was before and  +03 is now.


I have two questions in this regard.
I understand that currently tail_n_mail is not completely perfect detector. Is there any other robust, reliable way of logical replication monitoring?
Is it possible to configure any particular timestamp format  in postgresql  for its logs? As far as I know it can be only turned on or off depending on %t in log_prefix_line.

Thank you in advance,
Andrei Yahorau



From:        Andrei Yahorau/IBA
To:        pgsql-admin@postgresql.org,
Cc:        Mikalai Keida/IBA@IBA
Date:        24/08/2018 11:49
Subject:        Re: Logical replication monitoring



Hello,
Thank you for the suggestion.
I increased
wal_receiver_timeout , wal_sender_timeout parameters and now this error does not occur.

I installed tail_n_mail utility, made a simple config started in debug mode.
I am constantly facing the same error:


WARNING! Skipping non-existent file "/var/lib/pgsql/pg_log/postgresql.log-2018-08-23_154034"
Too many loops (20161): bailing:

The configuration file tail_n_mail.conf is quiet standart:
EMAIL: someone@example.com
PGLOG: log
MAILSUBJECT: Acme HOST Postgres errors UNIQUE : NUMBER
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:
FILE1: /var/lib/pgsql/pg_log/postgresql.log-%Y-%m-%d_%H%M%S
LASTFILE1: /var/lib/pgsql/pg_log/postgresql.log-2018-08-23_154034

Could you please say is there anything wrong in my configuration or script usage?

Thank you,
Andrei Yahorau




From:        Andrei Yahorau/IBA
To:        pgsql-admin@postgresql.org,
Cc:        Mikalai Keida/IBA@IBA
Date:        13/08/2018 13:16
Subject:        Re: Logical replication monitoring



Hello!

Thank you for your suggestion.
I  afraid this approach is not suitable for me. As a rule my postgresql log  on subscriber side contains a  bunch of the following entries:

ERROR:  terminating logical replication worker due to timeout

00000 LOG:  worker process: logical replication worker for subscription 24578 (PID 6217) exited with exit code 1

How should I handle this situation?
As I understand this is quite normal situation. But why is severity for it  an ERROR ?

I have another assumption. Could you correct me if I am wrong.
I found out in the source code that logical replication worker termination depends on wal_receiver_timeout paramer.
So I propose setting wal_receiver_timeout  to 0.
In this case I think that monitoring of the following views  pg_stat_subscription, pg_publication and pg_stat_replication  is enough.

In case if there  is some problem with connection or with replication pg_stat_replication  will show nothing because wal sender will not be working otherwise it will give some information.
Am I right? Are there any vulnerabilities in this approach ?


Best regards,
Andrei Yahorau




From:        Andrei Yahorau/IBA
To:        pgsql-admin@postgresql.org,
Cc:        Mikalai Keida/IBA@IBA
Date:        10/08/2018 13:05
Subject:        Logical replication monitoring




Hello PostgreSQL Community!

I configured logical replication for PostgreSQL 10.4 on 2 machines, set wal_level to logical, created a publication on master node and created a subscription on standby node according to the PostgreSQL documentation.
Could you please suggest an approach for replication state monitoring.

According to my experience the monitoring of pg_stat_subscription and pg_publication, pg_replication_slots unfortunately is not enough for this aim. Moreover standby database does not prohibit write operations by default and it can lead to some inconsistency between these databases.

For example a chain of queries as
SELECT pg_is_is_recovery(),
SELECT * FROM pg_stat_replication and
SELECT * FROM pg_stat_wal_receiver
provide insight into replication state for hot_standby replication.

So is there a reliable way of replication state monitoring for logical replication?

Best regards,
Andrei Yahorau

В списке pgsql-admin по дате отправления:

Предыдущее
От: Achilleas Mantzios
Дата:
Сообщение: Re: PostgreSQL 10.5 : Logical replication timeout results in PANIC inpg_wal "No space left on device"
Следующее
От: "Lentes, Bernd"
Дата:
Сообщение: User Authentication: LDAP and "local" accounts concurrently ?