Обсуждение: archive command doesnt work

Поиск

Список

Период

Сортировка

archive command doesnt work

От

Alpaslan AKDAĞ

Дата:

07 февраля 2024 г., 11:07:50

Hello

We have a primary, a hot standby and 2 warm standby servers with 2 days delay.

After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem.

Hot Standby Server infos:

Redhat Enterprise Linux 9.2

postgreSQL v13.10

related postgresql.conf lines:
archive_mode = on
archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
archive_timeout = 300

wal_keep_size = 80
wal_level = 'replica'
wal_log_hints = on

max_wal_size = 10GB

min_wal_size = 80MB

archived WAL files per day in /archive/archived_wal/ folder. All of them are before the switchover.

Number Day

421 2024-01-19
672 2024-01-20
1374 2024-01-21
564 2024-01-22
569 2024-01-23
607 2024-01-24
510 2024-01-25
612 2024-01-26
495 2024-01-27
1497 2024-01-28
556 2024-01-29
506 2024-01-30
549 2024-01-31
228 2024-02-01

and in log files there are lines like below but no error or failure lines:

2024-02-02 02:07:01.978 CET [979081] LOG: restartpoint complete: wrote 26904 buffers (2.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=445.
874 s, sync=0.005 s, total=445.887 s; sync files=88, longest=0.001 s, average=0.001 s; distance=732035 kB, estimate=732035 kB
2024-02-02 02:07:01.978 CET [979081] LOG: recovery restart point at 252/2D7E5E68
2024-02-02 02:07:01.978 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.
--
2024-02-02 02:07:02.024 CET [979081] LOG: restartpoint complete: wrote 310 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.037
s, sync=0.004 s, total=0.046 s; sync files=54, longest=0.002 s, average=0.001 s; distance=90216 kB, estimate=667853 kB
2024-02-02 02:07:02.024 CET [979081] LOG: recovery restart point at 252/33000028
2024-02-02 02:07:02.024 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.

postgresql-2024-02-03_000000.log:2024-02-03 09:03:20.383 CET [979081] LOG: restartpoint complete: wrote 102950 buffers (9.8%); 0 WAL file(s) added,
1286 removed, 117 recycled; write=1079.221 s, sync=0.008 s, total=1082.510 s; sync files=146, longest=0.001 s, average=0.001 s; distance=1143627 kB,
estimate=1143627 kB
postgresql-2024-02-03_000000.log-2024-02-03 09:03:20.383 CET [979081] LOG: recovery restart point at 255/43D56D80
postgresql-2024-02-03_000000.log-2024-02-03 09:03:20.383 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-03 09:03:20.311619
+01.

--
postgresql-2024-02-03_000000.log:2024-02-03 17:47:29.480 CET [979081] LOG: restartpoint complete: wrote 1305 buffers (0.1%); 0 WAL file(s) added, 1
removed, 3 recycled; write=131.069 s, sync=0.005 s, total=131.088 s; sync files=104, longest=0.001 s, average=0.001 s; distance=65368 kB, estimate=13
9744 kB
postgresql-2024-02-03_000000.log-2024-02-03 17:47:29.480 CET [979081] LOG: recovery restart point at 255/E40CE260
postgresql-2024-02-03_000000.log-2024-02-03 17:47:29.480 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-03 17:47:28.055165
+01.

--
postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081] LOG: restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0
removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488
kB
postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] LOG: recovery restart point at 25C/74083E58
postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-05 06:25:50.223799
+01.

and last thing that i want to share

select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp() \gx
-[ RECORD 1 ]-----------------+------------------------------
pg_last_wal_receive_lsn | 261/C10BB318
pg_last_wal_replay_lsn | 261/C10BB318
pg_last_xact_replay_timestamp | 2024-02-07 11:40:57.536997+01

Everything seems fine except archiving.

for now i share these informations but if you need more, i can give.

Thanks in advance.

Best Regards

Alpaslan

Re: archive command doesnt work

От

Laurenz Albe

Дата:

07 февраля 2024 г., 20:06:49

On Wed, 2024-02-07 at 12:07 +0100, Alpaslan AKDAĞ wrote:
> We have a primary, a hot standby and 2 warm standby servers with 2 days delay.
>
> After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem.
>
> related postgresql.conf lines:
> archive_mode = on
> archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
>
> and in log files there are lines like below but no error or failure lines:
>
> postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081]   LOG:  restartpoint complete: wrote 477
buffers(0.0%); 0 WAL file(s) added, 0  
> removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s;
distance=65484kB, estimate=413488 
>  kB
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081]   LOG:  recovery restart point at 25C/74083E58
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081]   DETAIL:  Last completed transaction was at
logtime 2024-02-05 06:25:50.223799 
> +01.

That looks like the standby wasn't promoted and is still in recovery,
so it won't generate WAL.

Restartpoints are only written on servers in recovery mode.  Moreover,
a promote request would generate a log entry.

Another option would be that the promote request is further up in the log,
and the server takes a long time to replay all the changes
(if the "2 days delay" you mention are set in "recovery_min_apply_delay").

What do you get for

  SELECT pg_is_in_recovery();

Yours,
Laurenz Albe

Re: archive command doesnt work

От

Alpaslan AKDAĞ

Дата:

08 февраля 2024 г., 08:23:02

Hello Laurenz,

Thank you for your answer.

I have checked the pg_is_in_recovery() and here are the results from primary and hot stand by server.

Primary (old standby ):

select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)

hot standby(old primary):

select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)

and there is also standby.signal file in standby server.

So it seems that there is nothing wrong.

recovery_min_apply_delay = '2d' are set in warm standby servers. Before the switchover we had the same settings but we did not have this problem.

How can I fix it?

Best regards,

Alpaslan

On Wed, Feb 7, 2024 at 9:06 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2024-02-07 at 12:07 +0100, Alpaslan AKDAĞ wrote:
> We have a primary, a hot standby and 2 warm standby servers with 2 days delay.
>
> After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem.
>
> related postgresql.conf lines:
> archive_mode = on
> archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
>
> and in log files there are lines like below but no error or failure lines:
>
> postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081] LOG: restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0
> removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488
> kB
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] LOG: recovery restart point at 25C/74083E58
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-05 06:25:50.223799
> +01.

That looks like the standby wasn't promoted and is still in recovery,
so it won't generate WAL.

Restartpoints are only written on servers in recovery mode. Moreover,
a promote request would generate a log entry.

Another option would be that the promote request is further up in the log,
and the server takes a long time to replay all the changes
(if the "2 days delay" you mention are set in "recovery_min_apply_delay").

What do you get for

SELECT pg_is_in_recovery();

Yours,
Laurenz Albe

Re: archive command doesnt work

От

Laurenz Albe

Дата:

08 февраля 2024 г., 20:10:26

On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

  pg_ctl promote -D /path/to/datadir

on the command line or

  SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe

Re: archive command doesnt work

От

Mateusz Henicz

Дата:

08 февраля 2024 г., 20:24:36

czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):

On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

pg_ctl promote -D /path/to/datadir

on the command line or

SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe

From what you sent:

"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."

Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.

"related postgresql.conf lines:

archive_mode = on "

Cheers,
Mateusz

Re: archive command doesnt work

От

Alpaslan AKDAĞ

Дата:

08 февраля 2024 г., 20:28:03

Now I am confused. In Primary pg_is_in_recovery() should be false and in standby node should be true. Am I wrong?

Thanks and regards

Alpaslan

On Thu, Feb 8, 2024 at 9:10 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

pg_ctl promote -D /path/to/datadir

on the command line or

SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe

Re: archive command doesnt work

От

Alpaslan AKDAĞ

Дата:

08 февраля 2024 г., 20:35:28

Hello Mateusz

Thank you for your answer.

If it is like that, there is not any problem. I tried to find an explanation that hot standby archive wal files or not but I couldn't.

Is this information from the Documentation?

Best Regards,

Alpaslan

On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:

czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

pg_ctl promote -D /path/to/datadir

on the command line or

SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe

From what you sent:

"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."
Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.

"related postgresql.conf lines:
archive_mode = on "

Cheers,
Mateusz

Re: archive command doesnt work

От

Mateusz Henicz

Дата:

08 февраля 2024 г., 20:38:55

Hey,
Yes, it is.

https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE

czw., 8 lut 2024 o 21:35 Alpaslan AKDAĞ <alpaslanakdag@gmail.com> napisał(a):

Hello Mateusz

Thank you for your answer.

If it is like that, there is not any problem. I tried to find an explanation that hot standby archive wal files or not but I couldn't.
Is this information from the Documentation?

Best Regards,
Alpaslan

On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:

czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

pg_ctl promote -D /path/to/datadir

on the command line or

SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe

From what you sent:

"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."
Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.

"related postgresql.conf lines:
archive_mode = on "

Cheers,
Mateusz

Re: archive command doesnt work

От

Alpaslan AKDAĞ

Дата:

08 февраля 2024 г., 20:47:37

Thank you very much.

I got my answer and also read from the documentation. Primary and standby both seem fine.

Best Regards

Alpaslan

On Thu, Feb 8, 2024 at 9:39 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:

Hey,
Yes, it is.
https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE

czw., 8 lut 2024 o 21:35 Alpaslan AKDAĞ <alpaslanakdag@gmail.com> napisał(a):
Hello Mateusz

Thank you for your answer.

If it is like that, there is not any problem. I tried to find an explanation that hot standby archive wal files or not but I couldn't.
Is this information from the Documentation?

Best Regards,
Alpaslan

On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:

czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

pg_ctl promote -D /path/to/datadir

on the command line or

SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe

From what you sent:

"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."
Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.

"related postgresql.conf lines:
archive_mode = on "

Cheers,
Mateusz

Re: archive command doesnt work

От

Laurenz Albe

Дата:

08 февраля 2024 г., 22:23:43

On Thu, 2024-02-08 at 21:28 +0100, Alpaslan AKDAĞ wrote:
> Now I am confused. In Primary pg_is_in_recovery() should be false and in
> standby node should be true. Am I wrong?

Right.  I guess I got confused which is your primary and which is your
standby.  That's normal in the context of switchover :^/

Yours,
Laurenz Albe

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: archive command doesnt work