Обсуждение: archive command doesnt work
Hello
We have a primary, a hot standby and 2 warm standby servers with 2 days delay.After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem.
Redhat Enterprise Linux 9.2
postgreSQL v13.10
related postgresql.conf lines:
archive_mode = on
archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
archive_timeout = 300
archive_mode = on
archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
archive_timeout = 300
wal_keep_size = 80
wal_level = 'replica'
wal_log_hints = on
wal_level = 'replica'
wal_log_hints = on
max_wal_size = 10GB
min_wal_size = 80MB
archived WAL files per day in /archive/archived_wal/ folder. All of them are before the switchover.
Number Day
421 2024-01-19
672 2024-01-20
1374 2024-01-21
564 2024-01-22
569 2024-01-23
607 2024-01-24
510 2024-01-25
612 2024-01-26
495 2024-01-27
1497 2024-01-28
556 2024-01-29
506 2024-01-30
549 2024-01-31
228 2024-02-01
672 2024-01-20
1374 2024-01-21
564 2024-01-22
569 2024-01-23
607 2024-01-24
510 2024-01-25
612 2024-01-26
495 2024-01-27
1497 2024-01-28
556 2024-01-29
506 2024-01-30
549 2024-01-31
228 2024-02-01
and in log files there are lines like below but no error or failure lines:
2024-02-02 02:07:01.978 CET [979081] LOG: restartpoint complete: wrote 26904 buffers (2.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=445.
874 s, sync=0.005 s, total=445.887 s; sync files=88, longest=0.001 s, average=0.001 s; distance=732035 kB, estimate=732035 kB
2024-02-02 02:07:01.978 CET [979081] LOG: recovery restart point at 252/2D7E5E68
2024-02-02 02:07:01.978 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.
--
2024-02-02 02:07:02.024 CET [979081] LOG: restartpoint complete: wrote 310 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.037
s, sync=0.004 s, total=0.046 s; sync files=54, longest=0.002 s, average=0.001 s; distance=90216 kB, estimate=667853 kB
2024-02-02 02:07:02.024 CET [979081] LOG: recovery restart point at 252/33000028
2024-02-02 02:07:02.024 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.
874 s, sync=0.005 s, total=445.887 s; sync files=88, longest=0.001 s, average=0.001 s; distance=732035 kB, estimate=732035 kB
2024-02-02 02:07:01.978 CET [979081] LOG: recovery restart point at 252/2D7E5E68
2024-02-02 02:07:01.978 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.
--
2024-02-02 02:07:02.024 CET [979081] LOG: restartpoint complete: wrote 310 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.037
s, sync=0.004 s, total=0.046 s; sync files=54, longest=0.002 s, average=0.001 s; distance=90216 kB, estimate=667853 kB
2024-02-02 02:07:02.024 CET [979081] LOG: recovery restart point at 252/33000028
2024-02-02 02:07:02.024 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.
postgresql-2024-02-03_000000.log:2024-02-03 09:03:20.383 CET [979081] LOG: restartpoint complete: wrote 102950 buffers (9.8%); 0 WAL file(s) added,
1286 removed, 117 recycled; write=1079.221 s, sync=0.008 s, total=1082.510 s; sync files=146, longest=0.001 s, average=0.001 s; distance=1143627 kB,
estimate=1143627 kB
postgresql-2024-02-03_000000.log-2024-02-03 09:03:20.383 CET [979081] LOG: recovery restart point at 255/43D56D80
postgresql-2024-02-03_000000.log-2024-02-03 09:03:20.383 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-03 09:03:20.311619
+01.
1286 removed, 117 recycled; write=1079.221 s, sync=0.008 s, total=1082.510 s; sync files=146, longest=0.001 s, average=0.001 s; distance=1143627 kB,
estimate=1143627 kB
postgresql-2024-02-03_000000.log-2024-02-03 09:03:20.383 CET [979081] LOG: recovery restart point at 255/43D56D80
postgresql-2024-02-03_000000.log-2024-02-03 09:03:20.383 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-03 09:03:20.311619
+01.
--
postgresql-2024-02-03_000000.log:2024-02-03 17:47:29.480 CET [979081] LOG: restartpoint complete: wrote 1305 buffers (0.1%); 0 WAL file(s) added, 1
removed, 3 recycled; write=131.069 s, sync=0.005 s, total=131.088 s; sync files=104, longest=0.001 s, average=0.001 s; distance=65368 kB, estimate=13
9744 kB
postgresql-2024-02-03_000000.log-2024-02-03 17:47:29.480 CET [979081] LOG: recovery restart point at 255/E40CE260
postgresql-2024-02-03_000000.log-2024-02-03 17:47:29.480 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-03 17:47:28.055165
+01.
postgresql-2024-02-03_000000.log:2024-02-03 17:47:29.480 CET [979081] LOG: restartpoint complete: wrote 1305 buffers (0.1%); 0 WAL file(s) added, 1
removed, 3 recycled; write=131.069 s, sync=0.005 s, total=131.088 s; sync files=104, longest=0.001 s, average=0.001 s; distance=65368 kB, estimate=13
9744 kB
postgresql-2024-02-03_000000.log-2024-02-03 17:47:29.480 CET [979081] LOG: recovery restart point at 255/E40CE260
postgresql-2024-02-03_000000.log-2024-02-03 17:47:29.480 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-03 17:47:28.055165
+01.
--
postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081] LOG: restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0
removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488
kB
postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] LOG: recovery restart point at 25C/74083E58
postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-05 06:25:50.223799
+01.
postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081] LOG: restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0
removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488
kB
postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] LOG: recovery restart point at 25C/74083E58
postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-05 06:25:50.223799
+01.
and last thing that i want to share
select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp() \gx
-[ RECORD 1 ]-----------------+------------------------------
pg_last_wal_receive_lsn | 261/C10BB318
pg_last_wal_replay_lsn | 261/C10BB318
pg_last_xact_replay_timestamp | 2024-02-07 11:40:57.536997+01
-[ RECORD 1 ]-----------------+------------------------------
pg_last_wal_receive_lsn | 261/C10BB318
pg_last_wal_replay_lsn | 261/C10BB318
pg_last_xact_replay_timestamp | 2024-02-07 11:40:57.536997+01
Everything seems fine except archiving.
for now i share these informations but if you need more, i can give.
Thanks in advance.
Best Regards
Alpaslan
On Wed, 2024-02-07 at 12:07 +0100, Alpaslan AKDAĞ wrote: > We have a primary, a hot standby and 2 warm standby servers with 2 days delay. > > After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem. > > related postgresql.conf lines: > archive_mode = on > archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f' > > and in log files there are lines like below but no error or failure lines: > > postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081] LOG: restartpoint complete: wrote 477 buffers(0.0%); 0 WAL file(s) added, 0 > removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484kB, estimate=413488 > kB > postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] LOG: recovery restart point at 25C/74083E58 > postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] DETAIL: Last completed transaction was at logtime 2024-02-05 06:25:50.223799 > +01. That looks like the standby wasn't promoted and is still in recovery, so it won't generate WAL. Restartpoints are only written on servers in recovery mode. Moreover, a promote request would generate a log entry. Another option would be that the promote request is further up in the log, and the server takes a long time to replay all the changes (if the "2 days delay" you mention are set in "recovery_min_apply_delay"). What do you get for SELECT pg_is_in_recovery(); Yours, Laurenz Albe
Hello Laurenz,
Thank you for your answer.
I have checked the pg_is_in_recovery() and here are the results from primary and hot stand by server.
Primary (old standby ):
select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
pg_is_in_recovery
-------------------
f
(1 row)
hot standby(old primary):
select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
pg_is_in_recovery
-------------------
t
(1 row)
and there is also standby.signal file in standby server.
So it seems that there is nothing wrong.
recovery_min_apply_delay = '2d' are set in warm standby servers. Before the switchover we had the same settings but we did not have this problem.
How can I fix it?
Best regards,
Alpaslan
On Wed, Feb 7, 2024 at 9:06 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2024-02-07 at 12:07 +0100, Alpaslan AKDAĞ wrote:
> We have a primary, a hot standby and 2 warm standby servers with 2 days delay.
>
> After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem.
>
> related postgresql.conf lines:
> archive_mode = on
> archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
>
> and in log files there are lines like below but no error or failure lines:
>
> postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081] LOG: restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0
> removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488
> kB
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] LOG: recovery restart point at 25C/74083E58
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081] DETAIL: Last completed transaction was at log time 2024-02-05 06:25:50.223799
> +01.
That looks like the standby wasn't promoted and is still in recovery,
so it won't generate WAL.
Restartpoints are only written on servers in recovery mode. Moreover,
a promote request would generate a log entry.
Another option would be that the promote request is further up in the log,
and the server takes a long time to replay all the changes
(if the "2 days delay" you mention are set in "recovery_min_apply_delay").
What do you get for
SELECT pg_is_in_recovery();
Yours,
Laurenz Albe
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote: > Thank you for your answer. > > I have checked the pg_is_in_recovery() and here are the results > from primary and hot stand by server. > > Primary (old standby ): > select pg_is_in_recovery(); > pg_is_in_recovery > ------------------- > f > (1 row) > > hot standby(old primary): > select pg_is_in_recovery(); > pg_is_in_recovery > ------------------- > t > (1 row) > and there is also standby.signal file in standby server. > So it seems that there is nothing wrong. > > recovery_min_apply_delay = '2d' are set in warm standby servers. > Before the switchover we had the same settings but we did not have this problem. It is becoming increasingly obvious that you never actually performed a switchover: it seems that you didn't promote the standby. Either use pg_ctl promote -D /path/to/datadir on the command line or SELECT pg_promote(); in SQL. Yours, Laurenz Albe
czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.
It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.
Either use
pg_ctl promote -D /path/to/datadir
on the command line or
SELECT pg_promote();
in SQL.
Yours,
Laurenz Albe
From what you sent:
"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."
Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.
"related postgresql.conf lines:
archive_mode = on "
Cheers,
Mateusz
Mateusz
Now I am confused. In Primary pg_is_in_recovery() should be false and in standby node should be true. Am I wrong?
Thanks and regards
Alpaslan
On Thu, Feb 8, 2024 at 9:10 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.
It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.
Either use
pg_ctl promote -D /path/to/datadir
on the command line or
SELECT pg_promote();
in SQL.
Yours,
Laurenz Albe
Hello Mateusz
Thank you for your answer.On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:
czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.
It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.
Either use
pg_ctl promote -D /path/to/datadir
on the command line or
SELECT pg_promote();
in SQL.
Yours,
Laurenz AlbeFrom what you sent:
"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only."related postgresql.conf lines:archive_mode = on "Cheers,
Mateusz
czw., 8 lut 2024 o 21:35 Alpaslan AKDAĞ <alpaslanakdag@gmail.com> napisał(a):
AlpaslanBest Regards,Is this information from the Documentation?If it is like that, there is not any problem. I tried to find an explanation that hot standby archive wal files or not but I couldn't.Hello MateuszThank you for your answer.On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.
It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.
Either use
pg_ctl promote -D /path/to/datadir
on the command line or
SELECT pg_promote();
in SQL.
Yours,
Laurenz AlbeFrom what you sent:
"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only."related postgresql.conf lines:archive_mode = on "Cheers,
Mateusz
Thank you very much.
I got my answer and also read from the documentation. Primary and standby both seem fine.
Best Regards
Alpaslan
On Thu, Feb 8, 2024 at 9:39 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:
czw., 8 lut 2024 o 21:35 Alpaslan AKDAĞ <alpaslanakdag@gmail.com> napisał(a):AlpaslanBest Regards,Is this information from the Documentation?If it is like that, there is not any problem. I tried to find an explanation that hot standby archive wal files or not but I couldn't.Hello MateuszThank you for your answer.On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
> pg_is_in_recovery
> -------------------
> t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.
It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.
Either use
pg_ctl promote -D /path/to/datadir
on the command line or
SELECT pg_promote();
in SQL.
Yours,
Laurenz AlbeFrom what you sent:
"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only."related postgresql.conf lines:archive_mode = on "Cheers,
Mateusz
On Thu, 2024-02-08 at 21:28 +0100, Alpaslan AKDAĞ wrote: > Now I am confused. In Primary pg_is_in_recovery() should be false and in > standby node should be true. Am I wrong? Right. I guess I got confused which is your primary and which is your standby. That's normal in the context of switchover :^/ Yours, Laurenz Albe