Обсуждение: archive command doesnt work

Поиск
Список
Период
Сортировка

archive command doesnt work

От
Alpaslan AKDAĞ
Дата:
Hello

We have a primary, a hot standby and 2 warm standby servers with 2 days delay.

After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem.

Hot Standby Server infos:
Redhat Enterprise Linux 9.2
postgreSQL v13.10

related postgresql.conf lines:
archive_mode = on
archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
archive_timeout = 300
wal_keep_size = 80
wal_level = 'replica'
wal_log_hints = on
max_wal_size = 10GB
min_wal_size = 80MB

archived WAL files per day in /archive/archived_wal/ folder. All of them are before the switchover.
Number Day
    421 2024-01-19
    672 2024-01-20
   1374 2024-01-21
    564 2024-01-22
    569 2024-01-23
    607 2024-01-24
    510 2024-01-25
    612 2024-01-26
    495 2024-01-27
   1497 2024-01-28
    556 2024-01-29
    506 2024-01-30
    549 2024-01-31
    228 2024-02-01

and in log files there are lines like below but no error or failure lines:

2024-02-02 02:07:01.978 CET [979081]   LOG:  restartpoint complete: wrote 26904 buffers (2.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=445.
874 s, sync=0.005 s, total=445.887 s; sync files=88, longest=0.001 s, average=0.001 s; distance=732035 kB, estimate=732035 kB
2024-02-02 02:07:01.978 CET [979081]   LOG:  recovery restart point at 252/2D7E5E68
2024-02-02 02:07:01.978 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.
--
2024-02-02 02:07:02.024 CET [979081]   LOG:  restartpoint complete: wrote 310 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.037
s, sync=0.004 s, total=0.046 s; sync files=54, longest=0.002 s, average=0.001 s; distance=90216 kB, estimate=667853 kB
2024-02-02 02:07:02.024 CET [979081]   LOG:  recovery restart point at 252/33000028
2024-02-02 02:07:02.024 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-02 02:07:01.889865+01.

postgresql-2024-02-03_000000.log:2024-02-03 09:03:20.383 CET [979081]   LOG:  restartpoint complete: wrote 102950 buffers (9.8%); 0 WAL file(s) added,
 1286 removed, 117 recycled; write=1079.221 s, sync=0.008 s, total=1082.510 s; sync files=146, longest=0.001 s, average=0.001 s; distance=1143627 kB,
estimate=1143627 kB
postgresql-2024-02-03_000000.log-2024-02-03 09:03:20.383 CET [979081]   LOG:  recovery restart point at 255/43D56D80
postgresql-2024-02-03_000000.log-2024-02-03 09:03:20.383 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-03 09:03:20.311619
+01.
--
postgresql-2024-02-03_000000.log:2024-02-03 17:47:29.480 CET [979081]   LOG:  restartpoint complete: wrote 1305 buffers (0.1%); 0 WAL file(s) added, 1
 removed, 3 recycled; write=131.069 s, sync=0.005 s, total=131.088 s; sync files=104, longest=0.001 s, average=0.001 s; distance=65368 kB, estimate=13
9744 kB
postgresql-2024-02-03_000000.log-2024-02-03 17:47:29.480 CET [979081]   LOG:  recovery restart point at 255/E40CE260
postgresql-2024-02-03_000000.log-2024-02-03 17:47:29.480 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-03 17:47:28.055165
+01.

--
postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081]   LOG:  restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0
removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488
 kB
postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081]   LOG:  recovery restart point at 25C/74083E58
postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-05 06:25:50.223799
+01.

and last thing that i want to share


select pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn(), pg_last_xact_replay_timestamp() \gx
-[ RECORD 1 ]-----------------+------------------------------
pg_last_wal_receive_lsn       | 261/C10BB318
pg_last_wal_replay_lsn        | 261/C10BB318
pg_last_xact_replay_timestamp | 2024-02-07 11:40:57.536997+01

Everything seems fine except archiving.
for now i share these informations but if you need more, i can give.

Thanks in advance.
Best Regards
Alpaslan

Re: archive command doesnt work

От
Laurenz Albe
Дата:
On Wed, 2024-02-07 at 12:07 +0100, Alpaslan AKDAĞ wrote:
> We have a primary, a hot standby and 2 warm standby servers with 2 days delay.
>
> After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem.
>
> related postgresql.conf lines:
> archive_mode = on
> archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
>
> and in log files there are lines like below but no error or failure lines:
>
> postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081]   LOG:  restartpoint complete: wrote 477
buffers(0.0%); 0 WAL file(s) added, 0  
> removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s;
distance=65484kB, estimate=413488 
>  kB
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081]   LOG:  recovery restart point at 25C/74083E58
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081]   DETAIL:  Last completed transaction was at
logtime 2024-02-05 06:25:50.223799 
> +01.

That looks like the standby wasn't promoted and is still in recovery,
so it won't generate WAL.

Restartpoints are only written on servers in recovery mode.  Moreover,
a promote request would generate a log entry.

Another option would be that the promote request is further up in the log,
and the server takes a long time to replay all the changes
(if the "2 days delay" you mention are set in "recovery_min_apply_delay").

What do you get for

  SELECT pg_is_in_recovery();

Yours,
Laurenz Albe



Re: archive command doesnt work

От
Alpaslan AKDAĞ
Дата:
Hello Laurenz,

Thank you for your answer.

I have checked the pg_is_in_recovery() and here are the results from primary and hot stand by server.

Primary (old standby ):
select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

hot standby(old primary):
select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)
and there is also standby.signal file in standby server.
So it seems that there is nothing wrong.

recovery_min_apply_delay = '2d' are set in warm standby servers.   Before the switchover we had the same settings but we did not have this problem.
How can I fix it?

Best regards,
Alpaslan

On Wed, Feb 7, 2024 at 9:06 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2024-02-07 at 12:07 +0100, Alpaslan AKDAĞ wrote:
> We have a primary, a hot standby and 2 warm standby servers with 2 days delay.
>
> After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem.
>
> related postgresql.conf lines:
> archive_mode = on
> archive_command = 'cp %p /archive/archived_wal/%f && cp %p /opt/postgres/backup/archived_wal/%f'
>
> and in log files there are lines like below but no error or failure lines:
>
> postgresql-2024-02-05_000000.log:2024-02-05 06:25:50.272 CET [979081]   LOG:  restartpoint complete: wrote 477 buffers (0.0%); 0 WAL file(s) added, 0
> removed, 4 recycled; write=47.946 s, sync=0.005 s, total=47.964 s; sync files=85, longest=0.001 s, average=0.001 s; distance=65484 kB, estimate=413488
>  kB
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081]   LOG:  recovery restart point at 25C/74083E58
> postgresql-2024-02-05_000000.log-2024-02-05 06:25:50.272 CET [979081]   DETAIL:  Last completed transaction was at log time 2024-02-05 06:25:50.223799
> +01.

That looks like the standby wasn't promoted and is still in recovery,
so it won't generate WAL.

Restartpoints are only written on servers in recovery mode.  Moreover,
a promote request would generate a log entry.

Another option would be that the promote request is further up in the log,
and the server takes a long time to replay all the changes
(if the "2 days delay" you mention are set in "recovery_min_apply_delay").

What do you get for

  SELECT pg_is_in_recovery();

Yours,
Laurenz Albe

Re: archive command doesnt work

От
Laurenz Albe
Дата:
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

  pg_ctl promote -D /path/to/datadir

on the command line or

  SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe



Re: archive command doesnt work

От
Mateusz Henicz
Дата:

czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

  pg_ctl promote -D /path/to/datadir

on the command line or

  SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe



From what you sent:
 
"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."
Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.

"related postgresql.conf lines:
archive_mode = on "

Cheers,
Mateusz

Re: archive command doesnt work

От
Alpaslan AKDAĞ
Дата:
Now I am confused. In Primary pg_is_in_recovery() should be false and in standby node should be true. Am I wrong?

Thanks and regards
Alpaslan

On Thu, Feb 8, 2024 at 9:10 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

  pg_ctl promote -D /path/to/datadir

on the command line or

  SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe

Re: archive command doesnt work

От
Alpaslan AKDAĞ
Дата:
Hello Mateusz

Thank you for your answer.

If it is like that, there is not any problem. I tried to find an explanation that hot standby archive wal files or not but I couldn't.
Is this information  from the Documentation?

Best Regards,
Alpaslan

On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:

czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

  pg_ctl promote -D /path/to/datadir

on the command line or

  SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe



From what you sent:
 
"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."
Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.

"related postgresql.conf lines:
archive_mode = on "

Cheers,
Mateusz

Re: archive command doesnt work

От
Mateusz Henicz
Дата:

czw., 8 lut 2024 o 21:35 Alpaslan AKDAĞ <alpaslanakdag@gmail.com> napisał(a):
Hello Mateusz

Thank you for your answer.

If it is like that, there is not any problem. I tried to find an explanation that hot standby archive wal files or not but I couldn't.
Is this information  from the Documentation?

Best Regards,
Alpaslan

On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:

czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

  pg_ctl promote -D /path/to/datadir

on the command line or

  SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe



From what you sent:
 
"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."
Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.

"related postgresql.conf lines:
archive_mode = on "

Cheers,
Mateusz

Re: archive command doesnt work

От
Alpaslan AKDAĞ
Дата:
Thank you very much.

I got my answer and also read from the documentation. Primary and standby both seem fine.

Best Regards
Alpaslan


On Thu, Feb 8, 2024 at 9:39 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:

czw., 8 lut 2024 o 21:35 Alpaslan AKDAĞ <alpaslanakdag@gmail.com> napisał(a):
Hello Mateusz

Thank you for your answer.

If it is like that, there is not any problem. I tried to find an explanation that hot standby archive wal files or not but I couldn't.
Is this information  from the Documentation?

Best Regards,
Alpaslan

On Thu, Feb 8, 2024 at 9:24 PM Mateusz Henicz <mateuszhenicz@gmail.com> wrote:

czw., 8 lut 2024 o 21:10 Laurenz Albe <laurenz.albe@cybertec.at> napisał(a):
On Thu, 2024-02-08 at 09:23 +0100, Alpaslan AKDAĞ wrote:
> Thank you for your answer.
>
> I have checked the pg_is_in_recovery() and here are the results
> from primary and hot stand by server.
>
> Primary (old standby ):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  f
> (1 row)
>
> hot standby(old primary):
> select pg_is_in_recovery();
>  pg_is_in_recovery
> -------------------
>  t
> (1 row)
> and there is also standby.signal file in standby server.
> So it seems that there is nothing wrong.
>
> recovery_min_apply_delay = '2d' are set in warm standby servers.
> Before the switchover we had the same settings but we did not have this problem.

It is becoming increasingly obvious that you never actually performed
a switchover: it seems that you didn't promote the standby.

Either use

  pg_ctl promote -D /path/to/datadir

on the command line or

  SELECT pg_promote();

in SQL.

Yours,
Laurenz Albe



From what you sent:
 
"After switchover since 01.02.2024 hot standby server does not archive wal files. I couldn't find the problem."
Hot standby normally does not archive WAL files. Only the primary server does that unless you have archive_mode set to "always" but your configuration shows that you have it "on" only.

"related postgresql.conf lines:
archive_mode = on "

Cheers,
Mateusz

Re: archive command doesnt work

От
Laurenz Albe
Дата:
On Thu, 2024-02-08 at 21:28 +0100, Alpaslan AKDAĞ wrote:
> Now I am confused. In Primary pg_is_in_recovery() should be false and in
> standby node should be true. Am I wrong?

Right.  I guess I got confused which is your primary and which is your
standby.  That's normal in the context of switchover :^/

Yours,
Laurenz Albe