WAL replay stuck in hot standby when performing the backup

Поиск
Список
Период
Сортировка
От Andrea Moretto
Тема WAL replay stuck in hot standby when performing the backup
Дата
Msg-id 92CEF44C-A3DD-433D-8CD7-B96EA01A2AA5@gmail.com
обсуждение исходный текст
Список pgsql-admin
Hi there,

  I have noticed that WAL replay blocks in a hot standby slave when performing the backup.
I am using on both PostgreSQL 9.4.23 on Redhat.

During the whole backup procedure (which takes more than 24 hours), the data onto the hot standby node is not updated.

Is this behaviour correct?

My idea is to confine batch workloads, backups and less critical OLTP queries on the hot standby.
If during the backup live data is not updated in the hot standby node, I have to re-plan the environment.

Details:

- Replication statistics show that replay location is lagging far behind the flush location (on master side):

  select * from pg_stat_replication;

    pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start
       | backend_xmin |   state   | sent_location | write_location | flush_location | replay_location | sync_priority |
sync_state 

-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
   15477 |       10 | postgres | walreceiver      | 127.0.0.1   |                 |       50612 | 2019-07-15
17:34:36.759306+00|   1152223785 | streaming | 274D/804CBCB8 | 274D/804CBCB8  | 274D/804CBCB8  | 2744/30F84028   |
      0 | async 
  (1 row)

- Replay lag is about 19 hours and running (on hot standby side):

  select now()-pg_last_xact_replay_timestamp() as replication_lag;

   replication_lag
  -----------------
   19:12:11.2289
  (1 row)

  SELECT
  pg_last_xlog_receive_location() receive,
  pg_last_xlog_replay_location() replay,
  (
   extract(epoch FROM now()) -
   extract(epoch FROM pg_last_xact_replay_timestamp())
  )::int lag;

      receive    |    replay     |  lag
  ---------------+---------------+-------
   274E/BAF0B610 | 2744/30F84028 | 69157
  (1 row)

  The only query currently running (not in idle state) on the slave is a COPY:

  select * from pg_stat_activity;

   datid |   datname   |  pid  | usesysid |    usename     |    application_name    | client_addr  | client_hostname |
client_port|         backend_start         |          xact_start           |          query_start          |
state_change         | waiting | state  | backend_xid | backend_xmin |
                                                           query
                                                     

-------+-------------+-------+----------+----------------+------------------------+--------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+--------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   16993 | mydb        | 15604 |  3226933 | rousr          | PostgreSQL JDBC Driver | 10.192.87.91 |                 |
    41491 | 2019-07-15 17:35:57.01458+00  |                               | 2019-07-16 12:55:58.21023+00  | 2019-07-16
12:55:58.210251+00| f       | idle   |             |              | COMMIT 
   16993 | mydb        | 18868 |  3226603 | amoretto       | psql93                 | 10.174.62.66 |                 |
    54106 | 2019-07-16 12:59:30.122865+00 | 2019-07-16 13:00:18.807889+00 | 2019-07-16 13:00:18.807889+00 | 2019-07-16
13:00:18.807893+00| f       | active |             |   1152301732 | select * from pg_stat_activity; 
   16993 | mydb        |  2559 |  3226603 | amoretto       | psql93                 | 10.174.62.66 |                 |
    52338 | 2019-07-16 10:57:03.118782+00 |                               | 2019-07-16 10:58:27.278095+00 | 2019-07-16
10:58:27.279752+00| f       | idle   |             |              | select * from pg_stat_replication; 
   16993 | mydb        | 15736 |  3226603 | amoretto       | pg_dump                | 10.192.87.90 |                 |
     4281 | 2019-07-15 17:36:42.540157+00 | 2019-07-15 17:36:42.566704+00 | 2019-07-16 12:51:04.488305+00 | 2019-07-16
12:51:04.488307+00| f       | active |             |   1152223785 | COPY public.c_box_hw_dev_app_hist (box_id,
hardware_id,device_id, application_id, sequence, history_time, history_info, usr_id, usr_info, archived,
is_history_info_zip,description, parent_application_name, iid) TO stdout; 
(4 rows)

  select * from pg_stat_database_conflicts;
   datid |   datname   | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
  -------+-------------+------------------+------------+----------------+-----------------+----------------
       1 | template1   |                0 |          0 |              0 |               0 |              0
   12998 | template0   |                0 |          0 |              0 |               0 |              0
   13003 | postgres    |                0 |          0 |              0 |               0 |              0
   16993 | mydb        |                0 |          0 |              0 |               0 |              0
  (4 rows)

  select * from pg_stat_database;
   datid |   datname   | numbackends | xact_commit | xact_rollback | blks_read | blks_hit | tup_returned | tup_fetched
|tup_inserted | tup_updated | tup_deleted | conflicts | temp_files | temp_bytes | deadlocks | blk_read_time |
blk_write_time|          stats_reset           

-------+-------------+-------------+-------------+---------------+-----------+----------+--------------+-------------+--------------+-------------+-------------+-----------+------------+------------+-----------+---------------+----------------+-------------------------------
       1 | template1   |           0 |           0 |             0 |         0 |        0 |            0 |           0
|           0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |
  0 |  
   12998 | template0   |           0 |           0 |             0 |         0 |        0 |            0 |           0
|           0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |
  0 |  
   13003 | postgres    |           0 |           0 |             0 |         0 |        0 |            0 |           0
|           0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |
  0 |  
   16993 | mydb        |           3 |        2710 |             2 |      2499 |   333071 |       213588 |       72658
|           0 |           0 |           0 |         0 |          0 |          0 |         0 |             0 |
  0 | 2019-07-15 17:31:51.600354+00 
  (4 rows)

  select * from pg_stat_bgwriter;
   checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint |
buffers_clean| maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc |          stats_reset
 

-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
                4636 |               1 |               1107064 |                    9 |             378395 |
0 |                0 |          460498 |                     0 |       1432887 | 2019-07-15 17:31:22.319099+00 
  (1 row)

I hope that someone can give me some hint on this topic, it would be appreciated!

Thanks in advance,

  Andrea

Andrea Moretto
moretto.andrea@gmail.com
-------------------------------------------------------
CONFIDENTIALITY NOTICE
This message and its attachments are addressed solely to the persons
above and may contain confidential information. If you have received
the message in error, be informed that any use of the content hereof
is prohibited. Please return it immediately to the sender and delete
the message.
-------------------------------------------------------




В списке pgsql-admin по дате отправления:

Предыдущее
От: Venkataramana Aitla
Дата:
Сообщение: Re: Unable to connect DB...?
Следующее
От: Ron
Дата:
Сообщение: Re: Unable to connect DB...?