Обсуждение: Monitoring Replication on Master/Slave Postgres(9.1)
HI All,
I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:
On Master:
[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ? 01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root 26030 10506 0 15:26 pts/2 00:00:00 grep sender
radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)
On slave database:
[root@liveDB2 ~]# ps -ef | grep receiver
root 11725 4286 0 15:28 pts/0 00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ? 00:35:33 postgres: wal receiver process streaming 17A/363EE668
radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)
If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)
My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?
Please share you knowledge and through some light.
thanx
I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:
On Master:
[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ? 01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root 26030 10506 0 15:26 pts/2 00:00:00 grep sender
radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)
On slave database:
[root@liveDB2 ~]# ps -ef | grep receiver
root 11725 4286 0 15:28 pts/0 00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ? 00:35:33 postgres: wal receiver process streaming 17A/363EE668
radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)
If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)
My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?
Please share you knowledge and through some light.
thanx
Can somebody help me on that????
---------- Forwarded message ----------
From: Shams Khan <shams.khan22@gmail.com>
Date: Mon, Nov 26, 2012 at 3:35 PM
Subject: Monitoring Replication on Master/Slave Postgres(9.1)
To: pgsql-admin@postgresql.org
HI All,
I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:
On Master:
[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ? 01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root 26030 10506 0 15:26 pts/2 00:00:00 grep sender
radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)
On slave database:
[root@liveDB2 ~]# ps -ef | grep receiver
root 11725 4286 0 15:28 pts/0 00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ? 00:35:33 postgres: wal receiver process streaming 17A/363EE668
radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)
If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)
My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?
Please share you knowledge and through some light.
thanx
From: Shams Khan <shams.khan22@gmail.com>
Date: Mon, Nov 26, 2012 at 3:35 PM
Subject: Monitoring Replication on Master/Slave Postgres(9.1)
To: pgsql-admin@postgresql.org
HI All,
I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:
On Master:
[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ? 01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root 26030 10506 0 15:26 pts/2 00:00:00 grep sender
radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)
On slave database:
[root@liveDB2 ~]# ps -ef | grep receiver
root 11725 4286 0 15:28 pts/0 00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ? 00:35:33 postgres: wal receiver process streaming 17A/363EE668
radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)
If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)
My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?
Please share you knowledge and through some light.
thanx
Thanks for the response:
I tried it and got the below result:
radius=# select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,
I tried it and got the below result:
radius=# select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,
sync_state from pg_stat_replication;
procpid | usesysid | usename | application_name | client_addr | state | sent_location | write_location | sync_state
---------+----------+----------+------------------+-------------+-----------+---------------+----------------+------------
2152 | 10 | postgres | walreceiver | 10.0.0.2 | streaming | 17B/EBA37AD8 | 17B/EBA37AD8 | async
(1 row)
The last column says async....could you please tell me how can we sync it?
procpid | usesysid | usename | application_name | client_addr | state | sent_location | write_location | sync_state
---------+----------+----------+------------------+-------------+-----------+---------------+----------------+------------
2152 | 10 | postgres | walreceiver | 10.0.0.2 | streaming | 17B/EBA37AD8 | 17B/EBA37AD8 | async
(1 row)
The last column says async....could you please tell me how can we sync it?
On Tue, Nov 27, 2012 at 1:26 PM, Shams Khan <shams.khan22@gmail.com> wrote:
Thanks for the response:
I tried it and got the below result:
radius=# select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,sync_state from pg_stat_replication;
procpid | usesysid | usename | application_name | client_addr | state | sent_location | write_location | sync_state
---------+----------+----------+------------------+-------------+-----------+---------------+----------------+------------
2152 | 10 | postgres | walreceiver | 10.0.0.2 | streaming | 17B/EBA37AD8 | 17B/EBA37AD8 | async
(1 row)
The last column says async....could you please tell me how can we sync it?On Tue, Nov 27, 2012 at 1:15 PM, Sergey Garas <aorashi@gmail.com> wrote:on the master use
select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,sync_state from pg_stat_replication;
the last field is the state of your repliocation status sync or async
Best regards,
Sergey Garas2012/11/27 Shams Khan <shams.khan22@gmail.com>Can somebody help me on that????---------- Forwarded message ----------
From: Shams Khan <shams.khan22@gmail.com>
Date: Mon, Nov 26, 2012 at 3:35 PM
Subject: Monitoring Replication on Master/Slave Postgres(9.1)
To: pgsql-admin@postgresql.org
HI All,
I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:
On Master:
[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ? 01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root 26030 10506 0 15:26 pts/2 00:00:00 grep sender
radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)
On slave database:
[root@liveDB2 ~]# ps -ef | grep receiver
root 11725 4286 0 15:28 pts/0 00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ? 00:35:33 postgres: wal receiver process streaming 17A/363EE668
radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)
If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)
My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?
Please share you knowledge and through some light.
thanx
On Mon, Nov 26, 2012 at 5:05 PM, Shams Khan <shams.khan22@gmail.com> wrote: > I want to know the ways to monitor the replication, whether the master and > slave server are sync. On the slave, run "SELECT now() - pg_last_xact_replay_timestamp() AS time_lag;". That tells you how far behind in time the slave is. > My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I > never got the same results on both servers. Is that mean slave is not synced > with master? With asynchronous replication, the slave will always lag behind the master. The query I posted above will tell you by how much. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
Thanks for the response Stuart...It was really helpful:
Below are some doubts I wanted to clarify..please read and suggest.
Can we also check if replication was broken earlier...somehow due to power failure of some other reasons in past...The reason I am asking is I am getting some discrepancies in data between master and slave...? I want to know the reason of it...
thanx
Below are some doubts I wanted to clarify..please read and suggest.
Can we also check if replication was broken earlier...somehow due to power failure of some other reasons in past...The reason I am asking is I am getting some discrepancies in data between master and slave...? I want to know the reason of it...
thanx
---------- Forwarded message ----------
From: Stuart Bishop <stuart@stuartbishop.net>
Date: Tue, Nov 27, 2012 at 2:03 PM
Subject: Re: [ADMIN] Monitoring Replication on Master/Slave Postgres(9.1)
To: Shams Khan <shams.khan22@gmail.com>
time_lag;". That tells you how far behind in time the slave is.
> My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I
> never got the same results on both servers. Is that mean slave is not synced
> with master?
With asynchronous replication, the slave will always lag behind the
master. The query I posted above will tell you by how much.
--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
From: Stuart Bishop <stuart@stuartbishop.net>
Date: Tue, Nov 27, 2012 at 2:03 PM
Subject: Re: [ADMIN] Monitoring Replication on Master/Slave Postgres(9.1)
To: Shams Khan <shams.khan22@gmail.com>
On Mon, Nov 26, 2012 at 5:05 PM, Shams Khan <shams.khan22@gmail.com> wrote:
> HI All,
>
> I want to know the ways to monitor the replication, whether the master and
> slave server are sync.
On the slave, run "SELECT now() - pg_last_xact_replay_timestamp() AS> HI All,
>
> I want to know the ways to monitor the replication, whether the master and
> slave server are sync.
time_lag;". That tells you how far behind in time the slave is.
> My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I
> never got the same results on both servers. Is that mean slave is not synced
> with master?
master. The query I posted above will tell you by how much.
--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
Can anybody answer the below query???
---------- Forwarded message ----------
From: Shams Khan <shams.khan22@gmail.com>
Date: Mon, Nov 26, 2012 at 3:35 PM
Subject: Monitoring Replication on Master/Slave Postgres(9.1)
To: pgsql-admin@postgresql.org
HI All,
I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:
On Master:
[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ? 01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root 26030 10506 0 15:26 pts/2 00:00:00 grep sender
radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)
On slave database:
[root@liveDB2 ~]# ps -ef | grep receiver
root 11725 4286 0 15:28 pts/0 00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ? 00:35:33 postgres: wal receiver process streaming 17A/363EE668
radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)
If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)
My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?
Please share you knowledge and through some light.
thanx
From: Shams Khan <shams.khan22@gmail.com>
Date: Mon, Nov 26, 2012 at 3:35 PM
Subject: Monitoring Replication on Master/Slave Postgres(9.1)
To: pgsql-admin@postgresql.org
HI All,
I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:
On Master:
[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ? 01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root 26030 10506 0 15:26 pts/2 00:00:00 grep sender
radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)
On slave database:
[root@liveDB2 ~]# ps -ef | grep receiver
root 11725 4286 0 15:28 pts/0 00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ? 00:35:33 postgres: wal receiver process streaming 17A/363EE668
radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)
If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)
My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?
Please share you knowledge and through some light.
thanx
On 11/28/2012 10:21 AM, Shams Khan wrote: > ...how do we ensure my replication is working fine?... > Below is the core of one of my bash-script tools. It could use some tweaking (comments welcome) but works well. The script is run every minute by cron on master and standby servers. It auto-determines whether the server is currently a master or standby so the same script can be deployed to all servers. If a master-server, it updates a one-record test table with a current timestamp to ensure there is activity on the master. If a standby-server, it determines the lag based both on the age of pg_last_xact_replay_timestamp() and on the age of the record in the test table then returns the worst of the two. The delay value is set in $standby_delay which is a value in seconds. It's up to you to decide what constitutes an issue that requires attention (but remember that 60-seconds does not necessarily indicate a problem on an idle server). My first-level alert triggers at 130-seconds and I have never hit that much of a delay. #!/bin/bash # # Check PostgreSQL sync-status # # Requires table "sync_status" with column "sync_time" of type timestamp with time zone # We need a temp file tempquery="$(mktemp /tmp/monitor_db_synchronizationXXXXXXXXXX)" # If master, update sync_status timestamp and return 0. If standby, check both age # of log-replay location and of timestamp in sync_status table and set $standby_delay # to the greater of the two (in seconds) # standby_delay=$( psql -q --tuples-only --no-align 2>/dev/null <<EOS \o ${tempquery} select case when setting='on' then ' with logdelay as ( select case when pg_last_xlog_receive_location()=pg_last_xlog_replay_location() then 0::int else (extract(epoch from now())-extract(epoch from pg_last_xact_replay_timestamp()))::int end as replicadelay union select (extract(epoch from now())-extract(epoch from sync_time))::int as replicadelay from sync_status ) select max(replicadelay) from logdelay ; ' else ' begin; delete from sync_status; insert into sync_status (sync_time) values (now()) returning 0::int as replicadelay; commit; ' end from pg_settings where name='transaction_read_only'; \o \i ${tempquery} EOS ) # Cleanup temp file test -f "${tempquery}" && rm "${tempquery}" # Do some alert based on the number of seconds of lag between master and standby here Cheers, Steve
Thanks for the response Steve...It was really helpful:
Below are some doubts I wanted to clarify..please read and suggest.
Can we also check if replication was broken earlier...somehow due to power failure of some other reasons in past...The reason I am asking is I am getting some discrepancies in data between master and slave...? I want to know the reason of it...
thanx
Below are some doubts I wanted to clarify..please read and suggest.
Can we also check if replication was broken earlier...somehow due to power failure of some other reasons in past...The reason I am asking is I am getting some discrepancies in data between master and slave...? I want to know the reason of it...
thanx
On Thu, Nov 29, 2012 at 12:32 AM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 11/28/2012 10:21 AM, Shams Khan wrote:...how do we ensure my replication is working fine?...
Below is the core of one of my bash-script tools. It could use some tweaking (comments welcome) but works well. The script is run every minute by cron on master and standby servers. It auto-determines whether the server is currently a master or standby so the same script can be deployed to all servers.
If a master-server, it updates a one-record test table with a current timestamp to ensure there is activity on the master.
If a standby-server, it determines the lag based both on the age of pg_last_xact_replay_timestamp() and on the age of the record in the test table then returns the worst of the two.
The delay value is set in $standby_delay which is a value in seconds. It's up to you to decide what constitutes an issue that requires attention (but remember that 60-seconds does not necessarily indicate a problem on an idle server). My first-level alert triggers at 130-seconds and I have never hit that much of a delay.
#!/bin/bash
#
# Check PostgreSQL sync-status
#
# Requires table "sync_status" with column "sync_time" of type timestamp with time zone
# We need a temp file
tempquery="$(mktemp /tmp/monitor_db_synchronizationXXXXXXXXXX)"
# If master, update sync_status timestamp and return 0. If standby, check both age
# of log-replay location and of timestamp in sync_status table and set $standby_delay
# to the greater of the two (in seconds)
#
standby_delay=$(
psql -q --tuples-only --no-align 2>/dev/null <<EOS
\o ${tempquery}
select
case when setting='on' then
'
with logdelay as
(
select
case when
pg_last_xlog_receive_location()=pg_last_xlog_replay_location() then 0::int
else
(extract(epoch from now())-extract(epoch from pg_last_xact_replay_timestamp()))::int
end as replicadelay
union
select
(extract(epoch from now())-extract(epoch from sync_time))::int as replicadelay
from
sync_status
)
select
max(replicadelay)
from
logdelay
;
'
else
'
begin;
delete from sync_status;
insert into sync_status (sync_time) values (now()) returning 0::int as replicadelay;
commit;
'
end
from pg_settings where name='transaction_read_only';
\o
\i ${tempquery}
EOS
)
# Cleanup temp file
test -f "${tempquery}" && rm "${tempquery}"
# Do some alert based on the number of seconds of lag between master and standby here
Cheers,
Steve
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
The Bucardo check_postgres module contains a hot_standby_delay check function which will calculate the delta between thexlog position of the master with the slave(s). http://bucardo.org/check_postgres/check_postgres.pl.html#hot_standby_delay -----Original Message----- From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Steve Crawford Sent: Wednesday, November 28, 2012 1:02 PM To: pgsql-admin@postgresql.org Subject: Re: [ADMIN] Fwd: Monitoring Replication on Master/Slave Postgres(9.1) On 11/28/2012 10:21 AM, Shams Khan wrote: > ...how do we ensure my replication is working fine?... > Below is the core of one of my bash-script tools. It could use some tweaking (comments welcome) but works well. The scriptis run every minute by cron on master and standby servers. It auto-determines whether the server is currently a masteror standby so the same script can be deployed to all servers. If a master-server, it updates a one-record test table with a current timestamp to ensure there is activity on the master. If a standby-server, it determines the lag based both on the age of pg_last_xact_replay_timestamp() and on the age of the record in the test table then returns the worst of the two. The delay value is set in $standby_delay which is a value in seconds. It's up to you to decide what constitutes an issue that requires attention (but remember that 60-seconds does not necessarilyindicate a problem on an idle server). My first-level alert triggers at 130-seconds and I have never hit thatmuch of a delay. #!/bin/bash # # Check PostgreSQL sync-status # # Requires table "sync_status" with column "sync_time" of type timestamp with time zone # We need a temp file tempquery="$(mktemp /tmp/monitor_db_synchronizationXXXXXXXXXX)" # If master, update sync_status timestamp and return 0. If standby, check both age # of log-replay location and of timestamp in sync_status table and set $standby_delay # to the greater of the two (in seconds) # standby_delay=$( psql -q --tuples-only --no-align 2>/dev/null <<EOS \o ${tempquery} select case when setting='on' then ' with logdelay as ( select case when pg_last_xlog_receive_location()=pg_last_xlog_replay_location() then 0::int else (extract(epoch from now())-extract(epoch from pg_last_xact_replay_timestamp()))::int end as replicadelay union select (extract(epoch from now())-extract(epoch from sync_time))::int as replicadelay from sync_status ) select max(replicadelay) from logdelay ; ' else ' begin; delete from sync_status; insert into sync_status (sync_time) values (now()) returning 0::int as replicadelay; commit; ' end from pg_settings where name='transaction_read_only'; \o \i ${tempquery} EOS ) # Cleanup temp file test -f "${tempquery}" && rm "${tempquery}" # Do some alert based on the number of seconds of lag between master and standby here Cheers, Steve -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin
On 11/28/2012 11:24 AM, Shams Khan wrote: > Thanks for the response Steve...It was really helpful: > > Below are some doubts I wanted to clarify..please read and suggest. > Can we also check if replication was broken earlier...somehow due to > power failure of some other reasons in past...The reason I am asking > is I am getting some discrepancies in data between master and > slave...? I want to know the reason of it... > What sort of discrepancies? I am unaware of situations in a properly configured and functioning system where the standby could be anything other than lagging the master but I didn't dig through release notes to see if there were any related bugs. Cheers, Steve
I use "select * from pg_stat_replication;" and compare columns: sent_location | write_location | flush_location | replay_location
Gabriel.
Gabriel.
On Wed, Nov 28, 2012 at 7:09 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:
On 11/28/2012 11:24 AM, Shams Khan wrote:What sort of discrepancies? I am unaware of situations in a properly configured and functioning system where the standby could be anything other than lagging the master but I didn't dig through release notes to see if there were any related bugs.Thanks for the response Steve...It was really helpful:
Below are some doubts I wanted to clarify..please read and suggest.
Can we also check if replication was broken earlier...somehow due to power failure of some other reasons in past...The reason I am asking is I am getting some discrepancies in data between master and slave...? I want to know the reason of it...
Cheers,
Steve
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin