Обсуждение: Monitoring Replication on Master/Slave Postgres(9.1)

Поиск

Список

Период

Сортировка

Monitoring Replication on Master/Slave Postgres(9.1)

От

Shams Khan

Дата:

26 ноября 2012 г., 10:05:14

HI All,

I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:

On Master:

[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ?        01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root     26030 10506 0 15:26 pts/2    00:00:00 grep sender

radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)

On slave database:

[root@liveDB2 ~]# ps -ef | grep receiver
root     11725 4286 0 15:28 pts/0    00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ?        00:35:33 postgres: wal receiver process   streaming 17A/363EE668

radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)

If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)

My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?

Please share you knowledge and through some light.

thanx

Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Shams Khan

Дата:

27 ноября 2012 г., 07:37:14

Can somebody help me on that????

---------- Forwarded message ----------
From: Shams Khan <shams.khan22@gmail.com>
Date: Mon, Nov 26, 2012 at 3:35 PM
Subject: Monitoring Replication on Master/Slave Postgres(9.1)
To: pgsql-admin@postgresql.org

HI All,

I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:

On Master:

[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ?        01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root     26030 10506 0 15:26 pts/2    00:00:00 grep sender

radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)

On slave database:

[root@liveDB2 ~]# ps -ef | grep receiver
root     11725 4286 0 15:28 pts/0    00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ?        00:35:33 postgres: wal receiver process   streaming 17A/363EE668

radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)

If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)

My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?

Please share you knowledge and through some light.

thanx

Re: Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Shams Khan

Дата:

27 ноября 2012 г., 08:03:02

Thanks for the response:

I tried it and got the below result:

radius=# select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,

On Tue, Nov 27, 2012 at 1:26 PM, Shams Khan <shams.khan22@gmail.com> wrote:

Thanks for the response:

I tried it and got the below result:

radius=# select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,sync_state from pg_stat_replication;
procpid | usesysid | usename | application_name | client_addr |   state   | sent_location | write_location | sync_state
---------+----------+----------+------------------+-------------+-----------+---------------+----------------+------------
    2152 |       10 | postgres | walreceiver      | 10.0.0.2    | streaming | 17B/EBA37AD8 | 17B/EBA37AD8   | async
(1 row)

The last column says async....could you please tell me how can we sync it?

On Tue, Nov 27, 2012 at 1:15 PM, Sergey Garas <aorashi@gmail.com> wrote:
on the master use

select procpid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,sync_state from pg_stat_replication;

the last field is the state of your repliocation status sync or async

Best regards,
Sergey Garas

2012/11/27 Shams Khan <shams.khan22@gmail.com>
Can somebody help me on that????

---------- Forwarded message ----------
From: Shams Khan <shams.khan22@gmail.com>
Date: Mon, Nov 26, 2012 at 3:35 PM
Subject: Monitoring Replication on Master/Slave Postgres(9.1)
To: pgsql-admin@postgresql.org

HI All,

I want to know the ways to monitor the replication, whether the master and slave server are sync.
The way I am checking it is:

On Master:

[root@livedb1 ~]# ps -ef | grep sender
postgres 2152 7045 0 Nov05 ?        01:05:33 postgres: wal sender process postgres 10.0.0.2(45862) streaming 17A/35B62638
root     26030 10506 0 15:26 pts/2    00:00:00 grep sender

radius=# SELECT pg_current_xlog_location();
pg_current_xlog_location
--------------------------
17A/342A6F78
(1 row)

On slave database:

[root@liveDB2 ~]# ps -ef | grep receiver
root     11725 4286 0 15:28 pts/0    00:00:00 grep receiver
postgres 27250 3432 0 Nov05 ?        00:35:33 postgres: wal receiver process   streaming 17A/363EE668

radius=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
17A/34366C30
(1 row)

If I go to pg_xlog folder: I checked on slave WAL files are generated. However I could not find the same result whenever I run pg_current_xlog_location(); on master and on the other side I get the different number (on slave)

My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I never got the same results on both servers. Is that mean slave is not synced with master?
If yes, then how do we ensure my replication is working fine?

Please share you knowledge and through some light.

thanx

Re: Monitoring Replication on Master/Slave Postgres(9.1)

От

Stuart Bishop

Дата:

27 ноября 2012 г., 08:34:40

On Mon, Nov 26, 2012 at 5:05 PM, Shams Khan <shams.khan22@gmail.com> wrote:

> I want to know the ways to monitor the replication, whether the master and
> slave server are sync.

On the slave, run "SELECT now() - pg_last_xact_replay_timestamp() AS
time_lag;". That tells you how far behind in time the slave is.

> My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I
> never got the same results on both servers. Is that mean slave is not synced
> with master?

With asynchronous replication, the slave will always lag behind the
master. The query I posted above will tell you by how much.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Shams Khan

Дата:

27 ноября 2012 г., 09:39:34

Thanks for the response Stuart...It was really helpful:

Below are some doubts I wanted to clarify..please read and suggest.
Can we also check if replication was broken earlier...somehow due to power failure of some other reasons in past...The reason I am asking is I am getting some discrepancies in data between master and slave...? I want to know the reason of it...

thanx

---------- Forwarded message ----------
From: Stuart Bishop <stuart@stuartbishop.net>
Date: Tue, Nov 27, 2012 at 2:03 PM
Subject: Re: [ADMIN] Monitoring Replication on Master/Slave Postgres(9.1)
To: Shams Khan <shams.khan22@gmail.com>

On Mon, Nov 26, 2012 at 5:05 PM, Shams Khan <shams.khan22@gmail.com> wrote:
> HI All,
>
> I want to know the ways to monitor the replication, whether the master and
> slave server are sync.

On the slave, run "SELECT now() - pg_last_xact_replay_timestamp() AS
time_lag;". That tells you how far behind in time the slave is.

> My question is how do we read these numbers(17A/342A6F78 and 17A/34366C30) I
> never got the same results on both servers. Is that mean slave is not synced
> with master?

With asynchronous replication, the slave will always lag behind the
master. The query I posted above will tell you by how much.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/

Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Shams Khan

Дата:

28 ноября 2012 г., 18:21:57

Can anybody answer the below query???

Re: Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Steve Crawford

Дата:

28 ноября 2012 г., 19:02:31

On 11/28/2012 10:21 AM, Shams Khan wrote:
> ...how do we ensure my replication is working fine?...
>

Below is the core of one of my bash-script tools. It could use some
tweaking (comments welcome) but works well. The script is run every
minute by cron on master and standby servers. It auto-determines whether
the server is currently a master or standby so the same script can be
deployed to all servers.

If a master-server, it updates a one-record test table with a current
timestamp to ensure there is activity on the master.

If a standby-server, it determines the lag based both on the age of
pg_last_xact_replay_timestamp() and on the age of the record in the test
table then returns the worst of the two.

The delay value is set in $standby_delay which is a value in seconds.
It's up to you to decide what constitutes an issue that requires
attention (but remember that 60-seconds does not necessarily indicate a
problem on an idle server). My first-level alert triggers at 130-seconds
and I have never hit that much of a delay.

#!/bin/bash
#
# Check PostgreSQL sync-status
#
# Requires table "sync_status" with column "sync_time" of type timestamp
with time zone

# We need a temp file
tempquery="$(mktemp /tmp/monitor_db_synchronizationXXXXXXXXXX)"

# If master, update sync_status timestamp and return 0. If standby,
check both age
# of log-replay location and of timestamp in sync_status table and set
$standby_delay
# to the greater of the two (in seconds)
#
standby_delay=$(
psql -q --tuples-only --no-align 2>/dev/null <<EOS
\o ${tempquery}
select
case when setting='on' then
'
with logdelay as
     (
     select
     case when
pg_last_xlog_receive_location()=pg_last_xlog_replay_location() then 0::int
     else
         (extract(epoch from now())-extract(epoch from
pg_last_xact_replay_timestamp()))::int
     end as replicadelay
     union
     select
         (extract(epoch from now())-extract(epoch from sync_time))::int
as replicadelay
     from
         sync_status
     )
select
     max(replicadelay)
from
     logdelay
;
'
else
'
begin;
delete from sync_status;
insert into sync_status (sync_time) values (now()) returning 0::int as
replicadelay;
commit;
'
end
from pg_settings where name='transaction_read_only';
\o
\i ${tempquery}
EOS
)

# Cleanup temp file
test -f "${tempquery}" && rm "${tempquery}"

# Do some alert based on the number of seconds of lag between master and
standby here

Cheers,
Steve

Re: Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Shams Khan

Дата:

28 ноября 2012 г., 19:25:01

Thanks for the response Steve...It was really helpful:

Below are some doubts I wanted to clarify..please read and suggest.
Can we also check if replication was broken earlier...somehow due to power failure of some other reasons in past...The reason I am asking is I am getting some discrepancies in data between master and slave...? I want to know the reason of it...

thanx

On Thu, Nov 29, 2012 at 12:32 AM, Steve Crawford <scrawford@pinpointresearch.com> wrote:

On 11/28/2012 10:21 AM, Shams Khan wrote:
...how do we ensure my replication is working fine?...

Below is the core of one of my bash-script tools. It could use some tweaking (comments welcome) but works well. The script is run every minute by cron on master and standby servers. It auto-determines whether the server is currently a master or standby so the same script can be deployed to all servers.

If a master-server, it updates a one-record test table with a current timestamp to ensure there is activity on the master.

If a standby-server, it determines the lag based both on the age of pg_last_xact_replay_timestamp() and on the age of the record in the test table then returns the worst of the two.

The delay value is set in $standby_delay which is a value in seconds. It's up to you to decide what constitutes an issue that requires attention (but remember that 60-seconds does not necessarily indicate a problem on an idle server). My first-level alert triggers at 130-seconds and I have never hit that much of a delay.

#!/bin/bash
#
# Check PostgreSQL sync-status
#
# Requires table "sync_status" with column "sync_time" of type timestamp with time zone

# We need a temp file
tempquery="$(mktemp /tmp/monitor_db_synchronizationXXXXXXXXXX)"

# If master, update sync_status timestamp and return 0. If standby, check both age
# of log-replay location and of timestamp in sync_status table and set $standby_delay
# to the greater of the two (in seconds)
#
standby_delay=$(
psql -q --tuples-only --no-align 2>/dev/null <<EOS
\o ${tempquery}
select
case when setting='on' then
'
with logdelay as
(
select
case when
pg_last_xlog_receive_location()=pg_last_xlog_replay_location() then 0::int
else
(extract(epoch from now())-extract(epoch from pg_last_xact_replay_timestamp()))::int
end as replicadelay
union
select
(extract(epoch from now())-extract(epoch from sync_time))::int as replicadelay
from
sync_status
)
select
max(replicadelay)
from
logdelay
;
'
else
'
begin;
delete from sync_status;
insert into sync_status (sync_time) values (now()) returning 0::int as replicadelay;
commit;
'
end
from pg_settings where name='transaction_read_only';
\o
\i ${tempquery}
EOS
)

# Cleanup temp file
test -f "${tempquery}" && rm "${tempquery}"

# Do some alert based on the number of seconds of lag between master and standby here

Cheers,
Steve

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Kris Bushover

Дата:

28 ноября 2012 г., 19:29:52

The Bucardo check_postgres module contains a hot_standby_delay check function which will calculate the delta between
thexlog position of the master with the slave(s). 

http://bucardo.org/check_postgres/check_postgres.pl.html#hot_standby_delay

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Steve Crawford
Sent: Wednesday, November 28, 2012 1:02 PM
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

On 11/28/2012 10:21 AM, Shams Khan wrote:
> ...how do we ensure my replication is working fine?...
>

Below is the core of one of my bash-script tools. It could use some tweaking (comments welcome) but works well. The
scriptis run every minute by cron on master and standby servers. It auto-determines whether the server is currently a
masteror standby so the same script can be deployed to all servers. 

If a master-server, it updates a one-record test table with a current timestamp to ensure there is activity on the
master.

If a standby-server, it determines the lag based both on the age of
pg_last_xact_replay_timestamp() and on the age of the record in the test table then returns the worst of the two.

The delay value is set in $standby_delay which is a value in seconds.
It's up to you to decide what constitutes an issue that requires attention (but remember that 60-seconds does not
necessarilyindicate a problem on an idle server). My first-level alert triggers at 130-seconds and I have never hit
thatmuch of a delay. 

#!/bin/bash
#
# Check PostgreSQL sync-status
#
# Requires table "sync_status" with column "sync_time" of type timestamp with time zone

# We need a temp file
tempquery="$(mktemp /tmp/monitor_db_synchronizationXXXXXXXXXX)"

# If master, update sync_status timestamp and return 0. If standby,
check both age
# of log-replay location and of timestamp in sync_status table and set
$standby_delay
# to the greater of the two (in seconds)
#
standby_delay=$(
psql -q --tuples-only --no-align 2>/dev/null <<EOS
\o ${tempquery}
select
case when setting='on' then
'
with logdelay as
     (
     select
     case when
pg_last_xlog_receive_location()=pg_last_xlog_replay_location() then 0::int
     else
         (extract(epoch from now())-extract(epoch from
pg_last_xact_replay_timestamp()))::int
     end as replicadelay
     union
     select
         (extract(epoch from now())-extract(epoch from sync_time))::int
as replicadelay
     from
         sync_status
     )
select
     max(replicadelay)
from
     logdelay
;
'
else
'
begin;
delete from sync_status;
insert into sync_status (sync_time) values (now()) returning 0::int as
replicadelay;
commit;
'
end
from pg_settings where name='transaction_read_only';
\o
\i ${tempquery}
EOS
)

# Cleanup temp file
test -f "${tempquery}" && rm "${tempquery}"

# Do some alert based on the number of seconds of lag between master and
standby here

Cheers,
Steve

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Steve Crawford

Дата:

28 ноября 2012 г., 22:09:40

On 11/28/2012 11:24 AM, Shams Khan wrote:
> Thanks for the response Steve...It was really helpful:
>
> Below are some doubts I wanted to clarify..please read and suggest.
> Can we also check if replication was broken earlier...somehow due to
> power failure of some other reasons in past...The reason I am asking
> is I am getting some discrepancies in data between master and
> slave...? I want to know the reason of it...
>
What sort of discrepancies? I am unaware of situations in a properly
configured and functioning system where the standby could be anything
other than lagging the master but I didn't dig through release notes to
see if there were any related bugs.

Cheers,
Steve

Re: Fwd: Monitoring Replication on Master/Slave Postgres(9.1)

От

Gabriel Muñoz

Дата:

29 ноября 2012 г., 02:31:19

I use "select * from pg_stat_replication;" and compare columns: sent_location | write_location | flush_location | replay_location

Gabriel.

On Wed, Nov 28, 2012 at 7:09 PM, Steve Crawford <scrawford@pinpointresearch.com> wrote:

On 11/28/2012 11:24 AM, Shams Khan wrote:
Thanks for the response Steve...It was really helpful:

Below are some doubts I wanted to clarify..please read and suggest.
Can we also check if replication was broken earlier...somehow due to power failure of some other reasons in past...The reason I am asking is I am getting some discrepancies in data between master and slave...? I want to know the reason of it...

What sort of discrepancies? I am unaware of situations in a properly configured and functioning system where the standby could be anything other than lagging the master but I didn't dig through release notes to see if there were any related bugs.

Cheers,
Steve

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Monitoring Replication on Master/Slave Postgres(9.1)