Обсуждение: Replication lag from transaction logs

Поиск

Список

Период

Сортировка

Replication lag from transaction logs

От

Debraj Manna

Дата:

18 июня 2018 г., 15:43:36

Is it possible to figure out the replication lag from transaction log files in the slave and master without querying postgres on the slave? Basically figuring out the replication lag in a slave when the postgres is down in that slave.

Re: Replication lag from transaction logs

От

Fabio Pardi

Дата:

18 июня 2018 г., 15:56:28

That I know, if standby is down, you cannot know exactly the last applied transaction log file.

But you could get a gross estimate if you check how many WAL files is the standby behind, just listing the WAL files on the master and on the standby.

Given that you can login on the standby machine.

regards,

fabio pardi

On 18/06/18 14:43, Debraj Manna wrote:

Is it possible to figure out the replication lag from transaction log files in the slave and master without querying postgres on the slave? Basically figuring out the replication lag in a slave when the postgres is down in that slave.

Re: Replication lag from transaction logs

От

Keith

Дата:

18 июня 2018 г., 16:22:08

You can use pg_controldata on any data directory to get its rough position in WAL replay, even when the database isn't running

https://www.postgresql.org/docs/current/static/app-pgcontroldata.html

On Mon, Jun 18, 2018 at 8:56 AM, Fabio Pardi <f.pardi@portavita.eu> wrote:

That I know, if standby is down, you cannot know exactly the last applied transaction log file.
But you could get a gross estimate if you check how many WAL files is the standby behind, just listing the WAL files on the master and on the standby.
Given that you can login on the standby machine.
regards,
fabio pardi

On 18/06/18 14:43, Debraj Manna wrote:
Is it possible to figure out the replication lag from transaction log files in the slave and master without querying postgres on the slave? Basically figuring out the replication lag in a slave when the postgres is down in that slave.

Re: Replication lag from transaction logs

От

Debraj Manna

Дата:

18 июня 2018 г., 16:59:56

Thanks Keith I am looking something like the solution you suggested.

Once I run pg_controldata on the slave data directory it will be give me the latest checkpoint like below. Can you let me know a bit more detail on how can I get the approximate lag in the slave in bytes from this?

Latest checkpoint location: 2F9/B38DE758

On Mon 18 Jun, 2018, 6:53 PM Keith, <keith@keithf4.com> wrote:

You can use pg_controldata on any data directory to get its rough position in WAL replay, even when the database isn't running

https://www.postgresql.org/docs/current/static/app-pgcontroldata.html

On Mon, Jun 18, 2018 at 8:56 AM, Fabio Pardi <f.pardi@portavita.eu> wrote:
That I know, if standby is down, you cannot know exactly the last applied transaction log file.
But you could get a gross estimate if you check how many WAL files is the standby behind, just listing the WAL files on the master and on the standby.
Given that you can login on the standby machine.
regards,
fabio pardi

On 18/06/18 14:43, Debraj Manna wrote:
Is it possible to figure out the replication lag from transaction log files in the slave and master without querying postgres on the slave? Basically figuring out the replication lag in a slave when the postgres is down in that slave.

Re: Replication lag from transaction logs

От

Keith

Дата:

18 июня 2018 г., 17:09:24

The portion before the slash is the actual WAL file itself that it got to start replaying, usually the last few characters of the WAL file name. The portion after the slash is the position in the WAL file itself. Default size of each WAL is 16MB, so if that first portion doesn't match the primary, you can count back that many WAL files x16MB. If it's on the same WAL file, I'm not really sure how to get a more precise byte lag than that.

Keith

On Mon, Jun 18, 2018 at 9:59 AM, Debraj Manna <subharaj.manna@gmail.com> wrote:

Thanks Keith I am looking something like the solution you suggested.

Once I run pg_controldata on the slave data directory it will be give me the latest checkpoint like below. Can you let me know a bit more detail on how can I get the approximate lag in the slave in bytes from this?

Latest checkpoint location: 2F9/B38DE758

On Mon 18 Jun, 2018, 6:53 PM Keith, <keith@keithf4.com> wrote:
You can use pg_controldata on any data directory to get its rough position in WAL replay, even when the database isn't running

https://www.postgresql.org/docs/current/static/app-pgcontroldata.html

On Mon, Jun 18, 2018 at 8:56 AM, Fabio Pardi <f.pardi@portavita.eu> wrote:
That I know, if standby is down, you cannot know exactly the last applied transaction log file.
But you could get a gross estimate if you check how many WAL files is the standby behind, just listing the WAL files on the master and on the standby.
Given that you can login on the standby machine.
regards,
fabio pardi

On 18/06/18 14:43, Debraj Manna wrote:
Is it possible to figure out the replication lag from transaction log files in the slave and master without querying postgres on the slave? Basically figuring out the replication lag in a slave when the postgres is down in that slave.

Re: Replication lag from transaction logs

От

Debraj Manna

Дата:

18 июня 2018 г., 18:56:55

Thanks Keith this is useful.

One more query if I need to know that if a have fallen too far behind and the WAL is not available. I guess I can do this. Let me know if my understanding is correct.

Run pg_controldata <DATA_DIR> on the slave node which has been down for long.
It will output the details about the WAL along with the WAL file name from where it will start the replication. (Field to look for in the output– 'Latest checkpoint's REDO WAL file')
Then check if the file mentioned in ` 'Latest checkpoint's REDO WAL file' is present in master `pg_wal` directory. If not then slave have fallen too far behind and will not be able to recover from WAL.

On Mon, Jun 18, 2018 at 7:39 PM, Keith <keith@keithf4.com> wrote:

The portion before the slash is the actual WAL file itself that it got to start replaying, usually the last few characters of the WAL file name. The portion after the slash is the position in the WAL file itself. Default size of each WAL is 16MB, so if that first portion doesn't match the primary, you can count back that many WAL files x16MB. If it's on the same WAL file, I'm not really sure how to get a more precise byte lag than that.

Keith

On Mon, Jun 18, 2018 at 9:59 AM, Debraj Manna <subharaj.manna@gmail.com> wrote:
Thanks Keith I am looking something like the solution you suggested.

Once I run pg_controldata on the slave data directory it will be give me the latest checkpoint like below. Can you let me know a bit more detail on how can I get the approximate lag in the slave in bytes from this?

Latest checkpoint location: 2F9/B38DE758

On Mon 18 Jun, 2018, 6:53 PM Keith, <keith@keithf4.com> wrote:
You can use pg_controldata on any data directory to get its rough position in WAL replay, even when the database isn't running

https://www.postgresql.org/docs/current/static/app-pgcontroldata.html

On Mon, Jun 18, 2018 at 8:56 AM, Fabio Pardi <f.pardi@portavita.eu> wrote:
That I know, if standby is down, you cannot know exactly the last applied transaction log file.
But you could get a gross estimate if you check how many WAL files is the standby behind, just listing the WAL files on the master and on the standby.
Given that you can login on the standby machine.
regards,
fabio pardi

On 18/06/18 14:43, Debraj Manna wrote:
Is it possible to figure out the replication lag from transaction log files in the slave and master without querying postgres on the slave? Basically figuring out the replication lag in a slave when the postgres is down in that slave.

Re: Replication lag from transaction logs

От

Scott Ribe

Дата:

18 июня 2018 г., 20:03:27

> On Jun 18, 2018, at 9:56 AM, Debraj Manna <subharaj.manna@gmail.com> wrote:
>
> Thanks Keith this is useful.
>
> One more query if I need to know that if a have fallen too far behind and the WAL is not available. I guess I can do
this.Let me know if my understanding is correct. 
>     • Run pg_controldata <DATA_DIR> on the slave node which has been down for long.
>     • It will output the details about the WAL along with the WAL file name from where it will start the replication.
(Fieldto look for in the output– 'Latest checkpoint's REDO WAL file') 
>     • Then check if the file mentioned in ` 'Latest checkpoint's REDO WAL file' is present in master `pg_wal`
directory.If not then slave have fallen too far behind and will not be able to recover from WAL.   

You could also try bringing the slave back up, and monitoring the log for the error about needed WAL file not being
available--thisavoids the race condition between checking that all WAL is available and restarting the slave. 

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/

Re: Replication lag from transaction logs

От

Keith

Дата:

18 июня 2018 г., 20:57:53

Yes, in general using pg_basebackup is not ideal to monitor for replica lag. I was just providing a means to do so from an offline instance. For normal monitoring, if you want to monitor byte lag, you can query that from the primary system. Or you can also check for lag from the replica as well.

https://www.keithf4.com/monitoring_streaming_slave_lag/

Keith

On Mon, Jun 18, 2018 at 1:03 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:

> On Jun 18, 2018, at 9:56 AM, Debraj Manna <subharaj.manna@gmail.com> wrote:
>
> Thanks Keith this is useful.
>
> One more query if I need to know that if a have fallen too far behind and the WAL is not available. I guess I can do this. Let me know if my understanding is correct.
> • Run pg_controldata <DATA_DIR> on the slave node which has been down for long.
> • It will output the details about the WAL along with the WAL file name from where it will start the replication. (Field to look for in the output– 'Latest checkpoint's REDO WAL file')
> • Then check if the file mentioned in ` 'Latest checkpoint's REDO WAL file' is present in master `pg_wal` directory. If not then slave have fallen too far behind and will not be able to recover from WAL.

You could also try bringing the slave back up, and monitoring the log for the error about needed WAL file not being available--this avoids the race condition between checking that all WAL is available and restarting the slave.

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/

Re: Replication lag from transaction logs

От

Keith

Дата:

18 июня 2018 г., 21:27:41

Sorry, I mean pg_controldata, not pg_basebackup.

On Mon, Jun 18, 2018 at 1:57 PM, Keith <keith@keithf4.com> wrote:

Yes, in general using pg_basebackup is not ideal to monitor for replica lag. I was just providing a means to do so from an offline instance. For normal monitoring, if you want to monitor byte lag, you can query that from the primary system. Or you can also check for lag from the replica as well.

https://www.keithf4.com/monitoring_streaming_slave_lag/

Keith

On Mon, Jun 18, 2018 at 1:03 PM, Scott Ribe <scott_ribe@elevated-dev.com> wrote:

> On Jun 18, 2018, at 9:56 AM, Debraj Manna <subharaj.manna@gmail.com> wrote:
>
> Thanks Keith this is useful.
>
> One more query if I need to know that if a have fallen too far behind and the WAL is not available. I guess I can do this. Let me know if my understanding is correct.
> • Run pg_controldata <DATA_DIR> on the slave node which has been down for long.
> • It will output the details about the WAL along with the WAL file name from where it will start the replication. (Field to look for in the output– 'Latest checkpoint's REDO WAL file')
> • Then check if the file mentioned in ` 'Latest checkpoint's REDO WAL file' is present in master `pg_wal` directory. If not then slave have fallen too far behind and will not be able to recover from WAL.

You could also try bringing the slave back up, and monitoring the log for the error about needed WAL file not being available--this avoids the race condition between checking that all WAL is available and restarting the slave.

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Replication lag from transaction logs