Re: Calculating Replication Lag - units

Поиск
Список
Период
Сортировка
От Steve Crawford
Тема Re: Calculating Replication Lag - units
Дата
Msg-id 4FE9DFB8.3050605@pinpointresearch.com
обсуждение исходный текст
Ответ на Re: Calculating Replication Lag - units  (David Kerr <dmk@mr-paradox.net>)
Ответы Re: Calculating Replication Lag - units  (David Kerr <dmk@mr-paradox.net>)
Список pgsql-general
On 06/26/2012 08:16 AM, David Kerr wrote:
> On 06/26/2012 05:11 AM, Stuart Bishop wrote:
>> On Tue, Jun 26, 2012 at 6:21 AM, David Kerr<dmk@mr-paradox.net>  wrote:
>>> On Mon, Jun 25, 2012 at 02:17:22PM -0700, Steve Crawford wrote:
>>> - On 06/25/2012 01:17 PM, David Kerr wrote:
>>> ->Howdy,
>>> ->
>>> ->When calculating Replication lag, I know that we have to compare the
>>> ->pg_current_xlog_location
>>> ->to pg_last_xlog_receive_location, etc. but what I'm trying to
>>> figure out
>>> ->is what are
>>> ->the units that I'm left with after the calculation.
>>> ->
>>> ->(i.e., does the xlog_location imply some time value?)
>>> ->
>>> ->Here's the output of the (slightly modified script)
>>> ->Master: 5003964876715
>>> ->Receive: 5003964876715
>>> ->Replay: 5003964765203
>>> ->
>>> ->receive.value 0
>>> ->apply.value 111512
>>> ->
>>> ->111512 isn't inherently useful to me on its own.
>>> ->
>>> ->Any tips?
>>> ->
>>> - How about now()-pg_last_xact_replay_timestamp() (however this can
>>> be a
>>> - large number if there have not been any recent transactions on the
>>> - master). I suppose you could do something like:
>>> -
>>> - case when pg_last_xlog_receive_location() =
>>> - pg_last_xlog_replay_location() then '0 seconds'::interval
>>> - else now()-pg_last_xact_replay_timestamp() end as log_delay;
>>>
>>> i don't know for sure that 111512 is a time value.. that's kind of
>>> what i'm wondering. If i knew that it was like miliseconds or something
>>> that would be helpful.
>>
>> On the hot standby:
>>
>>      SELECT now()-pg_last_xact_replay_timestamp() AS lag;
>>
>> This gives you the lag time as a PostgreSQL interval.
>>
>> (It also might give you a value if you run it on a database that is
>> not a hot standby if it started in recovery mode).
>>
>> It seems difficult or impossible to calculate this on the master.
>>
>>
>
> Ah, awesome. I don't need to calculate it on the master so that's
> perfect.
>
> Thanks!
>
>
But beware. If your master server has any idle time you may be misled by
the simple calculation. I was running a pgbench test on a replicated
pair of machines. It finished this morning sometime so the lag delay
shows 02:31:11.651118, a value that might set off alarm bells. That's
why I used the case statement to force the interval to 0 if the replay
is up-to-date.

I think it is still worthwhile to check pg_stat_replication on the
master to make sure that it is still *sending* logs and perhaps
cross-checking the current log position on the master with the replayed
log location on the standby to see if they are reasonably close.

An additional verification check I've toyed with is to have a cron
script on the master update a one-row one-column table that holds a
timestamp and checking that timestamp on the standby(s) to double-check
that it is not too-far out-of-date. (This would also force regular data
delivery to the standby so that pg_last_xact_replay_timestamp() should
not lag far behind on an otherwise idle server.)

Cheers,
Steve


В списке pgsql-general по дате отправления:

Предыдущее
От: David Kerr
Дата:
Сообщение: Re: Calculating Replication Lag - units
Следующее
От: David Kerr
Дата:
Сообщение: Re: Calculating Replication Lag - units