Re: [HACKERS] Measuring replay lag

From Simon Riggs
Subject Re: [HACKERS] Measuring replay lag
Msg-id CANP8+jLuWr=q46k=OG6pdkjdpLjBoBecVqjXdD12aEwd8m3x1A@mail.gmail.com
In reply to Re: [HACKERS] Measuring replay lag  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [HACKERS] Measuring replay lag  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On 21 December 2016 at 21:14, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Dec 22, 2016 at 2:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> I agree that the capability to measure the remote_apply lag is very useful.
>> Also I want to measure the remote_write and remote_flush lags, for example,
>> in order to diagnose the cause of replication lag.
>
> Good idea.  I will think about how to make that work.  There was a
> proposal to make writing and flushing independent[1].  I'd like that
> to go in.  Then the write_lag and flush_lag could diverge
> significantly, and it would be nice to be able to see that effect as
> time (though you could already see it with LSN positions).

I think it has a much better chance now that the replies from apply
are OK. Will check in this release, but not now.

>> For that, what about maintaining the pairs of send-timestamp and LSN in
>> *sender side* instead of receiver side? That is, walsender adds the pairs
>> of send-timestamp and LSN into the buffer every sampling period.
>> Whenever walsender receives the write, flush and apply locations from
>> walreceiver, it calculates the write, flush and apply lags by comparing
>> the received and stored LSN and comparing the current timestamp and
>> stored send-timestamp.
>
> I thought about that too, but I couldn't figure out how to make the
> sampling work.  If the primary is choosing (LSN, time) pairs to store
> in a buffer, and the standby is sending replies at times of its
> choosing (when wal_receiver_status_interval has been exceeded), then
> you can't accurately measure anything.

Adding the line delay to this was very specifically excluded
by Tom, so that clock disparity between servers is not included.

If the balance of opinion is in favour of including a measure of
complete roundtrip time then I'm OK with that.
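
To make the sender-side proposal quoted above concrete, here is a
minimal sketch, not code from any patch. It assumes LSNs are plain
integers and times are seconds read from the primary's clock only;
the class and method names (SenderLagTracker, record_send,
lag_for_reply) are hypothetical. Because both timestamps come from the
primary, clock disparity between servers does not enter the result,
but the reply's network transit time does, so this measures something
close to the complete roundtrip.

```python
from collections import deque


class SenderLagTracker:
    """Hypothetical sender-side buffer of (LSN, send time) samples."""

    def __init__(self, capacity=8192):
        # Bounded buffer: when full, the oldest sample is discarded.
        self.samples = deque(maxlen=capacity)

    def record_send(self, lsn, now):
        # Called once per sampling period with the LSN just sent.
        self.samples.append((lsn, now))

    def lag_for_reply(self, reported_lsn, now):
        # Consume every sample the standby has reached. The newest
        # consumed sample gives the lag; None means no sample was reached.
        lag = None
        while self.samples and self.samples[0][0] <= reported_lsn:
            _, sent_at = self.samples.popleft()
            lag = now - sent_at
        return lag
```

One such tracker per reported position (write, flush, apply) would
yield the three lags. The accuracy problem raised in the quoted text
still applies: the result is only as fresh as the standby's replies.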

> You could fix that by making the standby send a reply *every time* it
> applies some WAL (like it does for transactions committing with
> synchronous_commit = remote_apply, though that is only for commit
> records), but then we'd be generating a lot of recovery->walreceiver
> communication and standby->primary network traffic, even for people
> who don't otherwise need it.  It seems unacceptable.

I don't see why that would be unacceptable. If we do it for
remote_apply, why not also do it for other modes? Whatever the
reasoning was for remote_apply should work for other modes. I should
add it was originally designed to be that way by me, so must have been
changed later.

This seems like a bug to me now that I look harder. The docs for
wal_receiver_status_interval say  "Updates are sent each time the
write or flush positions change, or at least as often as specified by
this parameter." But it doesn't do that, as I think it should.
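
The documented rule can be written down as a small predicate. This is
a sketch of the rule as the docs state it, not of what the walreceiver
code actually does. The function name and arguments are hypothetical,
positions are tuples of integer LSNs, and status_interval is assumed
to be a positive number of seconds.

```python
def should_send_reply(last_sent, current, now, last_reply_time, status_interval):
    """Decide whether the standby should send a status update.

    last_sent and current are tuples of positions, for example
    (write_lsn, flush_lsn). Assumes status_interval > 0.
    """
    # Rule 1: any reported position has changed since the last reply.
    if current != last_sent:
        return True
    # Rule 2: otherwise, reply at least once per status interval.
    return now - last_reply_time >= status_interval
```

Under this formulation, replying on every apply would amount to adding
the apply position to the tuple being compared.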

> Or you could fix that by setting the XACT_COMPLETION_APPLY_FEEDBACK
> bit in the xl_xinfo.xinfo for selected transactions, as a way to ask
> the standby to send a reply when that commit record is applied, but
> that only works for commit records.  One of my goals was to be able to
> report lag accurately even between commits (very large data load
> transactions etc).

As we said, we do have keepalive records we could use for that.

> Or you could fix that by sending a list of 'interesting LSNs' to the
> standby, as a way to ask it to send a reply when those LSNs are
> applied.  Then you'd need a circular buffer of (LSN, time) pairs in
> the primary AND a circular buffer of LSNs in the standby to remember
> which locations should generate a reply.  This doesn't seem to be an
> improvement.
>
> That's why I thought that the standby should have the (LSN, time)
> buffer: it decides which samples to record in its buffer, using LSN
> and time provided by the sending server, and then it can send replies
> at exactly the right times.  The LSNs don't have to be commit records,
> they're just arbitrary points in the WAL stream which we attach
> timestamps to.  IPC and network overhead is minimised, and accuracy is
> maximised.

I'm dubious of keeping standby-side state, but I will review the patch.
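
For reference while reviewing, the standby-side state described in the
quoted text might look roughly like the following. This is a sketch
under assumptions, not the patch itself: the names are hypothetical,
LSNs are integers, and timestamps are the primary's send times in
seconds as carried in the WAL stream.

```python
from collections import deque


class StandbySampleBuffer:
    """Hypothetical standby-side buffer of (LSN, primary time) samples."""

    def __init__(self, sample_interval, capacity=8192):
        self.sample_interval = sample_interval
        self.samples = deque(maxlen=capacity)
        self.last_sample_time = None

    def on_wal_received(self, end_lsn, primary_send_time):
        # Keep at most one sample per sample_interval of primary time.
        if (self.last_sample_time is None
                or primary_send_time - self.last_sample_time >= self.sample_interval):
            self.samples.append((end_lsn, primary_send_time))
            self.last_sample_time = primary_send_time

    def on_wal_applied(self, applied_lsn):
        # Return the primary timestamp to echo back in a reply, or None
        # if replay has not yet passed any stored sample.
        echoed = None
        while self.samples and self.samples[0][0] <= applied_lsn:
            _, echoed = self.samples.popleft()
        return echoed
```

The primary would then subtract the echoed timestamp from its own
current time, so only one clock is involved in the measurement.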

>> As a bonus of this approach, we don't need to add the field into the replay
>> message that walreceiver can very frequently send back. Which might be
>> helpful in terms of networking overhead.
>
> For the record, these replies are only sent approximately every
> replay_lag_sample_interval (with variation depending on replay speed)
> and are only 42 bytes with the new field added.
>
> [1] https://www.postgresql.org/message-id/CA%2BU5nMJifauXvVbx%3Dv3UbYbHO3Jw2rdT4haL6CCooEDM5%3D4ASQ%40mail.gmail.com

We have time to make any changes to allow this to be applied in this release.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


