Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

From Simon Riggs
Subject Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.
Msg-id CA+U5nMKrt21u_Fnb55a2j1DGHcsK4QYktrs2D+iwZSYg0F9KWA@mail.gmail.com
In reply to Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 24 May 2012 19:45, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 23, 2012 at 2:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sat, Dec 31, 2011 at 10:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Send new protocol keepalive messages to standby servers.
>>> Allows streaming replication users to calculate transfer latency
>>> and apply delay via internal functions. No external functions yet.
>>
>> Is there plan to implement such external functions before 9.2 release?
>> If not, keepalive protocol seems to be almost useless because there is
>> no use of it for a user and the increase in the number of packets might
>> increase the replication performance overhead slightly. No?
>
> Good point.  IMHO, this shouldn't really have been committed like
> this, but since it was, we had better fix it, either by reverting the
> change or forcing an initdb to expose the functionality.

OK, I've had a chance to review all the code and the preceding
discussions on this.

It's clear to me that I owe Fujii and Robert an apology for insisting
that a better solution was possible but then not fully delivering in
time. I did mean to deliver, but clearly I didn't, having confused my
action item to finish keepalives with the file-based version. Worse, I
was unaware of that until you raised it here, for which you get
apology number two and some activity to make up for that.

I agree with Robert that the function GetReplicationApplyDelay()
doesn't return a very useful answer. But the answer it returns is
exactly what the standby currently uses for its delay calculation.
Whatever else we do, giving the user access to the same calculation
seems important. So until/unless we change the standby delay
calculation we need that function, though perhaps a name change might
be appropriate.

We might want to have a different definition of apply delay for
different purposes, so an improved definition of apply delay doesn't
necessarily mean changing the standby delay mechanism.

An improved definition of apply delay would be, IMHO:

    if (XLByteLE(receivePtr, replayPtr))
        return 0;
    if (recoveryLastXTime > currentChunkStartTime)
        LastKnownTS = LastAppliedTS
    else
        LastKnownTS = StartChunkTS
    ApplyDelay = TimestampDifference(LastKnownTS, GetCurrentTimestamp()….);

That definition assumes the clocks are in sync. It also doesn't give
very useful answers when no commits are occurring, and can hide the
effects of large amounts of WAL generated by VACUUMs. So we need a
better definition.

Defining the delay as the difference between master.lastCommittedXactTS
and standby.lastAppliedXactTS suffers from the same problems.

An even better definition of apply delay was intended, which would
allow delayed apply of WAL data. I don't see a way of doing that
without keepalives to identify exactly when WAL arrived, even when it
has no timestamps. But what I had in mind is too much, too late for
9.2.

Let's look at what we can do.

1. Functions - it's fairly easy to add some functions. Initially, we
can add them as a contrib module, then if an initdb is forced
elsewhere we can include them in the main server.

pg_last_xlog_receive_timestamp() - lastMsgReceiptTime
pg_last_chunk_replay_timestamp() - currentChunkStartTime

pg_current_standby_delay() - the same calculation that
GetReplicationApplyDelay() does today, under a new name
pg_current_apply_delay() - the new apply delay calculation suggested
above, which has more to do with query staleness wrt master
pg_current_transfer_delay() - GetReplicationTransferLatency

pg_last_repmsg_receive_timestamp() - last msg receipt time, even if not WAL

pg_is_xlog_replay_conflict() - true if currently waiting on a conflict
pg_xlog_replay_wait_timestamp() - time of last conflict or pause

2. Keepalive messages - My plan was to flesh this out more. Given
complaints about bandwidth, I don't think that keepalives should be
sent without an option to disable them. The bandwidth is fairly small,
but I agree it should be optional nonetheless. If it's too late to add
an option, then I accept that we should just turn off the keepalives
in this release.

Your thoughts, please.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

