Re: Synchronous replication patch built on SR

From: Boszormenyi Zoltan
Subject: Re: Synchronous replication patch built on SR
Msg-id: 4BF3E077.3080100@cybertec.at
In reply to: Re: Synchronous replication patch built on SR (Fujii Masao <masao.fujii@gmail.com>)
Replies: Re: Synchronous replication patch built on SR (Fujii Masao <masao.fujii@gmail.com>)
List: pgsql-hackers
Fujii Masao wrote:
> On Wed, May 19, 2010 at 5:41 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>   
>>> Isn't reading the same WAL twice (by walreceiver and startup process)
>>> inefficient?
>>>       
>> Yes, and I didn't implement that because it's inefficient.
>>     
>
> So I'd like to propose to use LSN instead of XID since LSN can
> be easily handled by both walreceiver and startup process.
>   

OK, I will look into it replacing XIDs with LSNs.
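Part of what makes LSNs attractive here is that they are totally ordered by position in the WAL stream, so "has this standby caught up with that commit?" becomes a plain comparison that both walreceiver (receive/flush position) and the startup process (replay position) can answer without decoding WAL records. A minimal sketch, assuming a hypothetical two-field LSN type (`LsnSketch`, loosely modelled on the (log file, offset) WAL pointer of this era); the names are illustrative, not taken from the patch:

```c
#include <stdint.h>

/* Hypothetical stand-in for an LSN: a (log file id, byte offset) pair. */
typedef struct
{
    uint32_t xlogid;   /* log file number */
    uint32_t xrecoff;  /* byte offset within that log file */
} LsnSketch;

/* Returns <0, 0 or >0 as a is before, equal to, or after b in the WAL. */
static int
lsn_cmp(LsnSketch a, LsnSketch b)
{
    if (a.xlogid != b.xlogid)
        return a.xlogid < b.xlogid ? -1 : 1;
    if (a.xrecoff != b.xrecoff)
        return a.xrecoff < b.xrecoff ? -1 : 1;
    return 0;
}

/* A waiting transaction may be released once the position reported by
 * the standby has reached (or passed) the commit record's position. */
static int
lsn_reached(LsnSketch reported, LsnSketch commit_lsn)
{
    return lsn_cmp(reported, commit_lsn) >= 0;
}
```

Unlike XIDs, whose numeric order need not match commit order, this comparison needs no extra bookkeeping on either side.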

>>>  Currently
>>> PQputCopyData() cannot be executed in COPY OUT, but we can relax
>>> that.
>>>
>>>       
>> And I implemented just that, in a way that upon walreceiver startup
>> it sends a new protocol message to the walsender by calling
>> PQsetDuplexCopy() (see my patch) and the walsender responds with an ACK.
>> This protocol message is intentionally not handled by the normal
>> backend, so plain libpq clients cannot mess up their COPY streams.
>>     
>
> Is the newly introduced message type "Set Duplex Copy" really required?
> I think that the standby can send its replication mode to the master
> via the Query or CopyData messages, which are already used in SR. For example,
> how about including the mode in the handshake message "START_REPLICATION"?
> If we do that, we would not need to introduce a new libpq function,
> PQsetDuplexCopy(). BTW, I often got complaints about adding
> new libpq functions when I implemented SR ;)
>   

:-)

> In the patch, PQputCopyData() checks the newly-introduced pg_conn field
> "duplexCopy". Instead, how about checking the existing field "replication"?
>   

I didn't notice there was such a new field. (looking...) I can see now,
it was added in the middle of the structure. OK, we can then use it
to allow duplex COPY instead of my new field. I suppose it's non-NULL
if replication is on, right? Then the extra call is not needed.
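A minimal sketch of the check being discussed, assuming a hypothetical, heavily simplified connection struct: `ConnSketch`, `CopyStateSketch` and `put_copy_data_allowed()` are illustrative names, not libpq's. COPY IN behaves as before, while COPY OUT accepts client data only when the replication field is set:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a connection's COPY state. */
typedef enum
{
    COPY_NONE,
    COPY_IN,   /* client -> server data flow */
    COPY_OUT   /* server -> client data flow */
} CopyStateSketch;

typedef struct
{
    CopyStateSketch copy_state;
    const char     *replication;  /* assumed non-NULL on a standby connection */
} ConnSketch;

/* Should a PQputCopyData()-like call be accepted in the current state?
 * Ordinary SQL clients in COPY OUT keep the old behaviour (rejected). */
static bool
put_copy_data_allowed(const ConnSketch *conn)
{
    switch (conn->copy_state)
    {
        case COPY_IN:
            return true;
        case COPY_OUT:
            return conn->replication != NULL;
        default:
            return false;
    }
}
```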

> Or we can just allow PQputCopyData() to go even in COPY OUT state.
>   

I think this may not be too useful for SQL clients, but who knows? :-)
Use cases, anyone?

>> We can change the walreceiver so it sends similarly encapsulated
>> messages as the walsender does. In our patch, the walreceiver
>> currently sends the raw XIDs. If we add a minimal protocol
>> encapsulation, we can distinguish between the XIDs (or later LSNs)
>> and the "mark me synchronous from now on" message.
>>
>> The only problem is: at what point should such a client become
>> synchronous from the master's POV, so that its XID/LSN reports
>> count and transactions are made to wait for this client?
>>     
>
> One idea is to switch to "sync" when the LSN gap becomes less
> than or equal to XLOG_SEG_SIZE (16MB by default). That is, walsender
> calculates the gap between the current WAL write location on the master
> and the last receive/flush/replay location reported by the standby. If
> the gap <= XLOG_SEG_SIZE, it instructs backends to wait for
> replication from then on.
>   

This is a sensible idea.
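A minimal sketch of that switch as walsender might apply it on each incoming ack, assuming WAL positions are plain 64-bit byte offsets and the default 16 MB segment size; `StandbySketch`, `update_sync_state()` and `SEG_SIZE_SKETCH` are hypothetical names. Once a standby that asked for "sync" mode is within one segment of the master's write position, it is flagged so backends start waiting for it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the default 16 MB WAL segment size. */
#define SEG_SIZE_SKETCH ((uint64_t) 16 * 1024 * 1024)

/* Hypothetical per-standby state kept by a walsender. */
typedef struct
{
    bool     wants_sync;  /* standby asked for "sync" mode */
    bool     is_sync;     /* backends currently wait for this standby */
    uint64_t acked_lsn;   /* last receive/flush/replay position reported */
} StandbySketch;

/* Called whenever an ack arrives from the standby. */
static void
update_sync_state(StandbySketch *s, uint64_t master_write_lsn)
{
    if (!s->wants_sync || s->is_sync)
        return;

    uint64_t gap = master_write_lsn >= s->acked_lsn
        ? master_write_lsn - s->acked_lsn
        : 0;

    if (gap <= SEG_SIZE_SKETCH)
        s->is_sync = true;  /* from now on, backends wait for it */
}
```

The sketch never switches a standby back to async if it later falls behind again; that policy was not discussed here and would need a separate decision.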

>> As a side note, the async walreceivers' behaviour should be kept
>> as is, so they don't send anything back, and the message that
>> PQsetDuplexCopy() sends to the master would then only
>> notify the walsender that its client will become synchronous
>> in the near future.
>>     
>
> I agree that walreceiver should send no replication ack if "async"
> mode is chosen. OTOH, in the "sync" case, walreceiver should always
> send an ack even if the gap is large and the master is not yet waiting
> for replication. As mentioned above, walsender needs the acks to
> calculate the gap.
>   

Agreed.
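Combining this ack rule with the minimal encapsulation proposed earlier (so LSN reports can be told apart from a "mark me synchronous" message), the two ends could look roughly like the sketch below. The message type bytes (`'l'`, `'s'`), the 9-byte layout and all function names are hypothetical, not part of any released protocol; the sketch also assumes a sync standby only reports when its position has actually advanced.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message type bytes. */
#define MSG_LSN_REPORT 'l'   /* payload: 8-byte LSN, network byte order */
#define MSG_MARK_SYNC  's'   /* no payload: "count me as synchronous" */

typedef enum { MODE_ASYNC, MODE_SYNC } ReplModeSketch;

/* Walreceiver side: async standbys never ack; sync standbys ack every
 * advance, even before the master waits for them, so the walsender can
 * measure the gap. buf must hold at least 9 bytes.
 * Returns the message length, or 0 if nothing should be sent. */
static size_t
build_ack(ReplModeSketch mode, uint64_t new_lsn, uint64_t last_acked,
          unsigned char *buf)
{
    if (mode == MODE_ASYNC || new_lsn <= last_acked)
        return 0;
    buf[0] = MSG_LSN_REPORT;
    for (int i = 0; i < 8; i++)
        buf[1 + i] = (unsigned char) (new_lsn >> (56 - 8 * i));
    return 9;
}

/* Walsender side: returns the message type byte, or 0 if malformed.
 * *lsn is set only for LSN reports. */
static int
decode_message(const unsigned char *buf, size_t len, uint64_t *lsn)
{
    if (len == 0)
        return 0;
    switch (buf[0])
    {
        case MSG_LSN_REPORT:
            if (len != 9)
                return 0;
            *lsn = 0;
            for (int i = 0; i < 8; i++)
                *lsn = (*lsn << 8) | buf[1 + i];
            return MSG_LSN_REPORT;
        case MSG_MARK_SYNC:
            return len == 1 ? MSG_MARK_SYNC : 0;
        default:
            return 0;
    }
}
```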

>>> Seems s/min_sync_replication_clients/max_sync_replication_clients
>>>
>>>       
>> No, "min" indicates the minimum number of walreceiver reports
>> needed before a transaction can be released from waiting.
>> The other reports coming from walreceivers are ignored.
>>     
>
> Hmm... when min_sync_replication_clients = 2 and there are three
> "synchronous" standbys, the master waits for only two standbys?
>   

Yes. This is the idea, "partially synchronous replication".
I have heard anecdotes about replication solutions where, if (say)
at least 50% of the machines across the whole cluster report back
synchronously, the transaction is considered replicated
"good enough".

> The standby which the master ignores is fixed? or dynamically (or
> randomly) changed?
>   

It may change randomly, depending on which standbys send their
reports first. The replication servers themselves may get very busy
with large queries, or be loaded in some other way, and fall
somewhat behind in processing the WAL stream. The less loaded
servers answer first, and the transaction is considered properly
replicated.
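A minimal sketch of that release rule, assuming the walsenders collect each synchronous standby's last acked LSN into an array (a hypothetical layout; `commit_is_replicated()` is an illustrative name). A transaction is released once at least `min_sync_replication_clients` entries have reached its commit LSN, regardless of which standbys they are, so with three sync standbys and a minimum of 2, the two fastest decide:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true once enough synchronous standbys have acked commit_lsn.
 * The quorum is not a fixed set: whoever has acked far enough counts. */
static bool
commit_is_replicated(const uint64_t *acked_lsns, size_t n_sync_standbys,
                     uint64_t commit_lsn, size_t min_sync_clients)
{
    size_t confirmed = 0;

    for (size_t i = 0; i < n_sync_standbys; i++)
    {
        if (acked_lsns[i] >= commit_lsn && ++confirmed >= min_sync_clients)
            return true;
    }
    /* Also covers min_sync_clients == 0 (nothing to wait for). */
    return confirmed >= min_sync_clients;
}
```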

>>> min_sync_replication_clients is required to prevent outside attacker
>>> from connecting to the master as "synchronous" standby, and degrading
>>> the performance on the master?
>>>       
>> ???
>>
>> Properly configured pg_hba.conf prevents outside attackers
>> from connecting as replication clients, no?
>>     
>
> Yes :)
>
> I'd just like to know the use case of min_sync_replication_clients.
> Sorry, I haven't understood yet how useful this option is.
>   

I hope I answered it. :-)

Best regards,
Zoltán Böszörményi

-- 
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/


