Re: Synchronous replication patch built on SR

From	Fujii Masao
Subject	Re: Synchronous replication patch built on SR
Date	May 18, 2010, 11:30:58
Msg-id	AANLkTinaX876sBFA84EdbjN3cm7xe8848W8C9dn3wfg4@mail.gmail.com
In reply to	Re: Synchronous replication patch built on SR (Boszormenyi Zoltan <zb@cybertec.at>)
Responses	Re: Synchronous replication patch built on SR (Boszormenyi Zoltan <zb@cybertec.at>)
List	pgsql-hackers

Thanks for your reply!

On Fri, May 14, 2010 at 10:33 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>> In your design, the transaction commit on the master waits for its XID
>> to be read from the XLOG_XACT_COMMIT record and replied by the standby.
>> Right? This design seems not to be extensible to #2 and #3 since
>> walreceiver cannot read XID from the XLOG_XACT_COMMIT record.
>
> Yes, this was my problem, too. I would have had to
> implement a custom interpreter into walreceiver to
> process the WAL records and extract the XIDs.

Isn't reading the same WAL twice (once by walreceiver and once by the
startup process) inefficient? In synchronous replication, the overhead
of walreceiver directly affects the performance of the master, so I think
we should not assign such heavy work to walreceiver.

> But at least the supporting details, i.e. not opening another
> connection, instead being able to do duplex COPY operations in
> a server-acknowledged way is acceptable, no? :-)

I might not understand your point (sorry), but it's OK for the standby
to send the reply to the master using a CopyData message. Currently
PQputCopyData() cannot be executed during COPY OUT, but we can relax
that restriction.

>>  How about
>> using LSN instead of XID? That is, the transaction commit waits until
>> the standby has reached its LSN. LSN is easier to use for walreceiver
>> and startup process, I think.
>>
>
> Indeed, using the LSN seems to be more appropriate for
> the walreceiver, but how would you extract the information
> that a certain LSN means a COMMITted transaction? Or
> we could release a locked transaction in case the master receives
> an LSN greater than or equal to the transaction's own LSN?

Yep, we can ensure that the transaction has been replicated by
comparing its own LSN with the smallest of the latest LSNs reported
by each connected "synchronous" standby.
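
A minimal sketch of that check, written in Python rather than the server's C and with LSNs modeled as plain integers. The function name and the dict of per-standby LSNs are hypothetical, not something from the patch:

```python
def commit_is_replicated(commit_lsn, standby_lsns):
    """Return True once every connected synchronous standby has
    reported an LSN at or beyond the transaction's commit LSN.

    standby_lsns maps a (hypothetical) standby name to the latest
    LSN that standby has reported back to the master.
    """
    if not standby_lsns:
        # No synchronous standby is connected. What should happen here
        # is a policy decision; this sketch simply reports False.
        return False
    # The slowest standby decides: compare against the smallest LSN.
    return min(standby_lsns.values()) >= commit_lsn


# A commit at LSN 100 is released only when the slowest standby reaches it.
print(commit_is_replicated(100, {"standby1": 120, "standby2": 100}))
print(commit_is_replicated(100, {"standby1": 120, "standby2": 99}))
```

Taking the minimum means one comparison per waiting transaction, regardless of how many standbys are connected.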

> Sending back all the LSNs in case of long transactions would
> increase the network traffic compared to sending back only the
> XIDs, but the amount is not clear for me. What I am more
> worried about is the contention on the ProcArrayLock.
> XIDs are rarer than LSNs, no?

No. For example, when the WAL data sent by walsender at one time
contains two XLOG_XACT_COMMIT records, in the XID approach walreceiver
would need to send two replies. OTOH, in the LSN approach, only
one reply, indicating the last received location, would
need to be sent.
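
To illustrate the difference in reply counts, here is a rough Python sketch. The record layout (dicts with "type", "xid" and "end_lsn" keys) is made up for the example and is not the real WAL format:

```python
def xid_replies(batch):
    """XID approach: one reply per commit record found in the batch."""
    return [rec["xid"] for rec in batch if rec["type"] == "commit"]


def lsn_reply(batch):
    """LSN approach: a single reply carrying the last received location."""
    return batch[-1]["end_lsn"] if batch else None


# One chunk of WAL as sent by walsender, containing two commit records.
batch = [
    {"type": "insert", "xid": 701, "end_lsn": 100},
    {"type": "commit", "xid": 701, "end_lsn": 200},
    {"type": "commit", "xid": 702, "end_lsn": 300},
]

print(len(xid_replies(batch)))  # replies needed in the XID approach
print(lsn_reply(batch))         # the single location sent in the LSN approach
```

Note that the XID approach also requires inspecting each record, while the LSN approach only needs the end position of what was received.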

>> What if the "synchronous" standby starts up from the very old backup?
>> The transaction on the master needs to wait until a large amount of
>> outstanding WAL has been applied? I think that synchronous replication
>> should start with *asynchronous* replication, and should switch to the
>> sync level after the gap between servers has become small enough.
>> What's your opinion?
>>
>
> It's certainly one option, which I think partly addressed
> with the "strict_sync_replication" knob below.
> If strict_sync_replication = off, then the master doesn't make
> its transactions wait for the synchronous reports, and the client(s)
> can work through their WALs. IIRC, the walreceiver connects
> to the master only very late in the recovery process, no?

No, the master might have a large number of WAL files which
the standby doesn't have.

>>> I have added 3 new options, two GUCs in postgresql.conf and one
>>> setting in recovery.conf. These options are:
>>>
>>> 1. min_sync_replication_clients = N
>>>
>>> where N is the number of reports for a given transaction before it's
>>> released as committed synchronously. 0 means completely asynchronous,
>>> the value is maximized by the value of max_wal_senders. Anything
>>> in between 0 and max_wal_senders means different levels of partially
>>> synchronous replication.
>>>
>>> 2. strict_sync_replication = boolean
>>>
>>> where the expected number of synchronous reports from standby
>>> servers is further limited to the actual number of connected synchronous
>>> standby servers if the value of this GUC is false. This means that if
>>> no standby servers are connected yet then the replication is asynchronous
>>> and transactions are allowed to finish without waiting for synchronous
>>> reports. If the value of this GUC is true, then transactions wait until
>>> enough synchronous standbys connect and report back.
>>>
>>
>> Why are these options necessary?
>>
>> Can these options cover more than three synchronization levels?
>>
>
> I think I explained it in my mail.
>
> If  min_sync_replication_clients == 0, then the replication is async.
> If  min_sync_replication_clients == max_wal_senders then the
> replication is fully synchronous.
> If 0 < min_sync_replication_clients < max_wal_senders then
> the replication is partially synchronous, i.e. the master can wait
> only for say, 50% of the clients to report back before it's considered
> synchronous and the relevant transactions get released from the wait.

Seems s/min_sync_replication_clients/max_sync_replication_clients

Is min_sync_replication_clients required to prevent an outside attacker
from connecting to the master as a "synchronous" standby and degrading
the performance of the master? Is there any other use case?
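
For reference, this is how I read the quoted description of the two GUCs, as a small Python sketch. The function is hypothetical and only restates the description above; it is not taken from the patch:

```python
def required_reports(min_sync_replication_clients,
                     strict_sync_replication,
                     connected_sync_standbys):
    """Number of standby reports a commit waits for, per the quoted
    description of the two GUCs."""
    if strict_sync_replication:
        # Strict: always wait for N reports, even if fewer standbys
        # are connected right now.
        return min_sync_replication_clients
    # Non-strict: never wait for more reports than there are
    # connected synchronous standbys.
    return min(min_sync_replication_clients, connected_sync_standbys)


# With no standby connected, non-strict mode degrades to asynchronous,
# while strict mode keeps waiting for two reports.
print(required_reports(2, False, 0))
print(required_reports(2, True, 0))
```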

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
