Re: Logical replication and multimaster

From: Konstantin Knizhnik
Subject: Re: Logical replication and multimaster
Date:
Msg-id: 565F5208.3070100@postgrespro.ru
In reply to: Re: Logical replication and multimaster  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Logical replication and multimaster  (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-hackers

Thank you for the reply.

On 12/02/2015 08:30 PM, Robert Haas wrote:
> Logical decoding only begins decoding a transaction once the
> transaction is complete.  So I would guess that the sequence of
> operations here is something like this - correct me if I'm wrong:
>
> 1. Do the transaction.
> 2. PREPARE.
> 3. Replay the transaction.
> 4. PREPARE the replay.
> 5. COMMIT PREPARED on original machine.
> 6. COMMIT PREPARED on replica.

Logical decoding is started after execution of the XLogFlush method,
so the transaction is actually not yet completed at this moment:
- it is not marked as committed in clog
- it is marked as in-progress in procarray
- locks are not released

We are not using PostgreSQL two-phase commit here. Instead, our DTM
intercepts control in TransactionIdCommitTree and sends a request to
the arbiter, which in turn waits for the status of committing
transactions on the replicas.
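
To make the control flow concrete, here is a rough sketch of that
interception point (ArbiterSendCommitRequest, ArbiterWaitStatus and
TX_COMMITTED are invented names for this example; the actual DTM code
differs):

#include "postgres.h"
#include "access/transam.h"   /* TransactionIdCommitTree */

/* Hypothetical sketch: before the transaction is marked committed in
 * clog, ask the external arbiter and block until the replicas have
 * acknowledged the transaction. */
static void
DtmCommitTransaction(TransactionId xid, int nxids, TransactionId *xids)
{
    ArbiterSendCommitRequest(xid);               /* notify the arbiter */
    if (ArbiterWaitStatus(xid) != TX_COMMITTED)  /* wait for replicas  */
        elog(ERROR, "transaction %u was aborted by arbiter", xid);
    TransactionIdCommitTree(xid, nxids, xids);   /* normal clog update */
}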

The problem is that transactions are delivered to a replica through a
single channel: the logical replication slot. And while such a
transaction is waiting for acknowledgement from the arbiter, it blocks
the replication channel, preventing other (parallel) transactions from
being replicated and applied.

I have implemented a pool of background workers; maybe it will be
useful not only for me. It consists of a single-producer,
multiple-consumers queue implemented using a buffer in shared memory,
a spinlock and two semaphores. The API is very simple:

typedef void (*BgwPoolExecutor)(int id, void* work, size_t size);
typedef BgwPool* (*BgwPoolConstructor)(void);

extern void BgwPoolStart(int nWorkers, BgwPoolConstructor constructor);
extern void BgwPoolInit(BgwPool* pool, BgwPoolExecutor executor,
                        char const* dbname, size_t queueSize);
extern void BgwPoolExecute(BgwPool* pool, void* work, size_t size);

You just pass some chunk of bytes (work, size): it is placed in the
queue, and then the first available worker dequeues and executes it.
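
To illustrate how the pieces fit together, here is a hypothetical
usage sketch (ApplyTransaction, AllocatePoolInSharedMemory and the
"bgwpool.h" header are invented for the example):

#include "bgwpool.h"    /* assumed header declaring the API above */

static BgwPool* pool;   /* in practice located in shared memory */

/* Executor: invoked by a worker for every dequeued chunk of work. */
static void
ApplyWork(int id, void* work, size_t size)
{
    ApplyTransaction(work, size);   /* invented helper */
}

/* Constructor: lets each started worker locate the pool descriptor. */
static BgwPool*
GetPool(void)
{
    return pool;
}

void
StartApplyPool(void)
{
    pool = AllocatePoolInSharedMemory();   /* invented helper */
    BgwPoolInit(pool, ApplyWork, "postgres", 4*1024*1024);
    BgwPoolStart(16, GetPool);   /* 16 workers, as in the benchmark below */
}

/* Producer side: enqueue a chunk; a free worker will pick it up. */
void
OnTransactionReceived(void* data, size_t size)
{
    BgwPoolExecute(pool, data, size);
}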

Using this pool and a larger number of accounts (reducing the
probability of conflicts), I get better results. So now the receiver
of logical replication does not execute transactions directly;
instead, it places them in the queue and they are executed
concurrently by the pool of background workers.

On a cluster with three nodes, the results of our debit-credit
benchmark are the following:

                                     TPS
Multimaster (ACID transactions)    12500
Multimaster (async replication)    34800
Standalone PostgreSQL              44000

We tested two modes: one where the client randomly distributes queries
between cluster nodes, and one where the client works with only one
master node while the other nodes are just used as replicas.
Performance is slightly better in the second case, but the difference
is not very large (about 11000 TPS in the first case).

The number of workers in the pool has a significant impact on
performance: with 8 workers we get about 7800 TPS, and with 16 workers
12500. Performance also greatly depends on the number of accounts (and
hence the probability of lock conflicts): with 100 accounts the speed
is less than 1000 TPS.

> Step 3 introduces latency proportional to the amount of work the
> transaction did, which could be a lot.  If you were doing synchronous
> physical replication, the replay of the COMMIT record would only need
> to wait for the replay of the commit record itself.  But with
> synchronous logical replication, you've got to wait for the replay of
> the entire transaction.  That's a major bummer, especially if replay
> is single-threaded and there are a large number of backends generating
> transactions.  Of course, the 2PC dance itself can also add latency -
> that's most likely to be the issue if the transactions are each very
> short.
>
> What I'd suggest is trying to measure where the latency is coming
> from.  You should be able to measure how much time each transaction
> spends (a) executing, (b) preparing itself, (c) waiting for the replay
> thread to begin replaying it, (d) waiting for the replay thread to
> finish replaying it, and (e) committing.  Separating (c) and (d) might
> be a little bit tricky, but I bet it's worth putting some effort in,
> because the answer is probably important to understanding what sort of
> change will help here.  If (c) is the problem, you might be able to
> get around it by having multiple processes, though that only helps if
> applying is slower than decoding.  But if (d) is the problem, then the
> only solution is probably to begin applying the transaction
> speculatively before it's prepared/committed.  I think.
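
A minimal sketch of how those per-phase timings could be accumulated,
using PostgreSQL's instr_time macros (the phase list and the hook
points where PhaseBegin/PhaseEnd are called are assumptions for the
example):

#include "postgres.h"
#include "portability/instr_time.h"

/* Phases (a)-(e) from the suggestion above. */
typedef enum { PHASE_EXECUTE, PHASE_PREPARE, PHASE_REPLAY_QUEUE,
               PHASE_REPLAY, PHASE_COMMIT, N_PHASES } TxnPhase;

static instr_time phase_start;
static uint64     phase_us[N_PHASES];   /* accumulated microseconds */

static void
PhaseBegin(void)
{
    INSTR_TIME_SET_CURRENT(phase_start);
}

static void
PhaseEnd(TxnPhase phase)
{
    instr_time now;
    INSTR_TIME_SET_CURRENT(now);
    INSTR_TIME_SUBTRACT(now, phase_start);
    phase_us[phase] += INSTR_TIME_GET_MICROSEC(now);
}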
