Re: Multimaster

From Craig Ringer
Subject Re: Multimaster
Date
Msg-id CAMsr+YFH6e540wniVOutbEdxewkt8AswwAyWmk=kQ27iMRwJyQ@mail.gmail.com
In response to Re: Multimaster (konstantin knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: Multimaster (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-general
On 14 April 2016 at 17:14, konstantin knizhnik <k.knizhnik@postgrespro.ru> wrote:

On Apr 14, 2016, at 8:41 AM, Craig Ringer wrote:

On 1 April 2016 at 19:50, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:

Right now the main problem is parallel apply: we need to apply changes concurrently to avoid unintended dependencies causing deadlocks and provide reasonable performance.

How do you intend to approach that?

Actually we already have a working implementation of multimaster...
There is a pool of pglogical executors. pglogical_receiver just reads the transaction body from the connection and appends it to a ready-for-execution queue.

I intend to make the same split in pglogical itself - a receiver and apply worker split. Though my intent is to have them communicate via a shared memory segment until/unless the apply worker gets too far behind and spills to disk.

Any vacant worker from this pool can dequeue this work and process it.
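
A minimal sketch of the receiver/worker-pool split described above, in Python purely for illustration; the in-memory queue, apply_transaction and the thread pool are placeholders for what is really done with background workers and shared memory:

    import queue
    import threading

    # Placeholder: in the real system this would apply the decoded
    # changes of one transaction to the local node.
    def apply_transaction(txn_body):
        print("applying", txn_body)

    # Ready-for-execution queue filled by the receiver.
    ready_queue = queue.Queue()

    def receiver(connection_stream):
        # The receiver only reads complete transaction bodies from the
        # replication connection and enqueues them; it never applies them.
        for txn_body in connection_stream:
            ready_queue.put(txn_body)

    def apply_worker():
        # Any vacant worker dequeues the next transaction and applies it.
        while True:
            txn_body = ready_queue.get()
            if txn_body is None:          # shutdown marker
                break
            apply_transaction(txn_body)
            ready_queue.task_done()

    if __name__ == "__main__":
        workers = [threading.Thread(target=apply_worker) for _ in range(4)]
        for w in workers:
            w.start()
        receiver(["txn-1", "txn-2", "txn-3"])   # simulated stream
        ready_queue.join()
        for _ in workers:
            ready_queue.put(None)
        for w in workers:
            w.join()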

How do you handle correctness of ordering though? A naïve approach will suffer from a variety of anomalies when subject to insert/delete/insert write patterns, among other things. You can also get lost updates, rows deleted upstream that don't get deleted downstream and various other exciting ordering issues.

At an absolute minimum you'd have to commit on the downstream in the same commit order as the upstream. This can deadlock, so when you get a deadlock you'd abort the xacts of the deadlocked worker and all xacts with later commit timestamps, then retry the lot.
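
To make the commit-ordering constraint concrete, a rough sketch in Python; CommitOrderGate and its names are hypothetical, not taken from BDR or pglogical, and the deadlock handling is only described in the comments:

    import threading

    # Sketch of a commit-order gate: apply work for several transactions
    # can proceed concurrently, but commits must happen in the upstream
    # commit order.  On a deadlock the failed worker (and every worker
    # holding a later commit sequence) would roll back, requeue its
    # transaction and pass through the gate again on retry.
    class CommitOrderGate:
        def __init__(self):
            self._cond = threading.Condition()
            self._next_seq = 0             # next commit sequence allowed

        def commit(self, my_seq, do_commit):
            # Wait until every transaction that committed earlier on the
            # upstream has committed here, then commit this one.
            with self._cond:
                while self._next_seq != my_seq:
                    self._cond.wait()
                do_commit()                # placeholder for the real COMMIT
                self._next_seq += 1
                self._cond.notify_all()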

BDR has enough trouble with this when applying transactions from multiple peer nodes. To a degree it just throws its hands up and gives up - in particular, it can't tell the difference between an insert/update conflict and an update/delete conflict. But that's between loosely coupled nodes where we explicitly document that some kinds of anomalies are permitted. I can't imagine it being OK to have an even more complex set of possible anomalies occur when simply replaying transactions from a single peer...

It is certainly possible with this approach that the order of applying transactions will not be the same at different nodes.

Well, it can produce downright wrong results, and the results even in a single-master case will be all over the place.

But it is not a problem if we have DTM.

How does that follow?
 
The only exception is recovery of a multimaster node. In this case we have to apply transactions in exactly the same order as they were applied at the original node performing recovery. This is done by applying changes in recovery mode by pglogical_receiver itself.

I'm not sure I understand what you are saying here.
  
We also need 2PC support, but this code was sent to you by Stas, so I hope that at some point it will be included in the PostgreSQL core and the pglogical plugin.

I never got a response to my suggestion that testing of upstream DDL is needed for that. I want to see more on how you plan to handle DDL on the upstream side that changes the table structure and acquires strong locks, especially when it's combined with row changes in the same prepared xact.

We are now replicating DDL in a way similar to the one used in BDR: DDL statements are inserted into a special table and are replayed at the destination node as part of the transaction.
We also have an alternative implementation done by Artur Zakirov <a.zakirov@postgrespro.ru>.
The patch for custom WAL records was committed in 9.6, so we are going to switch to this approach.
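
For readers unfamiliar with the table-based approach mentioned above, a minimal sketch in Python; the ddl_queue table (columns id, stmt, applied) and both helper functions are assumptions for illustration, not the actual multimaster code:

    import psycopg2  # assumed driver; any DB-API connection would do

    def replicate_ddl(conn, ddl_text):
        # Upstream side: record the DDL text in a replicated table and run
        # it locally in the same transaction, so logical decoding ships the
        # queue row together with any row changes the DDL causes.
        with conn.cursor() as cur:
            cur.execute("INSERT INTO ddl_queue (stmt) VALUES (%s)", (ddl_text,))
            cur.execute(ddl_text)
        conn.commit()

    def replay_ddl(conn):
        # Downstream side: when the apply worker sees new rows in
        # ddl_queue, execute the recorded statements in order before
        # applying row changes that depend on the new table structure.
        with conn.cursor() as cur:
            cur.execute("SELECT id, stmt FROM ddl_queue"
                        " WHERE NOT applied ORDER BY id")
            for row_id, stmt in cur.fetchall():
                cur.execute(stmt)
                cur.execute("UPDATE ddl_queue SET applied = true"
                            " WHERE id = %s", (row_id,))
        conn.commit()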

How does that really improve anything over using a table?

This doesn't address what I asked above though, which is whether you have tried doing ALTER TABLE in a 2PC xact with your 2PC replication patch, especially one that also makes row changes. 

Well, recently I made an attempt to merge our code with the latest version of the pglogical plugin (because our original implementation of multimaster was based on code partly taken from BDR), but in the end I had to postpone most of the changes. My primary intention was to support metadata caching, but the presence of multiple apply workers makes it impossible to implement it in the same way as it is done now in the pglogical plugin.

Not with a simplistic implementation of multiple workers that just round-robin process transactions, no. Your receiver will have to be smart enough to read the protocol stream and write the metadata changes to a separate stream all the workers read. Which is awkward.

I think you'll probably need your receiver to act as a metadata broker for the apply workers in the end.
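
A rough sketch of that broker split, in Python with made-up message kinds; the point is only that relation/metadata messages fan out to every apply worker while all messages of one transaction go to a single worker:

    # Hypothetical message kinds; a real pglogical stream is richer.
    METADATA_KINDS = {"relation", "type"}

    def broker(protocol_stream, worker_queues):
        # Receiver-as-broker: metadata messages are broadcast so every
        # worker keeps a complete relation cache; row changes of one
        # transaction all go to the worker assigned to that transaction.
        n = len(worker_queues)
        current = 0                    # worker owning the open transaction
        for kind, payload in protocol_stream:
            if kind in METADATA_KINDS:
                for q in worker_queues:
                    q.put((kind, payload))
            else:
                worker_queues[current].put((kind, payload))
                if kind == "commit":
                    # Hand the next transaction to the next worker.
                    current = (current + 1) % n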

Also, the pglogical plugin now contains a lot of code which performs mapping between source and target database schemas, so it is assumed that they may be different.
But that is not true in the case of multimaster, and I do not want to pay an extra cost for functionality we do not need.

All it's really doing is mapping upstream to downstream tables by name, since the oids will be different.

Are you attempting to force table oids to be the same on all nodes, so you can rely on direct 1:1 table oid mappings? 'cos that seems fragile...
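
For illustration, a tiny sketch of the name-based mapping in Python; the lookup hook is hypothetical and simply stands in for a catalog lookup on the downstream node:

    class RelationMap:
        # Upstream relation metadata arrives as (upstream_oid, schema, name);
        # the apply side resolves the local table by name, never by oid.
        def __init__(self, lookup_local_oid):
            self._lookup = lookup_local_oid   # (schema, name) -> local oid
            self._by_upstream_oid = {}

        def add(self, upstream_oid, schema, name):
            self._by_upstream_oid[upstream_oid] = self._lookup(schema, name)

        def local_oid(self, upstream_oid):
            return self._by_upstream_oid[upstream_oid]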
 
We can try to prepare our "wish list" for pglogical plugin.
 
That would be useful. 



--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
