Re: Multi-Master Logical Replication

From Peter Smith
Subject Re: Multi-Master Logical Replication
Date
Msg-id CAHut+PsvvfTWWwE8vkgUg4q+QLyoCyNE7NU=mEiYHcMcXciXdg@mail.gmail.com
In response to Re: Multi-Master Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Multi-Master Logical Replication  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Wed, May 25, 2022 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, May 24, 2022 at 5:57 PM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Sat, May 14, 2022 at 12:20:05PM +0530, Amit Kapila wrote:
> > > On Sat, May 14, 2022 at 12:33 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > >
> > > > Uh, without these features, what workload would this help with?
> > > >
> > >
> > > To allow replication among multiple nodes when some of the nodes may
> > > have pre-existing data. This work plans to provide simple APIs to
> > > achieve that. Now, let me try to explain the difficulties users can
> > > face with the existing interface. It is simple to set up replication
> > > among various nodes when they don't have any pre-existing data but
> > > even in that case if the user operates on the same table at multiple
> > > nodes, the replication will lead to an infinite loop and won't
> > > proceed. The example in email [1] demonstrates that and the patch in
> > > that thread attempts to solve it. I have mentioned that problem
> > > because this work will need that patch.
> > ...
> > > This will become more complicated when more than two nodes are
> > > involved, see the example provided for the three nodes case [2]. Can
> > > you think of some other simpler way to achieve the same? If not, I
> > > don't think the current way is ideal and even users won't prefer that.
> > > I am not telling that the APIs proposed in this thread is the only or
> > > best way to achieve the desired purpose but I think we should do
> > > something to allow users to easily set up replication among multiple
> > > nodes.
> >
> > You still have not answered my question above.  "Without these features,
> > what workload would this help with?"  You have only explained how the
> > patch would fix one of the many larger problems.
> >
>
> It helps with setting up logical replication among two or more nodes
> (data flows both ways) which is important for use cases where
> applications are data-aware. For such apps, it will be beneficial to
> always send and retrieve data to local nodes in a geographically
> distributed database. Now, for such apps, to get 100% consistent data
> among nodes, one needs to enable synchronous_mode (aka set
> synchronous_standby_names) but if that hurts performance and the data
> is for analytical purposes then one can use it in asynchronous mode.
> Now, for such cases, if the local node goes down, the other master
> node can be immediately available to use, sure it may slow down the
> operations for some time till the local node come-up. For such apps,
> later it will be also easier to perform online upgrades.
>
> Without this, if the user tries to achieve the same via physical
> replication by having two local nodes, it can take quite long before
> the standby can be promoted to master and local reads/writes will be
> much costlier.
>

As mentioned above, the LRG idea might be a useful addition to logical
replication for configuring certain types of "data-aware"
applications.

LRG for data-aware apps (e.g. sensor data)
------------------------------------------
Consider an example where there are multiple weather stations for a
country. Each weather station is associated with a PostgreSQL node and
inserts its local sensor data (e.g. wind, rain, sunshine) once a
minute into some local table. Each row is identified by a station ID.

- Perhaps there are many nodes.

- Loss of a single row of replicated sensor data if some node goes
down is not a major problem for this sort of application.

- Benefits of processing data locally can be realised.

- Using LRG simplifies the setup/sharing of the data across all group
nodes via a common table.

~~

LRG makes setup easier
----------------------
Although it is already possible (using Vignesh's "infinite recursion"
WIP patch [1]) to set up this kind of environment using logical
replication, doing so becomes more and more difficult as the number of
nodes grows. Each new node needs N-1 CREATE SUBSCRIPTION commands, one
for each of the other group nodes, so the connection details of every
other node must also be known up front by the setup script.
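
To make the N-1 bookkeeping concrete, here is a minimal sketch of the
kind of script a user would have to maintain by hand today. The node
names, connection strings, publication name and subscription naming
scheme are all hypothetical, chosen only for illustration. The
loop-avoidance option from the WIP patch [1] is deliberately left out
because its spelling is not given here.

```python
# Illustrative only: station names, conninfo strings and "sensor_pub"
# are assumptions, not taken from the thread.
def subscriptions_for_new_node(new_node, existing, publication="sensor_pub"):
    """Return the CREATE SUBSCRIPTION statements the joining node runs,
    one per existing group node (N-1 statements for a group of N)."""
    stmts = []
    for name, conninfo in sorted(existing.items()):
        stmts.append(
            f"CREATE SUBSCRIPTION sub_{new_node}_from_{name} "
            f"CONNECTION '{conninfo}' PUBLICATION {publication};"
        )
    return stmts


# Three stations already in the group; station4 is joining.
existing = {
    "station1": "host=station1 dbname=weather",
    "station2": "host=station2 dbname=weather",
    "station3": "host=station3 dbname=weather",
}
stmts = subscriptions_for_new_node("station4", existing)
for stmt in stmts:
    print(stmt)
```

Since data flows both ways, each existing node would presumably also
need a subscription pointing back at the new node, so the script has
to hold every node's connection details and grows with the group.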

OTOH, the LRG API can simplify all this, removing the user's burden
and the risk of mistakes. Also, LRG only needs to know how to reach one
other node in the group (the implementation will discover the
connection details of all the other nodes internally).
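
How the discovery would work is not described in this mail. Purely as
a toy model of the idea (not the LRG implementation), assume each node
can report the peers it knows about. Starting from the single node the
user supplies, the full member list can then be collected. The
`membership` structure and all names below are hypothetical.

```python
# Toy in-memory model: each node lists its conninfo and known peers.
# Nothing here reflects the actual LRG design; it only illustrates
# "know one node, find the rest".
def discover_group(seed, membership):
    """Walk peer lists from one known node and return the connection
    info of every reachable group member."""
    found = {}
    pending = [seed]
    while pending:
        node = pending.pop()
        if node in found:
            continue  # already visited
        found[node] = membership[node]["conninfo"]
        pending.extend(membership[node]["peers"])
    return found


membership = {
    "station1": {"conninfo": "host=station1 dbname=weather",
                 "peers": ["station2", "station3"]},
    "station2": {"conninfo": "host=station2 dbname=weather",
                 "peers": ["station1", "station3"]},
    "station3": {"conninfo": "host=station3 dbname=weather",
                 "peers": ["station1", "station2"]},
}

# The joining node is told about station1 only.
group = discover_group("station1", membership)
print(sorted(group))
```

In this model the user supplies one connection string instead of N-1,
which is the simplification described above.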

~~

LRG can handle initial table data
---------------------------------
If the joining node (e.g. a new weather station) already has some
initial local sensor data then sharing that initial data manually with
all the other nodes requires some tricky steps. LRG can hide all this
complexity behind the API, so it is not a user problem anymore.

------
[1] https://www.postgresql.org/message-id/flat/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9%3D9orqubhjcQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


