Re: Proposal: Support Logical replication of large objects
| From | Nitin Motiani |
|---|---|
| Subject | Re: Proposal: Support Logical replication of large objects |
| Date | |
| Msg-id | CAH5HC943TtLL6snuqhx8-Uzj065Pxgxj04_fAt1EOHSiYH+uiA@mail.gmail.com |
| In reply to | Re: Proposal: Support Logical replication of large objects (Dilip Kumar <dilipbalaut@gmail.com>) |
| List | pgsql-hackers |
Hi Dilip,

I have created a couple of POC patches for the tablesync side of this. The challenge with tablesync involves permissions, because the table is owned by the superuser. I considered a few solutions and have patches for two of them.

1. One simple solution is to allow only the superuser to set up large object replication. This could be achieved with a command that enables large object replication for a publication, or perhaps with a GUC flag. Ideally we don't want every publication to publish large objects, so a publication-level command might be preferable. While the enablement mechanism is still an open question, I have implemented the POC by adding the pg_largeobject table to the result of pg_get_publication_tables. In the final implementation we might also add the pg_largeobject_metadata table. The patch is attached as POC-v3-0003-Tablesync-for-large-objects-for-superuser.patch. An alternative would be to handle this special case on the subscriber side, but it is simpler to do it on the publisher side. We might also need setup on both the publisher and subscriber sides. For full enablement we'll need this patch, the existing apply worker patch for lo_write, and a patch supporting lo_create on the apply worker side.

2. An alternative solution is to allow a subscription owner to copy only the large object entries owned by its counterpart on the publisher. This allows all users to replicate large objects, and it also enables multiple subscriptions to set up large object replication. The enablement question from the first approach remains open here as well. We can also assume that a patch supporting lo_create in the apply worker will be created along with this change. For the implementation, we can have two tablesync workers: one for pg_largeobject_metadata and one for pg_largeobject. The pg_largeobject_metadata worker can use lo_create to create the large object entries, or alternatively just copy the rows, since permissions are not an issue for that table. The pg_largeobject worker will use lo_write/lo_put to insert the entries owned by the subscription owner.

The major challenge with this approach is the synchronization required between the pg_largeobject and pg_largeobject_metadata tablesync workers. For an ordinary table copy (as suggested for superusers in the first approach), integrity checks are turned off, so the two tablesync workers can run in parallel without issue. However, explicitly calling lo_write checks for the large object's existence, which can lead to failures.

3. To avoid the synchronization problem above, I experimented with a modified approach: instead of a separate tablesync worker for pg_largeobject_metadata, I created only one tablesync worker, for pg_largeobject. It differs from a regular tablesync worker in the following ways:

a. Instead of a copy command, it runs "select m.oid, lo_get(m.oid) from pg_largeobject_metadata as m join pg_user as u on m.lomowner = u.usesysid where u.usename = CURRENT_USER;", which returns all large objects owned by the user.

b. It then uses lo_from_bytea (a refactored version which doesn't need fmgr) to both create and write each large object, which removes the need for a separate lo_create.

POC-v3-0004-Enable-tablesync-for-large-objects-for-all-users.patch contains this implementation. I tested it successfully for a small set of large objects. This patch, along with support for lo_create in the apply worker, should provide full support for large object replication.

Note: we should also modify the apply worker code to replicate only the lo_writes on objects owned by the subscription user. I have not made that change in the POC but can do it in the next version.

One major concern here is the performance of tablesync. I think most users will set up replication at start time, or convert a physical replica to a logical replica, so this cost might not be borne in many cases.

4. If we want a more performant version, one idea is to support bulk writes for large objects; the above solution could then be made faster. I have not analyzed the work required. Suggestions on this would be welcome.

Thanks.
Nitin Motiani
Google
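[Editor's note] For readers following approach (3), the per-user copy can be sketched in plain SQL. This is only an illustration of the data flow, not the patch's C implementation; the parameter placeholders `:oid` and `:data` are hypothetical, standing in for values the tablesync worker would carry over its replication connection.

```sql
-- On the publisher, as the subscription owner's counterpart:
-- fetch the OID and full contents of every large object owned by
-- the connecting user (lomowner is matched against pg_user).
SELECT m.oid, lo_get(m.oid) AS data
FROM pg_largeobject_metadata AS m
JOIN pg_user AS u ON m.lomowner = u.usesysid
WHERE u.usename = CURRENT_USER;

-- On the subscriber, for each fetched row: lo_from_bytea(loid, data)
-- creates the large object with the given OID and writes its contents
-- in one step, so no separate lo_create() call is needed.
SELECT lo_from_bytea(:oid, :data);
```

The single-call create-and-write is what lets the modified worker avoid coordinating with a second pg_largeobject_metadata worker.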
Attachments