Re: Proposal: Support Logical replication of large objects
| From | Nitin Motiani |
|---|---|
| Subject | Re: Proposal: Support Logical replication of large objects |
| Date | |
| Msg-id | CAH5HC943TtLL6snuqhx8-Uzj065Pxgxj04_fAt1EOHSiYH+uiA@mail.gmail.com |
| In reply to | Re: Proposal: Support Logical replication of large objects (Dilip Kumar <dilipbalaut@gmail.com>) |
| List | pgsql-hackers |
Hi Dilip,

I have created a couple of POC patches for the tablesync side of this. The challenge with tablesync involves permissions, because the table is owned by the superuser. I considered a few solutions and have patches for two of them.

1. One simple solution is to allow only the superuser to set up large object replication. This could be achieved with a command that enables large object replication for a publication, or perhaps with a GUC flag. Ideally we don't want every publication to publish large objects, so a publication-level command might be preferable. While the enablement mechanism is still an open question, I have implemented the POC by adding the pg_largeobject table to the result of pg_get_publication_tables. In the final implementation we might also add the pg_largeobject_metadata table. The patch is attached as POC-v3-0003-Tablesync-for-large-objects-for-superuser.patch. An alternative would be to handle this special case on the subscriber side, but it is simpler to do it on the publisher side. We might also need setup on both the publisher and subscriber sides. For full enablement we'll need this patch, the existing apply worker patch for lo_write, and a patch supporting lo_create on the apply worker side.

2. An alternative solution is to allow a subscription owner to copy only the large object entries owned by its counterpart on the publisher. This allows all users to replicate large objects, and it also enables multiple subscriptions to set up large object replication. The enablement question from the first approach remains open here as well. We can also assume that a patch supporting lo_create in the apply worker will be created along with this change. For the implementation, we can have two tablesync workers: one for pg_largeobject_metadata and one for pg_largeobject. The pg_largeobject_metadata worker can use lo_create to create the large object entries, or alternatively just copy the rows, since permissions are not an issue for that table. The pg_largeobject worker will use lo_write/lo_put to insert the entries owned by the subscription owner.

The major challenge with this approach is the synchronization required between the pg_largeobject and pg_largeobject_metadata tablesync workers. For an ordinary table copy (as suggested for superusers in the first approach), integrity checks are turned off, so the two tablesync workers can run in parallel without issue. However, explicitly calling lo_write checks for the large object's existence, which can lead to failures.

3. To avoid the synchronization problem above, I experimented with a modified approach: instead of a separate tablesync worker for pg_largeobject_metadata, I created only one tablesync worker, for pg_largeobject. It differs from a regular tablesync worker in the following ways:

a. Instead of a copy command, it runs "select m.oid, lo_get(m.oid) from pg_largeobject_metadata as m join pg_user as u on m.lomowner = u.usesysid where u.usename = CURRENT_USER;", which returns all large objects owned by the user.

b. It then uses lo_from_bytea (a refactored version which doesn't need fmgr) to both create and write each large object, which removes the need for a separate lo_create.

POC-v3-0004-Enable-tablesync-for-large-objects-for-all-users.patch contains this implementation. I tested it successfully for a small set of large objects. This patch, along with support for lo_create in the apply worker, should provide full support for large object replication.

Note: we should also modify the apply worker code to replicate only the lo_writes on objects owned by the subscription user. I have not made that change in the POC but can do it in the next version.

One major concern here is the performance of tablesync. I think most users will set up replication at start time, or convert a physical replica to a logical replica, so this cost might not be borne in many cases.

4. If we want a more performant version, one idea is to support bulk writes for large objects; the above solution could then be made faster. I have not analyzed the work required. Suggestions on this would be welcome.

Thanks.
Nitin Motiani
Google
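[Editor's note] For readers following approach (3), the per-user copy can be sketched in plain SQL. This is only an illustration of the data flow, not the patch's C implementation; the parameter placeholders `:oid` and `:data` are hypothetical, standing in for values the tablesync worker would carry over its replication connection.

```sql
-- On the publisher, as the subscription owner's counterpart:
-- fetch the OID and full contents of every large object owned by
-- the connecting user (lomowner is matched against pg_user).
SELECT m.oid, lo_get(m.oid) AS data
FROM pg_largeobject_metadata AS m
JOIN pg_user AS u ON m.lomowner = u.usesysid
WHERE u.usename = CURRENT_USER;

-- On the subscriber, for each fetched row: lo_from_bytea(loid, data)
-- creates the large object with the given OID and writes its contents
-- in one step, so no separate lo_create() call is needed.
SELECT lo_from_bytea(:oid, :data);
```

The single-call create-and-write is what lets the modified worker avoid coordinating with a second pg_largeobject_metadata worker.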
Attachments