RE: Perform streaming logical transactions by background workers and parallel apply

Поиск
Список
Период
Сортировка
От wangw.fnst@fujitsu.com
Тема RE: Perform streaming logical transactions by background workers and parallel apply
Дата
Msg-id OS3PR01MB6275FBD9359F8ED0EDE7E5459EDE9@OS3PR01MB6275.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Mon, May 30, 2022 7:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Few comments/suggestions for 0001 and 0003
> =====================================
> 0001
> --------
Thanks for your comments.

> 1.
> + else
> + snprintf(bgw.bgw_name, BGW_MAXLEN,
> + "logical replication apply worker for subscription %u", subid);
> 
> Can we slightly change the message to: "logical replication background
> apply worker for subscription %u"?
Improve the message as suggested.

> 2. Can we think of separating the new logic for applying the xact by
> bgworker into a new file like applybgwroker or applyparallel? We have
> previously done the same in the case of vacuum (see vacuumparallel.c).
Improve the patch as suggested. I separated the new logic related to apply
background worker to new file src/backend/replication/logical/applybgwroker.c.

> 3.
> + /*
> + * XXX The publisher side doesn't always send relation update messages
> + * after the streaming transaction, so update the relation in main
> + * apply worker here.
> + */
> + if (action == LOGICAL_REP_MSG_RELATION)
> + {
> + LogicalRepRelation *rel = logicalrep_read_rel(s);
> + logicalrep_relmap_update(rel);
> + }
> 
> I think the publisher side won't send the relation update message
> after streaming transaction only if it has already been sent for a
> non-streaming transaction in which case we don't need to update the
> local cache here. This is as per my understanding of
> maybe_send_schema(), do let me know if I am missing something? If my
> understanding is correct then we don't need this change.
I think we need this change because the publisher will invoke function
cleanup_rel_sync_cache when committing a streaming transaction, then it will
set "schema_sent" to true for related entry. Later, publisher may not send this
schema in function maybe_send_schema because we already sent this schema
(schema_sent = true).
If we do not have this change, It would cause an error in the following case:
Suppose that after walsender worker starts, first we commit a streaming
transaction. Walsender sends relation update message, and only apply background
worker can update relation map cache by this message. After this, if we commit
a non-streamed transaction that contains same replicated table, walsender will
not send relation update message, so main apply worker would not get relation
update message.
I think we need this change to update relation map cache not only in apply
background worker but also in apply main worker.
In addition, we should also handle the LOGICAL_REP_MSG_TYPE message just like
LOGICAL_REP_MSG_RELATION. So improve the code you mentioned. BTW, I simplify
the function handle_streamed_transaction().

> 4.
> + * For the main apply worker, if in streaming mode (receiving a block of
> + * streamed transaction), we send the data to the apply background worker.
>   *
> - * If in streaming mode (receiving a block of streamed transaction), we
> - * simply redirect it to a file for the proper toplevel transaction.
> 
> This comment is slightly confusing. Can we change it to something
> like: "In streaming case (receiving a block of streamed transaction),
> for SUBSTREAM_ON mode, we simply redirect it to a file for the proper
> toplevel transaction, and for SUBSTREAM_APPLY mode, we send the
> changes to background apply worker."?
Improve the comments as suggested.

> 5.
> +apply_handle_stream_abort(StringInfo s)
>  {
> ...
> ...
> + /*
> + * If the two XIDs are the same, it's in fact abort of toplevel xact,
> + * so just free the subxactlist.
> + */
> + if (subxid == xid)
> + {
> + set_apply_error_context_xact(subxid, InvalidXLogRecPtr);
> 
> - fd = BufFileOpenFileSet(MyLogicalRepWorker->stream_fileset, path,
> O_RDONLY,
> - false);
> + AbortCurrentTransaction();
> 
> - buffer = palloc(BLCKSZ);
> + EndTransactionBlock(false);
> + CommitTransactionCommand();
> +
> + in_remote_transaction = false;
> ...
> ...
> }
> 
> Here, can we update the replication origin as we are doing in
> apply_handle_rollback_prepared? Currently, we don't do it because we
> are just cleaning up temporary files for which we don't even have a
> transaction. Also, we don't have the required infrastructure to
> advance origins for aborts as we have for abort prepared. See commits
> [1eb6d6527a][8a812e5106]. If we think it is a good idea then I think
> we need to send abort_lsn and abort_time from the publisher and we
> need to be careful to make it work with lower subscriber versions that
> don't have the facility to process these additional values.
I think it is a good idea. I will consider this and add this part in next
version.

> 0003
> --------
> 6.
> + /*
> + * If any unique index exist, check that they are same as remoterel.
> + */
> + if (!rel->sameunique)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot replicate relation with different unique index"),
> + errhint("Please change the streaming option to 'on' instead of 'apply'.")));
> 
> I think we can do better here. Instead of simply erroring out and
> asking the user to change streaming mode, we can remember this in the
> system catalog probably in pg_subscription, and then on restart, we
> can change the streaming mode to 'on', perform the transaction, and
> again change the streaming mode to apply. I am not sure whether we
> want to do it in the first version or not, so if you agree with this,
> developing it as a separate patch would be a good idea.
> 
> Also, please update comments here as to why we don't handle such cases.
Yes, I think it is a good idea. I will develop it as a separate patch later.
And I added the comments atop function apply_bgworker_relation_check as
below:
```
 * Although we maintains the commit order by allowing only one process to
 * commit at a time, our access order to the relation has changed.
 * This could cause unexpected problems if the unique column on the replicated
 * table is inconsistent with the publisher-side or contains non-immutable
 * functions when applying transactions in the apply background worker.
```

I also made some other changes. The new patches and the modification details
were attached in [1].

[1] -
https://www.postgresql.org/message-id/OS3PR01MB62758A881FF3240171B7B21B9EDE9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "wangw.fnst@fujitsu.com"
Дата:
Сообщение: RE: Perform streaming logical transactions by background workers and parallel apply
Следующее
От: "wangw.fnst@fujitsu.com"
Дата:
Сообщение: RE: Perform streaming logical transactions by background workers and parallel apply