RE: Perform streaming logical transactions by background workers and parallel apply
От | wangw.fnst@fujitsu.com |
---|---|
Тема | RE: Perform streaming logical transactions by background workers and parallel apply |
Дата | |
Msg-id | OS3PR01MB62758A6AAED27B3A848CEB7A9E8F9@OS3PR01MB6275.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Perform streaming logical transactions by background workers and parallel apply (Peter Smith <smithpb2250@gmail.com>) |
Ответы |
RE: Perform streaming logical transactions by background workers and parallel apply
("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
|
Список | pgsql-hackers |
On Wed, Jul 13, 2022 at 13:49 PM Peter Smith <smithpb2250@gmail.com> wrote: > Below are my review comments for the v16* patch set: Thanks for your comments. > ======== > v16-0001 > ======== > > 1.0 <general> > > There are places (comments, docs, errmsgs, etc) in the patch referring > to "parallel mode". I think every one of those references should be > found and renamed to "parallel streaming mode" or "streaming=parallel" > or at the very least match sure that "streaming" is in the same > sentence. IMO it's too vague just saying "parallel" without also > saying the context is for the "streaming" parameter. > > I have commented on some of those examples below, but please search > everything anyway (including the docs) to catch the ones I haven't > explicitly mentioned. I checked all places in the patch where the word "parallel" is used (case insensitive), and I think it is clear that the description is related to stream transactions. So I am not so sure. Could you please give me some examples? I will improve them later. > 1.2 .../replication/logical/applybgworker.c - apply_bgworker_start > > + if (list_length(ApplyWorkersFreeList) > 0) > + { > + wstate = (ApplyBgworkerState *) llast(ApplyWorkersFreeList); > + ApplyWorkersFreeList = list_delete_last(ApplyWorkersFreeList); > + Assert(wstate->pstate->status == APPLY_BGWORKER_FINISHED); > + } > > The Assert that the entries in the free-list are FINISHED seems like > unnecessary checking. IIUC, code is already doing the Assert that > entries are FINISHED before allowing them into the free-list in the > first place. Just for robustness. > 1.3 .../replication/logical/applybgworker.c - apply_bgworker_find > > + if (found) > + { > + char status = entry->wstate->pstate->status; > + > + /* If any workers (or the postmaster) have died, we have failed. */ > + if (status == APPLY_BGWORKER_EXIT) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("background worker %u failed to apply transaction %u", > + entry->wstate->pstate->n, > + entry->wstate->pstate->stream_xid))); > + > + Assert(status == APPLY_BGWORKER_BUSY); > + > + return entry->wstate; > + } > > Why not remove that Assert but change the condition to be: > > if (status != APPLY_BGWORKER_BUSY) > ereport(...) When I check "APPLY_BGWORKER_EXIT", I use the function "ereport" to report the error, because "APPLY_BGWORKER_EXIT" is a possible use case. But for "APPLY_BGWORKER_BUSY", this use case should not happen here. So I think it's fine to only check this for developers when the compile option "--enable-cassert" is specified. > ======== > v16-0003 > ======== > > 3.0 <general> > > Same comment about "parallel mode" as in comment #1.0 > > ====== Please refer to the reply to #1.0. > 3.5 src/backend/replication/logical/proto.c - logicalrep_read_attrs > > @@ -1012,11 +1062,14 @@ logicalrep_read_attrs(StringInfo in, > LogicalRepRelation *rel) > { > uint8 flags; > > - /* Check for replica identity column */ > + /* Check for replica identity and unique column */ > flags = pq_getmsgbyte(in); > - if (flags & LOGICALREP_IS_REPLICA_IDENTITY) > + if (flags & ATTR_IS_REPLICA_IDENTITY) > attkeys = bms_add_member(attkeys, i); > > + if (flags & ATTR_IS_UNIQUE) > + attunique = bms_add_member(attunique, i); > > The code comment really applies to all 3 statements so maybe better > not to have the blank line here. I think it looks a bit messy without the blank line. So I tried to improve it to the following: ``` /* Check for replica identity column */ flags = pq_getmsgbyte(in); if (flags & ATTR_IS_REPLICA_IDENTITY) attkeys = bms_add_member(attkeys, i); /* Check for unique column */ if (flags & ATTR_IS_UNIQUE) attunique = bms_add_member(attunique, i); ``` > 3.6 src/backend/replication/logical/relation.c - logicalrep_rel_mark_parallel > > 3.6.a > + /* Fast path if we marked 'parallel' flag. */ > + if (entry->parallel != PARALLEL_APPLY_UNKNOWN) > + return; > > SUGGESTED > Fast path if 'parallel' flag is already known. > > ~ > > 3.6.b > + /* Initialize the flag. */ > + entry->parallel = PARALLEL_APPLY_SAFE; > > I think it makes more sense if assigning SAFE is the very *last* thing > this function does – not the first thing. > > ~ > > 3.6.c > + /* > + * First, we check if the unique column in the relation on the > + * subscriber-side is also the unique column on the publisher-side. > + */ > > "First, we check..." -> "First, check..." > > ~ > > 3.6.d > + /* > + * Then, We check if there is any non-immutable function in the local > + * table. Look for functions in the following places: > > > "Then, We check..." -> "Then, check" =>3.6.a =>3.6.c =>3.6.d Improved as suggested. =>3.6.b Not sure about this. > 3.7 src/backend/replication/logical/relation.c - logicalrep_rel_mark_parallel > > From [3] you wrote: > Personally, I do not like to use the `goto` syntax if it is not necessary, > because the `goto` syntax will forcibly change the flow of code execution. > > Yes, but OTOH readability is a major consideration too, and in this > function by simply saying goto parallel_unsafe; you can have 3 returns > instead of 7 returns, and it will take ~10 lines less code to do the > same functionality. I am still not sure about this, I think I will change this if some more people think `goto` is better here. > 4.3 doc/src/sgml/ref/create_subscription.sgml > > + <literal>parallel</literal> mode is disregarded when retrying; > + instead the transaction will be applied using <literal>on</literal> > + mode. > > "on mode" etc sounds strange. > > SUGGESTION > During the retry the streaming=parallel mode is ignored. The retried > transaction will be applied using streaming=on mode. Since it's part of the streaming option document. I think it's fine to directly say "<literal>parallel</literal> mode" > 4.4 src/backend/replication/logical/worker.c - set_subscription_retry > > + if (MySubscription->retry == retry || > + am_apply_bgworker()) > + return; > + > > Somehow I feel that this quick exit condition is not quite what it > seems. IIUC the purpose of this is really to avoid doing the tuple > updates if it is not necessary to do them. So if retry was already set > true then there is no need to update tuple to true again. So if retry > was already set false then there is no need to update the tuple to > false. But I just don't see how the (hypothetical) code below can work > as expected, because where is the code updating the value of > MySubscription->retry ??? > > set_subscription_retry(true); > set_subscription_retry(true); > > I think at least there needs to be some detailed comments explaining > what this quick exit is really doing because my guess is that > currently it is not quite working as expected. The subscription cache is be updated in maybe_reread_subscription() and is invoked at every transaction. And we reset the retry flag at transaction end, so it should be fine. And I think the quick exit check code is similar to clear_subscription_skip_lsn. Attach the news patches. [1] - https://www.postgresql.org/message-id/CAHut%2BPv0yWynWTmp4o34s0d98xVubys9fy%3Dp0YXsZ5_sUcNnMw%40mail.gmail.com Regards, Wang wei
Вложения
В списке pgsql-hackers по дате отправления:
Следующее
От: "David G. Johnston"Дата:
Сообщение: Re: Doc about how to set max_wal_senders when setting minimal wal_level