RE: Perform streaming logical transactions by background workers and parallel apply

Поиск
Список
Период
Сортировка
От wangw.fnst@fujitsu.com
Тема RE: Perform streaming logical transactions by background workers and parallel apply
Дата
Msg-id OS3PR01MB6275120502A4730AB9932FCA9E839@OS3PR01MB6275.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
Список pgsql-hackers
On Mon, Jul 4, 2022 at 12:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
> Below are some review comments for patch v14-0003:

Thanks for your comments.

> 3.1 Commit message
> 
> If any of the following checks are violated, an error will be reported.
> 1. The unique columns between publisher and subscriber are difference.
> 2. There is any non-immutable function present in expression in
> subscriber's relation. Check from the following 4 items:
>    a. The function in triggers;
>    b. Column default value expressions and domain constraints;
>    c. Constraint expressions.
>    d. The foreign keys.
> 
> SUGGESTION (rewording to match the docs and the code).
> 
> Add some checks before using apply background worker to apply changes.
> streaming=parallel mode has two requirements:
> 1) The unique columns must be the same between publisher and subscriber
> 2) There cannot be any non-immutable functions in the subscriber-side
> replicated table. Look for functions in the following places:
> * a. Trigger functions
> * b. Column default value expressions and domain constraints
> * c. Constraint expressions
> * d. Foreign keys
> 
> ======
> 
> 3.2 doc/src/sgml/ref/create_subscription.sgml
> 
> +          To run in this mode, there are following two requirements. The first
> +          is that the unique column should be the same between publisher and
> +          subscriber; the second is that there should not be any non-immutable
> +          function in subscriber-side replicated table.
> 
> SUGGESTION
> Parallel mode has two requirements: 1) the unique columns must be the
> same between publisher and subscriber; 2) there cannot be any
> non-immutable functions in the subscriber-side replicated table.

I did not write clearly enough before. So I made some slight modifications to
the first requirement you suggested. Like this:
```
1) The unique column in the relation on the subscriber-side should also be the
unique column on the publisher-side;
```

> 3.9 src/backend/replication/logical/proto.c - logicalrep_write_attrs
> 
> This big slab of new code to get the BMS looks very similar to
> RelationGetIdentityKeyBitmap. So perhaps this code should be
> encapsulated in another function like that one (in relcache.c?) and
> then just called from logicalrep_write_attrs

I think the file relcache.c should contain cache-build operations, and the code
I added doesn't have this operation. So I didn't change.

> 3.12 src/backend/replication/logical/relation.c -
> logicalrep_rel_mark_safe_in_apply_bgworker
> 
> I did not really understand why you used an enum for one flag
> (volatility) but not the other one (sameunique); shouldn’t both of
> these be tri-values: unknown/yes/no?
> 
> For E.g. there is a quick exit from this function if the
> FUNCTION_UNKNOWN, but there is no equivalent quick exit for the
> sameunique? It seems inconsistent.

After rethinking patch 0003, I think we only need one flag. So I merged flags
'volatility' and 'sameunique' into a new flag 'parallel'. It is a tri-state
flag. And I also made some related modifications.

> 3.14 src/backend/replication/logical/relation.c -
> logicalrep_rel_mark_safe_in_apply_bgworker
> 
> There are lots of places setting FUNCTION_NONIMMUTABLE, so I think
> this code might be tidier if you just have a single return at the end
> of this function and 'goto' it.
> 
> e.g.
> if (...)
> goto function_not_immutable;
> 
> ...
> 
> return;
> 
> function_not_immutable:
> entry->volatility = FUNCTION_NONIMMUTABLE;

Personally, I do not like to use the `goto` syntax if it is not necessary,
because the `goto` syntax will forcibly change the flow of code execution.

> 3.17 src/backend/utils/cache/typcache.c
> 
> +/*
> + * GetDomainConstraints --- get DomainConstraintState list of
> specified domain type
> + */
> +List *
> +GetDomainConstraints(Oid type_id)
> 
> This is an unusual-looking function header comment, with the function
> name and the "---".

Not sure about this. Please refer to function lookup_rowtype_tupdesc_internal.

> 3.19
> 
> @@ -31,6 +42,11 @@ typedef struct LogicalRepRelMapEntry
>   Relation localrel; /* relcache entry (NULL when closed) */
>   AttrMap    *attrmap; /* map of local attributes to remote ones */
>   bool updatable; /* Can apply updates/deletes? */
> + bool sameunique; /* Are all unique columns of the local
> +    relation contained by the unique columns in
> +    remote? */
> 
> (This is similar to review comment 3.12)
> 
> I felt it was inconsistent for this to be a boolean but for the
> 'volatility' member to be an enum. AFAIK these 2 flags are similar
> kinds – e.g. essentially tri-state flags unknown/true/false so I
> thought they should be treated the same.  E.g. both enums?

Please refer to the reply to #3.12.

> 3.22 .../subscription/t/032_streaming_apply.pl
> 
> 3.22.a
> +# Setup structure on publisher
> 
> "structure"?
> 
> 3.22.b
> +# Setup structure on subscriber
> 
> "structure"?

Just refer to other subscription test.

> 3.23
> 
> +# Check that a background worker starts if "streaming" option is specified as
> +# "parallel".  We have to look for the DEBUG1 log messages about that, so
> +# temporarily bump up the log verbosity.
> +$node_subscriber->append_conf('postgresql.conf', "log_min_messages =
> debug1");
> +$node_subscriber->reload;
> +
> +$node_publisher->safe_psql('postgres',
> + "INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(1,
> 5000) s(i)"
> +);
> +
> +$node_subscriber->wait_for_log(qr/\[Apply BGW #\d+\] started/, 0);
> +$node_subscriber->append_conf('postgresql.conf',
> + "log_min_messages = warning");
> +$node_subscriber->reload;
> 
> I didn't really think it was necessary to bump this log level, and to
> verify that the bgworker is started, because this test is anyway going
> to ensure that the ERROR "cannot replicate relation with different
> unique index" happens, so that is already implicitly ensuring the
> bgworker was used.

Since it takes almost no time, I think a more detailed confirmation is fine.

The rest of the comments are improved as suggested.
The new patches were attached in [1].

[1] -
https://www.postgresql.org/message-id/OS3PR01MB62755C6C9A75EB09F7218B589E839%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "wangw.fnst@fujitsu.com"
Дата:
Сообщение: RE: Perform streaming logical transactions by background workers and parallel apply
Следующее
От: "wangw.fnst@fujitsu.com"
Дата:
Сообщение: RE: Perform streaming logical transactions by background workers and parallel apply