Re: POC: enable logical decoding when wal_level = 'replica' without a server restart

Поиск
Список
Период
Сортировка
От Peter Smith
Тема Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Дата
Msg-id CAHut+Puq-i+-DSkN2T_G275zFedhWSHRBtMQRxUkt5rPM35RMA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: POC: enable logical decoding when wal_level = 'replica' without a server restart  (Masahiko Sawada <sawada.mshk@gmail.com>)
Ответы Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Список pgsql-hackers
On Mon, Dec 1, 2025 at 4:50 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Nov 27, 2025 at 12:33 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Sawada-San.
> >
> > Some review comments for v30-0001.
>
> Thank you for the comments!
>
> > ======
> > src/backend/replication/logical/logicalctl.c
> >
> > 6b.
> > Would it be better to have a self-documenting enum {UNKNOWN, ENABLED,
> > DISABLED} instead of some magic int values that need lots of comments
> > explaining how to use them?
>
> No, we should not cast enum values directly to a boolean. We have a
> precedent of such usage like LocalXLogInsertAllowed, and I don't think
> we can reduce the comments even if we use a self-documenting enum
> since the usage is complicated. We could consider #define values 0, 1,
> and -1 but it seems to be overkill to this usage.
>

I never suggested casting enum values directly to booleans. FYI - I've
attached a diff patch to demonstrate what I meant. There are fewer
comments since there is no need to explain special values -1,0,1.

//////////

Anyway, here are some more review comments for v31-0001.

======
src/backend/replication/logical/decode.c

1.
+ /*
+ * a XLOG_PARAMETER_CHANGE record might change wal_level to
+ * 'replica' or 'minimal', but we don't check the logical decoding
+ * availability here because it's updated by a
+ * XLOG_LOGICAL_DECODING_STATUS_CHANGE record.
+ */


Missed a previous review comment. Start this comment with a capital 'A'.

======
src/backend/replication/logical/logicalctl.c

CheckXLogLogicalInfo:

2.
+ XLogLogicalInfoXactCache = (int) XLogLogicalInfo;
+
+ return XLogLogicalInfoXactCache;
+}


This function should be returning the bool -- i.e. the
XLogLogicalInfo, not the int XLogLogicalInfoCache.

======
src/test/recovery/t/050_effective_wal_level.pl

3.
+# Wait for the checkpointer to decrease effective_wal_level to 'replica'.
+sub wait_for_logical_decoding_disabled
+{
+ my ($node) = @_;
+
+ $node->poll_query_until('postgres',
+ qq[select current_setting('effective_wal_level') = 'replica';]);
+}

Should this subroutine also execute a CHECKPOINT here? Otherwise, it
might be a long wait before the checkpoint happens automatically.
Also, it would align better with the comment; otherwise, this
subroutine has nothing really to do with checkpoints at all.

~~~

4.
+# Initialize the primary server with wal_level = 'replica'.
+my $primary = PostgreSQL::Test::Cluster->new('primary');
+$primary->init(allows_streaming => 1);
+$primary->append_conf('postgresql.conf', "log_min_messages = debug1");
+$primary->start();

Would it be nicer to explicitly set wal_level? Otherwise, perhaps the
comment should mention that this is the default.

~~~

5.
+# Create a physical slot.
+$primary->safe_psql('postgres',
+ qq[select pg_create_physical_replication_slot('test_phy_slot', false, false)]
+);
+
+# Physical slots don't affect effective_wal_level.
+test_wal_level($primary, "replica|replica",
+ "effective_wal_level doesn't change with a new physical slot");
+$primary->safe_psql('postgres',
+ qq[select pg_drop_replication_slot('test_phy_slot')]);

That seems like a single test case, so the code/comments can be combined.

SUGGESTION
Check that creating a physical slot doesn't affect effective_wal_level.

~~~

6.
+# Create a temporary logical slot but exits without releasing it explicitly.
+# This enables logical decoding but skips disabling it and delegates to the
+# checkpointer.

typo: /but exits/ but exit/

~~~

7.
+# Create a new logical slot and check if effective_wal_level must be increased
+# to 'logical'.

typo: /if effective_wal_level must be increased/that
effective_wal_level is increased/

~~~

8.
+# Check if the server cannot start with wal_level='minimal' as long as there is
+# at least one replication slot.
+$primary->adjust_conf('postgresql.conf', 'wal_level', 'minimal');
+$primary->adjust_conf('postgresql.conf', 'max_wal_senders', '0');
+$primary->stop;
+
+command_fails(
+ [
+ 'pg_ctl',
+ '--pgdata' => $primary->data_dir,
+ '--log' => $primary->logfile,
+ 'start',
+ ],
+ "cannot server with wal_level='minimal' as there is in-use logical slot");
+

8a.
Tweak comment.

SUGGESTION
Verify that the server cannot start with wal_level='minimal' when
there is at least one replication slot.

~

8b.
Typo (missing word "start"?) in "cannot server"
Typo /is in-use/is an in-use/

~~~

9.
+# Revert the modified settings.
+$primary->adjust_conf('postgresql.conf', 'wal_level', 'replica');
+$primary->adjust_conf('postgresql.conf', 'max_wal_senders', '10');
+
+# Add other settings to test if we disable logical decoding when
invalidating the last
+# logical slot.
+$primary->append_conf(
+ 'postgresql.conf',
+ qq[
+min_wal_size = 32MB
+max_wal_size = 32MB
+max_slot_wal_keep_size = 16MB
+]);
+$primary->start;

Can those setting changes all be combined so there is only one primary->start?

~~~

10.
+# Check if logical decoding is disabled after invalidating the last
logical slot.

There are quite a few "check if" comments like this one that could be
worded more strongly. I won't make review comments for all of them,
but here is an example.

SUGGESTION:
Verify that logical decoding is disabled after invalidating the last
logical slot.

~~~

11.
+# Recreate the logical slot to enable logical decoding again.
+$primary->safe_psql('postgres',
+ qq[select pg_drop_replication_slot('test_slot')]);
+$primary->safe_psql('postgres',
+ qq[select pg_create_logical_replication_slot('test_slot', 'pgoutput')])

Should this also call 'test_wal_level' subroutine to verify the states?

~~~

12.
+# Initialize standby2 node and start it with wal_level = 'logical'.
+my $standby2 = PostgreSQL::Test::Cluster->new('standby2');
+$standby2->init_from_backup($primary, 'my_backup', has_streaming => 1);

I got confused around here ... Probably I am mistaken, but I thought:

First, there was $primary (a primary) and $standby1 (a standby)
Then, the $standby1 got promoted (so now $standby1 is a primary and
$primary is the standby)

And, now we are making another $standby2 from the $primary.

AFAIK, that $primary is no longer a primary instance, right?

(???)

So, if $primary is not really the primary anymore, then now I am
unsure if later test comments like below are correct:

+# Regardless of their wal_level values, effective_wal_level values on the
+# standby and the cascaded standby depend on the primary's value, 'logical'.

and

+# Drop the primary's last logical slot, decreasing effective_wal_level to
+# 'replica' on all nodes.

etc.

~~~

13.
+# Check if the effective_wal_level remains 'logical' on both nodes

typo: "the both"

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

В списке pgsql-hackers по дате отправления: