RE: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Synchronizing slots from primary to standby
Дата
Msg-id OS0PR01MB571663BCE8B28597D462FADE946A2@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
Список pgsql-hackers
On Tuesday, January 9, 2024 9:17 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Here are some review comments for patch v57-0001.

Thanks for the comments!

> 
> ======
> doc/src/sgml/protocol.sgml
> 
> 1. CREATE_REPLICATION_SLOT ... FAILOVER
> 
> +       <varlistentry>
> +        <term><literal>FAILOVER [ <replaceable
> class="parameter">boolean</replaceable> ]</literal></term>
> +        <listitem>
> +         <para>
> +          If true, the slot is enabled to be synced to the physical
> +          standbys so that logical replication can be resumed after failover.
> +         </para>
> +        </listitem>
> +       </varlistentry>
> 
> This syntax says passing the boolean value is optional. So the default needs to
> the specified here in the docs (like what the TWO_PHASE option does).
> 
> ~~~
> 
> 2. ALTER_REPLICATION_SLOT ... FAILOVER
> 
> +      <variablelist>
> +       <varlistentry>
> +        <term><literal>FAILOVER [ <replaceable
> class="parameter">boolean</replaceable> ]</literal></term>
> +        <listitem>
> +         <para>
> +          If true, the slot is enabled to be synced to the physical
> +          standbys so that logical replication can be resumed after failover.
> +         </para>
> +        </listitem>
> +       </varlistentry>
> +      </variablelist>
> 
> This syntax says passing the boolean value is optional. So it needs to be
> specified here in the docs that not passing a value would be the same as
> passing the value true.

The behavior that "not passing a value would be the same as passing the value
true " is due to the rule of defGetBoolean(). And all the options of commands
in this document behave the same in this case, therefore I think we'd better
add document for it in a general place in a separate patch/thread instead of
mentioning this in each option's paragraph.

> 
> ======
> doc/src/sgml/ref/alter_subscription.sgml
> 
> 3.
> +   If <link
> linkend="sql-createsubscription-params-with-failover"><literal>failover</lit
> eral></link>
> +   is enabled, you can temporarily disable it in order to execute
> these commands.
> 
> /in order to/to/

This part has been removed due to design change.

> 
> ~~~
> 
> 4.
> +     <para>
> +      When altering the
> +      <link
> linkend="sql-createsubscription-params-with-slot-name"><literal>slot_nam
> e</literal></link>,
> +      the <literal>failover</literal> property of the new slot may
> differ from the
> +      <link
> linkend="sql-createsubscription-params-with-failover"><literal>failover</lit
> eral></link>
> +      parameter specified in the subscription. When creating the slot,
> +      ensure the slot failover property matches the
> +      <link
> linkend="sql-createsubscription-params-with-failover"><literal>failover</lit
> eral></link>
> +      parameter value of the subscription.
> +     </para>
> 
> 4a.
> the <literal>failover</literal> property of the new slot may differ
> 
> Maybe it would be more clear if that said "the failover property value of the
> named slot...".

Changed.

> 
> ~
> 
> 4b.
> In the "failover property matches" part should that failover also be rendered as
> <literal> like before in the same paragraph?

Added.

> 
> ======
> doc/src/sgml/system-views.sgml
> 
> 5.
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>failover</structfield> <type>bool</type>
> +      </para>
> +      <para>
> +       True if this logical slot is enabled to be synced to the physical
> +       standbys so that logical replication can be resumed from the new
> primary
> +       after failover. Always false for physical slots.
> +      </para></entry>
> +     </row>
> 
> /True if this logical slot is enabled.../True if this is a logical slot enabled.../

Changed.

> 
> ======
> src/backend/commands/subscriptioncmds.c
> 
> 6. CreateSubscription
> 
> + /*
> + * Even if failover is set, don't create the slot with failover
> + * enabled. Will enable it once all the tables are synced and
> + * ready. The intention is that if failover happens at the time of
> + * table-sync, user should re-launch the subscription instead of
> + * relying on main slot (if synced) with no table-sync data
> + * present. When the subscription has no tables, leave failover as
> + * false to allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to
> + * work.
> + */
> + if (opts.failover && !opts.copy_data && tables != NIL)
> + failover_enabled = true;
> 
> AFAICT it might be possible for this to set failover_enabled = true if copy_data
> is false. So failover_enabled would be true when later
> calling:
>   walrcv_create_slot(wrconn, opts.slot_name, false, twophase_enabled,
>     failover_enabled, CRS_NOEXPORT_SNAPSHOT, NULL);
> 
> Isn't that contrary to what this comment said: "Even if failover is set, don't
> create the slot with failover enabled"

This part has been removed due to design change.

> 
> ~~~
> 
> 7. AlterSubscription. case ALTER_SUBSCRIPTION_OPTIONS:
> 
> + if (!sub->slotname)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot set failover for subscription that does not have slot
> + name")));
> 
> /for subscription that does not have slot name/for a subscription that does not
> have a slot name/

Changed.

> 
> ======
> .../libpqwalreceiver/libpqwalreceiver.c
> 
> 8.
> + if (PQresultStatus(res) != PGRES_COMMAND_OK) ereport(ERROR,
> + (errcode(ERRCODE_PROTOCOL_VIOLATION),
> + errmsg("could not alter replication slot \"%s\"", slotname)));
> 
> This used to display the error message like
> pchomp(PQerrorMessage(conn->streamConn)) but it was removed. Is it OK?

I added this back.

> 
> ======
> src/backend/replication/logical/tablesync.c
> 
> 9.
> + if (MySubscription->twophasestate ==
> + LOGICALREP_TWOPHASE_STATE_PENDING)
> + ereport(LOG,
> + /* translator: %s is a subscription option */ (errmsg("logical
> + replication apply worker for subscription \"%s\"
> will restart so that %s can be enabled",
> + MySubscription->name, "two_phase")));
> +
> + if (MySubscription->failoverstate ==
> + LOGICALREP_FAILOVER_STATE_PENDING)
> + ereport(LOG,
> + /* translator: %s is a subscription option */ (errmsg("logical
> + replication apply worker for subscription \"%s\"
> will restart so that %s can be enabled",
> + MySubscription->name, "failover")));
> 
> Those errors have multiple %s, so the translator's comment should say "the
> 2nd %s is a..."

This part has been removed due to design change.

> 
> ~~~
> 
> 10.
>  void
> -UpdateTwoPhaseState(Oid suboid, char new_state)
> +EnableTwoPhaseFailoverTriState(Oid suboid, bool enable_twophase,
> +    bool enable_failover)
> 
> I felt the function name was a bit confusing. Maybe it is simpler to call it like
> "EnableTriState" or "EnableSubTriState" -- the parameters anyway specify what
> actual state(s) will be set.

This part has been removed due to design change.

> 
> ======
> src/backend/replication/logical/worker.c
> 
> 11.
> + /* Update twophase and/or failover */
> + EnableTwoPhaseFailoverTriState(MySubscription->oid, twophase_pending,
> +    failover_pending);
> + if (twophase_pending)
> + MySubscription->twophasestate =
> LOGICALREP_TWOPHASE_STATE_ENABLED;
> +
> + if (failover_pending)
> + MySubscription->failoverstate = LOGICALREP_FAILOVER_STATE_ENABLED;
> 
> Can't you pass the MySubscription as a parameter and then the
> EnableTwoPhaseFailoverTriState can also set these
> LOGICALREP_TWOPHASE_STATE_ENABLED/LOGICALREP_FAILOVER_STATE_EN
> ABLED
> states within the Enable* function?

This part has been removed due to design change.

> 
> ======
> src/backend/replication/repl_gram.y
> 
> 12.
>  %token K_CREATE_REPLICATION_SLOT
>  %token K_DROP_REPLICATION_SLOT
> +%token K_ALTER_REPLICATION_SLOT
> 
> and
> 
> + create_replication_slot drop_replication_slot alter_replication_slot
> + identify_system read_replication_slot timeline_history show
> + upload_manifest
> 
> and
> 
>   | create_replication_slot
>   | drop_replication_slot
> + | alter_replication_slot
> 
> and
> 
>   | K_CREATE_REPLICATION_SLOT { $$ = "create_replication_slot"; }
>   | K_DROP_REPLICATION_SLOT { $$ = "drop_replication_slot"; }
> + | K_ALTER_REPLICATION_SLOT { $$ = "alter_replication_slot"; }
> 
> etc.
> 
> ~
> 
> Although it makes no difference IMO it is more natural to code everything in
> the order: create, alter, drop.
> 
> ======
> src/backend/replication/repl_scanner.l
> 
> 13.
>  CREATE_REPLICATION_SLOT { return K_CREATE_REPLICATION_SLOT; }
> DROP_REPLICATION_SLOT { return K_DROP_REPLICATION_SLOT; }
> +ALTER_REPLICATION_SLOT { return K_ALTER_REPLICATION_SLOT; }
> 
> and
> 
>   case K_CREATE_REPLICATION_SLOT:
>   case K_DROP_REPLICATION_SLOT:
> + case K_ALTER_REPLICATION_SLOT:
> 
> Although it makes no difference IMO it is more natural to code everything in
> the order: create, alter, drop.

Personally, I am not sure if it looks better, so I didn’t change this.

> 
> ======
> src/backend/replication/slot.c
> 
> 14.
> + if (SlotIsPhysical(MyReplicationSlot))
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot use %s with a physical replication slot",
> +    "ALTER_REPLICATION_SLOT"));
> 
> /with a/for a/

This is to be consistent with another error message, so I didn’t change this.

    errmsg("cannot use %s with a logical replication slot",
        "READ_REPLICATION_SLOT"));

> 
> ======
> src/backend/replication/walsender.c
> 
> 15.
> +static void
> +ParseAlterReplSlotOptions(AlterReplicationSlotCmd *cmd, bool *failover)
> +{
> + ListCell   *lc;
> + bool failover_given = false;
> +
> + /* Parse options */
> + foreach(lc, cmd->options)
> + {
> + DefElem    *defel = (DefElem *) lfirst(lc);
> 
> AFAIK there are some new-style macros now you can use for this code.
> e.g. foreach_ptr? See [1].

Changed.

> 
> ~~~
> 
> 16.
> + if (strcmp(defel->defname, "failover") == 0) { if (failover_given)
> + ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("conflicting or
> + redundant options"))); failover_given = true; *failover =
> + defGetBoolean(defel); }
> 
> The documented syntax showed that passing the boolean value for the
> FAILOVER option is not mandatory. Does this code work if the boolean value is
> not passed?

It works, defGetBoolean will handle this case.
 
> 
> ======
> src/bin/psql/tab-complete.c
> 
> 17.
> I think "ALTER SUBSCRIPTION ... SET (failover)" is possible, but the ALTER
> SUBSCRIPTION tab completion code is missing.

Added.

> ======
> .../t/050_standby_failover_slots_sync.pl
> 
> 19.
> +
> +# Copyright (c) 2023, PostgreSQL Global Development Group
> +
> 
> /2023/2024/

Changed.

> 
> ~~~
> 
> 20.
> +# Create another subscription (using the same slot created above) that
> +enables # failover.
> +$subscriber1->safe_psql(
> + 'postgres', qq[
> + CREATE TABLE tab_int (a int PRIMARY KEY);  CREATE SUBSCRIPTION
> +regress_mysub1 CONNECTION '$publisher_connstr'
> PUBLICATION regress_mypub WITH (slot_name = lsub1_slot, copy_data=false,
> failover = true, create_slot = false);
> 
> The comment should not say "Create another subscription" because this is the
> first subscription being created.
> 
> /another/a/

Changed.

> 
> ~~~
> 
> 21.
> +##################################################
> +# Test if changing the failover property of a subscription updates the
> +# corresponding failover property of the slot.
> +##################################################
> 
> /Test if/Test that/

Changed.

> 
> ======
> src/test/regress/sql/subscription.sql
> 
> 22.
> +CREATE SUBSCRIPTION regress_testsub CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false,
> failover = true);
> +
> +\dRs+
> 
> This is currently only testing the explicit "failover=true".
> 
> Maybe you can also test the other kinds work as expected:
> - explicit "SET (failover=false)"
> - explicit "SET (failover)" with no value specified

I think these tests don't add enough value to catch future bugs,
so I prefer not to add these.

Best Regards,
Hou zj

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: RE: Synchronizing slots from primary to standby
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Synchronizing slots from primary to standby