[PATCH] Stop ALTER SYSTEM from making bad assumptions

Поиск
Список
Период
Сортировка
От Ian Barwick
Тема [PATCH] Stop ALTER SYSTEM from making bad assumptions
Дата
Msg-id aed6cc9f-98f3-2693-ac81-52bb0052307e@2ndquadrant.com
обсуждение исходный текст
Ответы Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions  (Stephen Frost <sfrost@snowman.net>)
Список pgsql-hackers
Hi

Consider the following cascading standby setup with PostgreSQL 12:

- there exists a running primary "A"
- standby "B" is cloned from primary "A" using "pg_basebackup --write-recovery-conf"
- standby "C" is cloned from standby "B" using "pg_basebackup --write-recovery-conf"

So far, so good, everything works as expected.

Now, for whatever reason, the user wishes standby "C" to follow another upstream
node (which one is not important here), so the user, in the comfort of their own psql
command line (no more pesky recovery.conf editing!) issues the following:

     ALTER SYSTEM SET primary_conninfo = 'host=someothernode';

and restarts the instance, and... it stays connected to the original upstream node.

Which is unexpected.

Examining the the restarted instance, "SHOW primary_conninfo" still displays
the original value for "primary_conninfo". Mysteriouser and mysteriouser.

What has happened here is that with the option "--write-recovery-conf", Pg12's
pg_basebackup (correctly IMHO) appends the recovery settings to "postgresql.auto.conf".

However, on standby "C", pg_basebackup has dutifully copied over standby "B"'s
existing "postgresql.auto.conf", which already contains standby "B"'s
replication configuration, and appended standby "C"'s replication configuration
to that, which (before "ALTER SYSTEM" was invoked) looked something like this:

    # Do not edit this file manually!
    # It will be overwritten by the ALTER SYSTEM command.
    primary_conninfo = 'host=node_A'
    primary_conninfo = 'host=node_B'

which is expected, and works because the last entry in the file is evaluated, so
on startup, standby "C" follows standby "B".

However, executing "ALTER SYSTEM SET primary_conninfo = 'host=someothernode'" left
standby "C"'s "postgresql.auto.conf" file looking like this:

    # Do not edit this file manually!
    # It will be overwritten by the ALTER SYSTEM command.
    primary_conninfo = 'host=someothernode'
    primary_conninfo = 'host=node_B'

which seems somewhat broken, to say the least.

As-is, the user will either need to repeatedly issue "ALTER SYSTEM RESET primary_conninfo"
until the duplicates are cleared (which means "ALTER SYSTEM RESET ..." doesn't work as
advertised, and is not an obvious solution anyway), or ignore the "Do not edit this file manually!"
warning and remove the offending entry/entries (which, if done safely, should involve
shutting down the instance first).

Note this issue is not specific to pg_basebackup, primary_conninfo (or any other settings
formerly in recovery.conf), it has just manifested itself as the built-in toolset as of now
provides a handy way of getting into this situation without too much effort (and any
utilities which clone standbys and append the replication configuration to
"postgresql.auto.conf" in lieu of creating "recovery.conf" will be prone to running into
the same situation).

I had previously always assumed that ALTER SYSTEM  would change the *last* occurrence for
the parameter in "postgresql.auto.conf", in the same way you'd need to be sure to change
the last occurrence in the normal configuration files, however this actually not the case -
as per replace_auto_config_value() ( src/backend/utils/misc/guc.c ):

     /* Search the list for an existing match (we assume there's only one) */

the *first* match is replaced.

Attached patch attempts to rectify this situation by having replace_auto_config_value()
deleting any duplicate entries first, before making any changes to the last entry.

Arguably it might be sufficient (and simpler) to just scan the list for the last
entry, but removing preceding duplicates seems cleaner, as it's pointless
(and a potential source of confusion) keeping entries around which will never be used.

Also attached is a set of TAP tests to check ALTER SYSTEM works as expected (or
at least as seems correct to me).


Regards

Ian Barwick

-- 
  Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Timur Birsh
Дата:
Сообщение: Re: [PATCH] vacuumlo: print the number of large objects going to be removed
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: SQL/JSON path issues/questions