Thread: [GENERAL] issue performing a switchover with repmgr

[GENERAL] issue performing a switchover with repmgr

From
Dylan Luong
Date:

Hi

 

I have set up a master/standby pair on PostgreSQL 9.5 on two test servers and am trialing repmgr (https://github.com/2ndQuadrant/repmgr/).

 

I am testing a switchover using the following:

 

-bash-4.1$ repmgr -f /etc/repmgr/9.5/repmgr.conf -C /etc/repmgr/9.5/repmgr.conf standby switchover -L DEBUG -v

 

The switchover appears to hang at the last step of the process:

 

NOTICE: restarting server using '/usr/pgsql-9.5/bin/pg_ctl  -w -D /var/lib/pgsql/9.5/data -m fast restart'

pg_ctl: PID file "/var/lib/pgsql/9.5/data/postmaster.pid" does not exist

Is server running?

starting server anyway

 

It appears to have worked, though: when I run the cluster show command on both servers, it shows the switchover.

-bash-4.1$ repmgr -f /etc/repmgr/9.5/repmgr.conf cluster show

Role      | Name           | Upstream       | Connection String

----------+----------------|----------------|-------------------------------------------

* master  | itupl-postgen2 |                | host=10.70.3.252 dbname=repmgr user=repmgr

  standby | itupl-postgen1 | itupl-postgen2 | host=10.70.3.251 dbname=repmgr user=repmgr

 

It also shows correctly in the repl_nodes table of both databases.

 

Why is it hanging? Thank you for your help.

 

Here is the complete output:

-----------------------------------------------

-bash-4.1$ repmgr -f /etc/repmgr/9.5/repmgr.conf -C /etc/repmgr/9.5/repmgr.conf standby switchover -L DEBUG -v

NOTICE: using configuration file "/etc/repmgr/9.5/repmgr.conf"

NOTICE: switching current node 2 to master server and demoting current master to standby...

DEBUG: connecting to: 'host=10.70.3.252 dbname=repmgr user=repmgr fallback_application_name='repmgr''

DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()

INFO: retrieving node list for cluster 'repmgr_cluster'

DEBUG: get_master_connection():

  SELECT id, conninfo,          CASE WHEN type = 'master' THEN 1 ELSE 2 END AS type_priority    FROM "repmgr_repmgr_cluster".repl_nodes    WHERE cluster = 'repmgr_cluster'      AND type != 'witness' ORDER BY active DESC, type_priority, priority, id

INFO: checking role of cluster node '1'

DEBUG: connecting to: 'host=10.70.3.251 dbname=repmgr user=repmgr fallback_application_name='repmgr''

DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()

DEBUG: get_master_connection(): current master node is 1

DEBUG: get_node_record():

SELECT id, type, upstream_node_id, name, conninfo,        slot_name, priority, active  FROM "repmgr_repmgr_cluster".repl_nodes  WHERE cluster = 'repmgr_cluster'    AND id = 1

DEBUG: remote node name is "itupl-postgen1"

DEBUG: test_ssh_connection(): executing ssh -o Batchmode=yes  10.70.3.251 /bin/true 2>/dev/null

DEBUG: get_pg_setting(): SELECT name, setting   FROM pg_catalog.pg_settings WHERE name = 'data_directory'

DEBUG: get_pg_setting(): returned value is "/var/lib/pgsql/9.5/data"

DEBUG: master's data directory is: /var/lib/pgsql/9.5/data

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 ls '/var/lib/pgsql/9.5/data/PG_VERSION' >/dev/null 2>&1 && echo 1 || echo 0

DEBUG: remote_command(): output returned was:

1

DEBUG: PG_VERSION found in /var/lib/pgsql/9.5/data

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 ls '/usr/pgsql-9.5/bin/pg_rewind' >/dev/null 2>&1 && echo 1 || echo 0

DEBUG: remote_command(): output returned was:

1

DEBUG: guc_set():

SELECT true FROM pg_catalog.pg_settings  WHERE name = 'full_page_writes' AND setting = 'off'

DEBUG: guc_set():

SELECT true FROM pg_catalog.pg_settings  WHERE name = 'wal_log_hints' AND setting = 'on'

INFO: looking for file "/etc/repmgr/9.5/repmgr.conf" on remote server "10.70.3.251"

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 ls '/etc/repmgr/9.5/repmgr.conf' >/dev/null 2>&1 && echo 1 || echo 0

DEBUG: remote_command(): output returned was:

1

INFO: remote configuration file "/etc/repmgr/9.5/repmgr.conf" found on remote server

DEBUG: remote_archive_config_dir: /tmp/repmgr-itupl-postgen1-archive

DEBUG: Executing:

/usr/pgsql-9.5/bin/repmgr standby archive-config -f '/etc/repmgr/9.5/repmgr.conf' --config-archive-dir='/tmp/repmgr-itupl-postgen1-archive'

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 /usr/pgsql-9.5/bin/repmgr standby archive-config -f '/etc/repmgr/9.5/repmgr.conf' --config-archive-dir='/tmp/repmgr-itupl-postgen1-archive'

 

WARNING:  nonstandard use of escape in a string literal

LINE 1: ...config_file,          regexp_replace(config_file, '^.*\/',''...

                                                             ^

HINT:  Use the escape string syntax for escapes, e.g., E'\r\n'.

NOTICE: 3 files copied to /tmp/repmgr-itupl-postgen1-archive

DEBUG: remote_command(): output returned was:

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 /usr/pgsql-9.5/bin/pg_ctl -D '/var/lib/pgsql/9.5/data' -m fast -W stop >/dev/null 2>&1 && echo 1 || echo 0

DEBUG: remote_command(): output returned was:

1

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 ls '/var/lib/pgsql/9.5/data/postmaster.pid' >/dev/null 2>&1 && echo 1 || echo 0

DEBUG: remote_command(): output returned was:

0

NOTICE: current master has been stopped

INFO: connecting to standby database

DEBUG: connecting to: 'host=10.70.3.252 dbname=repmgr user=repmgr fallback_application_name='repmgr''

INFO: connected to standby, checking its state

DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()

INFO: retrieving node list for cluster 'repmgr_cluster'

DEBUG: get_master_connection():

  SELECT id, conninfo,          CASE WHEN type = 'master' THEN 1 ELSE 2 END AS type_priority    FROM "repmgr_repmgr_cluster".repl_nodes    WHERE cluster = 'repmgr_cluster'      AND type != 'witness' ORDER BY active DESC, type_priority, priority, id

INFO: checking role of cluster node '1'

DEBUG: connecting to: 'host=10.70.3.251 dbname=repmgr user=repmgr fallback_application_name='repmgr''

ERROR: connection to database failed: could not connect to server: Connection refused

        Is the server running on host "10.70.3.251" and accepting

        TCP/IP connections on port 5432?

 

INFO: checking role of cluster node '2'

DEBUG: connecting to: 'host=10.70.3.252 dbname=repmgr user=repmgr fallback_application_name='repmgr''

DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()

NOTICE: promoting standby

DEBUG: get_pg_setting(): SELECT name, setting   FROM pg_catalog.pg_settings WHERE name = 'data_directory'

DEBUG: get_pg_setting(): returned value is "/var/lib/pgsql/9.5/data"

NOTICE: promoting server using '/usr/pgsql-9.5/bin/pg_ctl -D /var/lib/pgsql/9.5/data promote'

server promoting

INFO: reconnecting to promoted server

DEBUG: connecting to: 'host=10.70.3.252 dbname=repmgr user=repmgr fallback_application_name='repmgr''

DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()

DEBUG: is_standby(): SELECT pg_catalog.pg_is_in_recovery()

DEBUG: setting node 2 as master and marking existing master as failed

DEBUG: begin_transaction()

DEBUG: commit_transaction()

NOTICE: STANDBY PROMOTE successful

DEBUG: create_event_record():

INSERT INTO "repmgr_repmgr_cluster".repl_events (              node_id,              event,              successful,              details             )       VALUES ($1, $2, $3, $4)    RETURNING event_timestamp

DEBUG: create_event_record(): Event timestamp is "2017-05-22 16:56:06.860066+09:30"

NOTICE: Executing pg_rewind on old master server

DEBUG: pg_rewind command is:

'/usr/pgsql-9.5/bin/pg_rewind' -D '/var/lib/pgsql/9.5/data' --source-server=\'host=10.70.3.252 dbname=repmgr user=repmgr\'

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 '/usr/pgsql-9.5/bin/pg_rewind' -D '/var/lib/pgsql/9.5/data' --source-server=\'host=10.70.3.252 dbname=repmgr user=repmgr\'

 

DEBUG: remote_command(): output returned was:

servers diverged at WAL position 1/1D000098 on timeline 11

no rewind required

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 /usr/pgsql-9.5/bin/repmgr standby restore-config -D '/var/lib/pgsql/9.5/data'  --config-archive-dir='/tmp/repmgr-itupl-postgen1-archive'

 

ERROR: unable to determine cluster name - please provide a valid configuration file with -c/--config-file

HINT: Use -F/--force to continue anyway

DEBUG: remote_command(): output returned was:

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 test -e '/var/lib/pgsql/9.5/data/recovery.done' && rm -f '/var/lib/pgsql/9.5/data/recovery.done'

 

DEBUG: remote_command(): output returned was:

DEBUG: Executing:

/usr/pgsql-9.5/bin/repmgr -D '/var/lib/pgsql/9.5/data' -f '/etc/repmgr/9.5/repmgr.conf' -h 10.70.3.252 -d repmgr -U repmgr  standby follow

DEBUG: remote_command(): ssh -o Batchmode=yes  10.70.3.251 /usr/pgsql-9.5/bin/repmgr -D '/var/lib/pgsql/9.5/data' -f '/etc/repmgr/9.5/repmgr.conf' -h 10.70.3.252 -d repmgr -U repmgr  standby follow

 

NOTICE: restarting server using '/usr/pgsql-9.5/bin/pg_ctl  -w -D /var/lib/pgsql/9.5/data -m fast restart'

pg_ctl: PID file "/var/lib/pgsql/9.5/data/postmaster.pid" does not exist

Is server running?

starting server anyway

 

 

Regards

Dylan

 

Re: [GENERAL] issue performing a switchover with repmgr

From
Adrian Klaver
Date:
On 05/22/2017 01:15 AM, Dylan Luong wrote:
> Hi
>
> I have setup a master/standby on PostgreSQL95 on two test servers and
> trialing out repmgr. (https://github.com/2ndQuadrant/repmgr/)
>
> I am testing a switchover using the following:
>
> -bash-4.1$ repmgr -f /etc/repmgr/9.5/repmgr.conf -C
> /etc/repmgr/9.5/repmgr.conf standby switchover -L DEBUG -v
>
> The switchover appears to hang at the last part of the switchover process….
>
> /NOTICE: restarting server using '/usr/pgsql-9.5/bin/pg_ctl  -w -D
> /var/lib/pgsql/9.5/data -m fast restart'/
>
> /pg_ctl: PID file "/var/lib/pgsql/9.5/data/postmaster.pid" does not exist/
>
> /Is server running?/
>
> /starting server anyway/
>
> It appears to have worked though as when I run the cluster show command
> on both servers it showing the switchover.
>
> /-bash-4.1$ repmgr -f /etc/repmgr/9.5/repmgr.conf cluster show/
>
> /Role      | Name           | Upstream       | Connection String/
>
> /----------+----------------|----------------|-------------------------------------------/
>
> /* master  | itupl-postgen2 |                | host=10.70.3.252
> dbname=repmgr user=repmgr/
>
> /  standby | itupl-postgen1 | itupl-postgen2 | host=10.70.3.251
> dbname=repmgr user=repmgr/
>
> It is also showing correctly in repl_nodes table of the two databases.
>
> Why is it hanging?? Thank you for your help…

You are using -w

https://www.postgresql.org/docs/9.5/static/app-pg-ctl.html

"-w

     Wait for the startup or shutdown to complete. Waiting is the
default option for shutdowns, but not startups. When waiting for
startup, pg_ctl repeatedly attempts to connect to the server. When
waiting for shutdown, pg_ctl waits for the server to remove its PID
file. This option allows the entry of an SSL passphrase on startup.
pg_ctl returns an exit code based on the success of the startup or shutdown.
"

So pg_ctl was trying to connect to the server and did not find it at first:

"pg_ctl: PID file "/var/lib/pgsql/9.5/data/postmaster.pid" does not
exist. Is server running?"

but continued with the process:

"starting server anyway"

FYI in Postgres 10+ -w is the default for pg_ctl.
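
To make the waiting behavior described in the quoted documentation concrete, here is a minimal shell sketch of a wait-for-startup loop: retry a probe until the server answers or a timeout expires. It is an illustration only, not pg_ctl's actual implementation. The function name `wait_until_ready`, the one-second retry interval, and the choice of `pg_isready` as the probe are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of pg_ctl or repmgr): run a probe command
# once per second until it succeeds or TIMEOUT seconds have elapsed.
# Usage: wait_until_ready TIMEOUT PROBE [ARGS...]
wait_until_ready() {
    timeout=$1
    shift
    elapsed=0
    # Keep probing while the probe command fails.
    while ! "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # gave up: the probe never succeeded in time
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0            # the probe succeeded
}

# Example, assuming pg_isready is on PATH and using the old master's
# address from this thread:
# wait_until_ready 60 pg_isready -q -h 10.70.3.251 -p 5432
```

If the server never starts accepting connections, a loop like this keeps the caller blocked until the timeout runs out, which from the outside looks like a hang.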






--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] issue performing a switchover with repmgr

From
Dylan Luong
Date:
Thanks for your answer. So is there a way to remove this -w from the repmgr switchover process?


Re: [GENERAL] issue performing a switchover with repmgr

From
Adrian Klaver
Date:
On 05/22/2017 05:14 PM, Dylan Luong wrote:
> Thanks for your answer. So is there a way to remove this -w from the repmgr switchover process?
>

There might be, but I would not do it. I am pretty sure there is a reason
repmgr wants to make sure the server is accepting connections before
moving on from the pg_ctl command. As I mentioned earlier, in Postgres
10+ this is going to be the default behavior for pg_ctl anyway. The
bottom line is that, as you stated earlier, it does not affect the
outcome of the switchover.
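
One way to confirm the outcome independently of repmgr is to ask each node directly whether it is in recovery, using the same `pg_catalog.pg_is_in_recovery()` query that appears in the debug output. The sketch below is only an illustration: the helper names `role_from_recovery_flag` and `node_role` are hypothetical, and it assumes `psql` is on PATH and that the `repmgr` user can connect to the `repmgr` database without a password prompt.

```shell
#!/bin/sh
# Hypothetical helper: translate the t/f value printed by psql for
# pg_is_in_recovery() into a role name.
role_from_recovery_flag() {
    case "$1" in
        t) echo "standby" ;;   # in recovery: the node is replaying WAL
        f) echo "master" ;;    # not in recovery: the node accepts writes
        *) echo "unknown" ;;   # no answer, e.g. the connection failed
    esac
}

# Hypothetical helper: query one host and print its role.
# -A (unaligned) and -t (tuples only) reduce the output to a bare t or f.
node_role() {
    flag=$(psql -h "$1" -U repmgr -d repmgr -Atc \
        'SELECT pg_catalog.pg_is_in_recovery()' 2>/dev/null)
    role_from_recovery_flag "$flag"
}

# Example, using the two addresses from this thread:
# node_role 10.70.3.252
# node_role 10.70.3.251
```

If the switchover completed as the cluster show output suggests, the first call should report a master and the second a standby.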



--
Adrian Klaver
adrian.klaver@aklaver.com