Re: received immediate shutdown request caused cluster failover

Поиск
Список
Период
Сортировка
От Yi Sun
Тема Re: received immediate shutdown request caused cluster failover
Дата
Msg-id CABWY_HCqpj26Svco_AnveXXJorbHcyCok2QQP4arpc2-RyJN2Q@mail.gmail.com
обсуждение исходный текст
Ответ на received immediate shutdown request caused cluster failover  (Yi Sun <yinan81@gmail.com>)
Ответы Re: received immediate shutdown request caused cluster failover  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Hi guys,

Besides command run(like pg_ctl) can cause "received immediate shutdown request"  any other reason can cause this please?

This production DB, support colleague said didn't run it

Yi Sun <yinan81@gmail.com> 于2020年11月18日周三 上午11:54写道:
Hi all

There are 3 nodes in our prd db in patroni cluster, vm01 is leader, vm02 and vm03 are standby, vm01 received immediate shutdown request  caused failover to vm02, after that vm03 received fast shutdown request 

As vm03 not in cluster so have to reinit vm03

What's the possible root caused vm01 received immediate shutdown request  please?

and What's the possible root caused vm03 received fast shutdown request

Below are detail info, please check, if need any other info, I will provide, thanks

postgresql version: 11.4
OS: centos 7
HA: patroni upgraded from 1.6.3 to 2.0.1 these days

--vm01 pg log

[2020-11-16 12:03:56.592 UTC] p=31485:3@ c=pgwatch2@127.0.0.1/eu4_baas_bckp_epmgr:pgwatch2 LOG:  disconnection: session time: 0:00:00.037 user=pgwatch2 database=eu4_baas_bckp_epmgr host=127.0.0.1 port=36916
[2020-11-16 12:03:57.005 UTC] p=32103:14@ c=@/: LOG:  received immediate shutdown request
[2020-11-16 12:03:57.017 UTC] p=31189:3@314/0 c=atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] WARNING:  terminating connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31189:4@314/0 c=atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31189:5@314/0 c=atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[2020-11-16 12:03:57.017 UTC] p=31183:3@313/0 c=atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] WARNING:  terminating connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31183:4@313/0 c=atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31183:5@313/0 c=atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[2020-11-16 12:03:57.017 UTC] p=31182:3@310/281059 c=bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] WARNING:  terminating connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31182:4@310/281059 c=bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31182:5@310/281059 c=bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] HINT:  In a moment you should be able to reconnect to the database and repeat your command.

--vm01 patroni log
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:56,922", "name": "patroni.dcs.consul", "process": 32085, "thread": 140350381303616, "level": "WARNING", "message": "Could not register service: unknown role type promoted"}
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "Lock owner: eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "does not have lock"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,158", "name": "patroni.dcs.consul", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "Deregister service pgcluster11/eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,180", "name": "patroni", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "demoting self because i do not have the lock and i was a leader"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,181", "name": "patroni", "process": 32085, "thread": 140350381303616, "level": "WARNING", "message": "Loop time exceeded, rescheduling immediately."}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,184", "name": "patroni.ha", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "Lock owner: eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,185", "name": "patroni", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "starting after demotion in progress"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,186", "name": "patroni.postgresql.connection", "process": 32085, "thread": 140349984724736, "level": "INFO", "message": "closed patroni connection to the postgresql cluster"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,668", "name": "patroni.postgresql.postmaster", "process": 32085, "thread": 140349984724736, "level": "INFO", "message": "postmaster pid=31495"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: localhost:5432 - no response

--vm03 pg log
[2020-11-16 12:03:53.484 UTC] p=24090:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=39442
[2020-11-16 12:03:53.485 UTC] p=24090:2@9/1878954 c=replicator@::1/[unknown]:[unknown] LOG:  replication connection authorized: user=replicator
[2020-11-16 12:03:53.487 UTC] p=24090:3@ c=replicator@::1/[unknown]:[unknown] LOG:  disconnection: session time: 0:00:00.002 user=replicator database= host=::1 port=39442
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[2020-11-16 12:03:57.108 UTC] p=27711:6@ c=@/: FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[2020-11-16 12:03:57.144 UTC] p=27701:7@1/0 c=@/: LOG:  invalid record length at E052/93EED170: wanted 24, got 0
[2020-11-16 12:03:57.419 UTC] p=24097:1@ c=@/: FATAL:  could not connect to the primary server: FATAL:  the database system is shutting down

[2020-11-16 12:05:38.581 UTC] p=25380:2@ c=postgres@::1/postgres:[unknown] FATAL:  the database system is starting up
[2020-11-16 12:05:41.010 UTC] p=24113:4@1/0 c=@/: LOG:  consistent recovery state reached at E052/93EED170
[2020-11-16 12:05:41.010 UTC] p=24113:5@1/0 c=@/: LOG:  invalid record length at E052/93EED170: wanted 24, got 0
[2020-11-16 12:05:41.011 UTC] p=24109:8@ c=@/: LOG:  database system is ready to accept read only connections
[2020-11-16 12:05:43.280 UTC] p=25400:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=40922
[2020-11-16 12:05:43.392 UTC] p=25401:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=40924
[2020-11-16 12:05:43.393 UTC] p=25401:2@2/2 c=postgres@::1/postgres:[unknown] LOG:  connection authorized: user=postgres database=postgres
[2020-11-16 12:05:43.580 UTC] p=25402:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=40926
[2020-11-16 12:05:43.628 UTC] p=25402:2@3/1 c=replicator@::1/[unknown]:[unknown] LOG:  replication connection authorized: user=replicator
[2020-11-16 12:05:43.630 UTC] p=25402:3@ c=replicator@::1/[unknown]:[unknown] LOG:  disconnection: session time: 0:00:00.049 user=replicator database= host=::1 port=40926
[2020-11-16 12:05:43.647 UTC] p=25406:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=40932
[2020-11-16 12:05:43.648 UTC] p=25406:2@3/2 c=replicator@::1/[unknown]:[unknown] LOG:  replication connection authorized: user=replicator
[2020-11-16 12:05:43.650 UTC] p=25406:3@ c=replicator@::1/[unknown]:[unknown] LOG:  disconnection: session time: 0:00:00.002 user=replicator database= host=::1 port=40932
[2020-11-16 12:05:44.006 UTC] p=24109:9@ c=@/: LOG:  received fast shutdown request
[2020-11-16 12:05:44.015 UTC] p=24109:10@ c=@/: LOG:  aborting any active transactions
[2020-11-16 12:05:44.015 UTC] p=25401:3@2/0 c=postgres@::1/postgres:Patroni FATAL:  terminating connection due to administrator command
[2020-11-16 12:05:44.015 UTC] p=25401:4@ c=postgres@::1/postgres:Patroni LOG:  disconnection: session time: 0:00:00.622 user=postgres database=postgres host=::1 port=40924
[2020-11-16 12:05:44.509 UTC] p=24120:1@ c=@/: LOG:  shutting down
[2020-11-16 12:05:45.621 UTC] p=24109:11@ c=@/: LOG:  database system is shut down

--vm03 patroni log
Nov 16 12:03:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:03:51,436", "name": "patroni", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "following a different leader because i am not the healthiest node"}

Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: pg_controldata: could not read file "/var/lib/pgsql/11/data/global/pg_control": read 0 of 288
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,016", "name": "patroni.postgresql", "process": 27664, "thread": 140371489281856, "level": "ERROR", "message": "Error when calling pg_controldata"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: Traceback (most recent call last):
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 718, in controldata
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: data = subprocess.check_output([self.pgcommand('pg_controldata'), self._data_dir], env=env)
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: **kwargs).stdout
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File "/usr/lib64/python3.6/subprocess.py", line 438, in run
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: output=stdout, stderr=stderr)
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: subprocess.CalledProcessError: Command '['/usr/pgsql-11/bin/pg_controldata', '/var/lib/pgsql/11/data']' returned non-zero exit status 1.
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,018", "name": "patroni.ha", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "pg_controldata:
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: "}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,022", "name": "patroni.ha", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "Lock owner: eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm03"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,022", "name": "patroni.ha", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "Lock owner: eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm03"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,031", "name": "patroni.dcs.consul", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "Deregister service pgcluster11/eu4-baas-patroni-cluster-vm03"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,048", "name": "patroni", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "starting as a secondary"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,049", "name": "patroni.postgresql.connection", "process": 27664, "thread": 140371151345408, "level": "INFO", "message": "closed patroni connection to the postgresql cluster"}


Best regards
Oliver Sun







В списке pgsql-general по дате отправления:

Предыдущее
От: Steve Baldwin
Дата:
Сообщение: Locking and postgres_fdw extension
Следующее
От: Tom Lane
Дата:
Сообщение: Re: received immediate shutdown request caused cluster failover