Обсуждение: Database system was interrupted. Possible reasons for a database to suddenly stop accepting connections?

Поиск
Список
Период
Сортировка

Database system was interrupted. Possible reasons for a database to suddenly stop accepting connections?

От
"Buzenets, Yuriy (GE Renewable Energy, consultant)"
Дата:

Some time ago the database at my work suddenly stopped accepting connections. In the logs there was a message “the database system was interrupted; last known up at 2020-10-29 12:03:16 PDT”, followed by a lot of “the database system is starting up” messages. It seems like the database tried to recover from whatever failure occurred but recovery was not successful until the database process was shut down and restarted. The latest error message before “system was interrupted” occurs 13 minutes before so it’s not clear to us what could cause this issue and why the database process had to be restarted. If anyone could help us understand what might’ve been the cause, we would greatly appreciate it. We’re using PostgreSQL 9.4 on CentOS 6.8, we know it reached EOL and plan to upgrade to a newer version soon. I’ve also attached the log file from the time the database system was unavailable if it helps to figure out the cause of the outage

 

Thanks,

Yuriy Buzenets

Вложения
On 11/9/20 5:53 AM, Buzenets, Yuriy (GE Renewable Energy, consultant) wrote:
> Some time ago the database at my work suddenly stopped accepting 
> connections. In the logs there was a message “the database system was 
> interrupted; last known up at 2020-10-29 12:03:16 PDT”, followed by a 
> lot of “the database system is starting up” messages. It seems like the 
> database tried to recover from whatever failure occurred but recovery 
> was not successful until the database process was shut down and 
> restarted. The latest error message before “system was interrupted” 
> occurs 13 minutes before so it’s not clear to us what could cause this 
> issue and why the database process had to be restarted. If anyone could 
> help us understand what might’ve been the cause, we would greatly 
> appreciate it. We’re using PostgreSQL 9.4 on CentOS 6.8, we know it 
> reached EOL and plan to upgrade to a newer version soon. I’ve also 
> attached the log file from the time the database system was unavailable 
> if it helps to figure out the cause of the outage

There are a lot of 'terminating connection due to administrator command' 
lines. How many connections where/are being used to this server?

> 
> Thanks,
> 
> Yuriy Buzenets
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



On Mon, 2020-11-09 at 13:53 +0000, Buzenets, Yuriy (GE Renewable Energy, consultant) wrote:

If I delete all the noise from the log, file, this remains:

> < 2020-10-29 11:51:59.345 PDT >STATEMENT:  SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM, [...]

> < 2020-10-29 12:04:09.700 PDT >LOG:  database system was interrupted; last known up at 2020-10-29 12:03:16 PDT
> < 2020-10-29 12:04:09.812 PDT >LOG:  database system was not properly shut down; automatic recovery in progress
> < 2020-10-29 12:04:09.817 PDT >LOG:  redo starts at 43C/E887E1C8
> < 2020-10-29 12:04:09.916 PDT >LOG:  record with zero length at 43C/E8ED94D8
> < 2020-10-29 12:04:09.917 PDT >LOG:  redo done at 43C/E8ED94A8
> < 2020-10-29 12:04:09.917 PDT >LOG:  last completed transaction was at log time 2020-10-29 12:03:46.422937-07
> < 2020-10-29 12:04:09.938 PDT >LOG:  MultiXact member wraparound protections are now enabled
> < 2020-10-29 12:04:09.941 PDT >LOG:  autovacuum launcher started
> < 2020-10-29 12:04:09.941 PDT >LOG:  database system is ready to accept connections

> < 2020-10-29 12:26:44.453 PDT >LOG:  received fast shutdown request
> < 2020-10-29 12:26:44.453 PDT >LOG:  aborting any active transactions
> < 2020-10-29 12:26:44.544 PDT >LOG:  shutting down
> < 2020-10-29 12:26:44.587 PDT >LOG:  database system is shut down

> < 2020-10-29 12:27:13.156 PDT >LOG:  database system was shut down at 2020-10-29 12:26:44 PDT
> < 2020-10-29 12:27:13.178 PDT >LOG:  MultiXact member wraparound protections are now enabled
> < 2020-10-29 12:27:13.179 PDT >LOG:  autovacuum launcher started
> < 2020-10-29 12:27:13.179 PDT >LOG:  database system is ready to accept connections
> < 2020-10-29 15:02:42.588 PDT >LOG:  received fast shutdown request
> < 2020-10-29 15:02:42.588 PDT >LOG:  aborting any active transactions
> < 2020-10-29 15:02:42.712 PDT >LOG:  shutting down
> < 2020-10-29 15:02:42.732 PDT >LOG:  database system is shut down

> < 2020-10-29 15:02:43.711 PDT >LOG:  database system was shut down at 2020-10-29 15:02:42 PDT
> < 2020-10-29 15:02:43.714 PDT >LOG:  MultiXact member wraparound protections are now enabled
> < 2020-10-29 15:02:43.715 PDT >LOG:  autovacuum launcher started
> < 2020-10-29 15:02:43.716 PDT >LOG:  database system is ready to accept connections

There was a operating system crash or a "kill -9" somewhere between 11:51 and 12:04 on Oct. 29.

PostgreSQL got restarted and recovered at 12:04.

Then there were two more clean shutdowns and restarts at 12:26 and 15:02.

Your problem is probably the first crash.  If you don't have any indication that the machine crashed,
look into the kernel log - perkaps the out-of-memory killer struck (assuming this is Linux).

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com