Re: PosgreSQL backend process crashed with signal 9
От | Pavel Suderevsky |
---|---|
Тема | Re: PosgreSQL backend process crashed with signal 9 |
Дата | |
Msg-id | CAEBTBzuuSR7X70sK_iVmVLg-yhOAABG3Yrbdj+jhN=-yvx9eYQ@mail.gmail.com обсуждение исходный текст |
Ответ на | PosgreSQL backend process crashed with signal 9 (Aleksey Romanov <drednout.by@gmail.com>) |
Список | pgsql-bugs |
I've faced similar behaviour in much more simple situation. It crashed on simple nested loop. And all other child postmaster processes were terminated likewise. Repoduced on: 1. 9.5.1 + Ubuntu 14.04.4 LTS 2. 9.5.1 + CentOS 7.2.1511 2. 9.5.2 + CentOS 7.2.1511 DDL: > platform_db=# \d node_statuses > Table "platform_db.node_statuses" > Column | Type | Modifiers > > > -----------------------+-------------------------+--------------------------------- > node_id | integer | not null > status_id | integer | not null > msg | character varying(1024) | default NULL::character > varying > node_status_timestamp | bigint | not null > Indexes: > "NODE_STATUSES_NODE_ID_FK" btree (node_id) > "NODE_STATUSES_NODE_ID_TIMESTAMP_IDX" btree (node_id, > node_status_timestamp) > "NODE_STATUSES_TIMESTAMP_IDX" btree (node_status_timestamp) > > platform_db=# \d node_status_values > Table "platform_db.node_status_values" > Column | Type | Modifiers > ----------------------+---------+----------- > node_id | integer | not null > status_id | integer | not null > node_value_timestamp | bigint | not null > Indexes: > "NODE_STATUS_VALUES_NODE_ID_FK" btree (node_id) > "NODE_STATUS_VALUES_NODE_ID_TIMESTAMP_IDX" btree (node_id, > node_value_timestamp) SQL: > SELECT * FROM node_statuses ns INNER JOIN node_status_values nsv ON > ns.node_id = nsv.node_id where ns.node_id = 1147; Postgres log: > 2016-04-05 18:37:31 MSK LOG: server process (PID 2848) was terminated by > signal 9: Killed > 2016-04-05 18:37:31 MSK DETAIL: Failed process was running: COMMIT > 2016-04-05 18:37:32 MSK LOG: terminating any other active server > processes > 2016-04-05 18:37:32 MSK WARNING: terminating connection because of crash > of another server process > 2016-04-05 18:37:32 MSK DETAIL: The postmaster has commanded this server > process to roll back the current transaction and exit, because another > server process exited abnormally and possib > ly corrupted shared memory. > 2016-04-05 18:37:32 MSK HINT: In a moment you should be able to > reconnect to the database and repeat your command. > 2016-04-05 18:37:32 MSK 127.0.0.1 platform_db@platform_db [unknown] % > WARNING: terminating connection because of crash of another server process > 2016-04-05 18:37:32 MSK 127.0.0.1 platform_db@platform_db [unknown] % > DETAIL: The postmaster has commanded this server process to roll back the > current transaction and exit, because another > server process exited abnormally and possibly corrupted shared memory. > 2016-04-05 18:37:32 MSK 127.0.0.1 platform_db@platform_db [unknown] % > HINT: In a moment you should be able to reconnect to the database and > repeat your command. > 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % WARNING: > terminating connection because of crash of another server process > 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % DETAIL: The > postmaster has commanded this server process to roll back the current > transaction and exit, because another server process > exited abnormally and possibly corrupted shared memory. > 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % HINT: In a moment > you should be able to reconnect to the database and repeat your command. > 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % WARNING: > terminating connection because of crash of another server process > 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % DETAIL: The > postmaster has commanded this server process to roll back the current > transaction and exit, because another server process > exited abnormally and possibly corrupted shared memory. > 2016-04-05 18:37:37 MSK LOG: statistics collector process (PID 1012) was > terminated by signal 9: Killed > 2016-04-05 18:37:37 MSK LOG: all server processes terminated; > reinitializing > 2016-04-05 18:37:39 MSK LOG: database system was interrupted; last known > up at 2016-04-05 18:31:44 MSK > 2016-04-05 18:37:39 MSK ::1 zabbix@zabbix [unknown] % FATAL: the > database system is in recovery mode > 2016-04-05 18:37:41 MSK LOG: database system was not properly shut down; > automatic recovery in progress > 2016-04-05 18:37:41 MSK LOG: redo starts at 298/AB146828 > 2016-04-05 18:37:41 MSK ::1 zabbix@zabbix [unknown] % FATAL: the > database system is in recovery mode > 2016-04-05 18:37:44 MSK [local] postgres@platform_db [unknown] % FATAL: > the database system is in recovery mode > 2016-04-05 18:37:46 MSK LOG: invalid record length at 298/C07ACA78 > 2016-04-05 18:37:46 MSK LOG: redo done at 298/C07ACA50 > 2016-04-05 18:37:46 MSK LOG: last completed transaction was at log time > 2016-04-05 18:37:28.455638+03 > 2016-04-05 18:37:48 MSK ::1 zabbix@zabbix [unknown] % FATAL: the > database system is in recovery mode > 2016-04-05 18:38:08 MSK LOG: MultiXact member wraparound protections are > now enabled > 2016-04-05 18:38:08 MSK LOG: autovacuum launcher started > 2016-04-05 18:38:08 MSK LOG: database system is ready to accept > connections In CentOS just: > 2016-04-06 13:58:58 GMT [local] postgres@platform_db psql % LOG: could > not send data to client: Broken pipe > 2016-04-06 13:58:58 GMT [local] postgres@platform_db psql % STATEMENT: > SELECT * FROM node_statuses ns INNER JOIN node_status_values nsv ON > ns.node_id = nsv.node_id where ns.node_id = 1147; > 2016-04-06 13:58:58 GMT [local] postgres@platform_db psql % FATAL: > connection to client lost System log [ubuntu]: > Apr 5 18:37:30 db_server kernel: [3905852.822122] postgres invoked > oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0 > Apr 5 18:37:30 db_server kernel: [3905852.822131] postgres cpuset=/ > mems_allowed=0-1 > Apr 5 18:37:30 db_server kernel: [3905852.822136] CPU: 3 PID: 9191 Comm: > postgres Not tainted 3.13.0-77-generic #121-Ubuntu > Apr 5 18:37:30 db_server kernel: [3905852.822139] Hardware name: > Supermicro H8DMR-82/H8DMR-82, BIOS 080014 12/15/2006 > ... > Apr 5 18:37:31 db_server kernel: [3905854.860083] Out of memory: Kill > process 2847 (postgres) score 391 or sacrifice child > Apr 5 18:37:31 db_server kernel: [3905854.860201] Killed process 2847 > (postgres) total-vm:6613344kB, anon-rss:34224kB, file-rss:6387036kB Counts: > platform_db=# SELECT count(*) FROM node_statuses ns where node_id = 1147; > count > ------- > 6819 > (1 row) > platform_db=# SELECT count(*) FROM node_status_values nsv where node_id = > 1147; > count > ------- > 6814 > (1 row) > platform_db=# SELECT count(*) FROM node_statuses ns INNER JOIN > node_status_values nsv ON ns.node_id = nsv.node_id where ns.node_id = 1147; > > count > ---------- > 46423776 > (1 row) Query plan: > Nested Loop (cost=477.14..570040.76 rows=45261752 width=53) (actual > time=3.223..252614.947 rows=46382904 loops=1) > -> Bitmap Heap Scan on node_statuses ns (cost=241.60..2336.38 > rows=6862 width=37) (actual time=1.724..21.248 rows=6813 loops=1) > Recheck Cond: (node_id = 1147) > Heap Blocks: exact=1397 > -> Bitmap Index Scan on "NODE_STATUSES_NODE_ID_FK" > (cost=0.00..239.89 rows=6862 width=0) (actual time=1.472..1.472 rows=6813 > loops=1) > Index Cond: (node_id = 1147) > -> Materialize (cost=235.54..1948.97 rows=6596 width=16) (actual > time=0.002..12.092 rows=6808 loops=6813) > -> Bitmap Heap Scan on node_status_values nsv > (cost=235.54..1915.99 rows=6596 width=16) (actual time=1.471..17.282 > rows=6808 loops=1) > Recheck Cond: (node_id = 1147) > Heap Blocks: exact=1357 > -> Bitmap Index Scan on "NODE_STATUS_VALUES_NODE_ID_FK" > (cost=0.00..233.89 rows=6596 width=0) (actual time=1.238..1.238 rows=6808 > loops=1) > Index Cond: (node_id = 1147) > Planning time: 0.210 ms > Execution time: 334897.948 ms Memory on the host: > total used free shared buffers cached > Mem: 15G 15G 453M 2.7G 195M 12G > -/+ buffers/cache: 2.2G 13G > Swap: 0B 0B 0B
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Pavel SuderevskyДата:
Сообщение: Re: PosgreSQL backend process crashed with signal 9