Re: PosgreSQL backend process crashed with signal 9

Поиск
Список
Период
Сортировка
От Pavel Suderevsky
Тема Re: PosgreSQL backend process crashed with signal 9
Дата
Msg-id CAEBTBzuuSR7X70sK_iVmVLg-yhOAABG3Yrbdj+jhN=-yvx9eYQ@mail.gmail.com
обсуждение исходный текст
Ответ на PosgreSQL backend process crashed with signal 9  (Aleksey Romanov <drednout.by@gmail.com>)
Список pgsql-bugs
I've faced similar behaviour in much more simple situation. It crashed on
simple nested loop. And all other child postmaster processes were
terminated likewise.

Repoduced on:
1. 9.5.1 + Ubuntu 14.04.4 LTS
2. 9.5.1 + CentOS 7.2.1511
2. 9.5.2 + CentOS 7.2.1511

DDL:

> platform_db=# \d node_statuses
>                          Table "platform_db.node_statuses"
>         Column         |          Type           |            Modifiers
>
>
> -----------------------+-------------------------+---------------------------------
>  node_id               | integer                 | not null
>  status_id             | integer                 | not null
>  msg                   | character varying(1024) | default NULL::character
> varying
>  node_status_timestamp | bigint                  | not null
> Indexes:
>     "NODE_STATUSES_NODE_ID_FK" btree (node_id)
>     "NODE_STATUSES_NODE_ID_TIMESTAMP_IDX" btree (node_id,
> node_status_timestamp)
>     "NODE_STATUSES_TIMESTAMP_IDX" btree (node_status_timestamp)
>
> platform_db=# \d node_status_values
>    Table "platform_db.node_status_values"
>         Column        |  Type   | Modifiers
> ----------------------+---------+-----------
>  node_id              | integer | not null
>  status_id            | integer | not null
>  node_value_timestamp | bigint  | not null
> Indexes:
>     "NODE_STATUS_VALUES_NODE_ID_FK" btree (node_id)
>     "NODE_STATUS_VALUES_NODE_ID_TIMESTAMP_IDX" btree (node_id,
> node_value_timestamp)

SQL:

> SELECT * FROM node_statuses ns INNER JOIN node_status_values nsv ON
> ns.node_id = nsv.node_id where ns.node_id = 1147;


Postgres log:

> 2016-04-05 18:37:31 MSK  LOG:  server process (PID 2848) was terminated by
> signal 9: Killed
> 2016-04-05 18:37:31 MSK  DETAIL:  Failed process was running: COMMIT
> 2016-04-05 18:37:32 MSK  LOG:  terminating any other active server
> processes
> 2016-04-05 18:37:32 MSK  WARNING:  terminating connection because of crash
> of another server process
> 2016-04-05 18:37:32 MSK  DETAIL:  The postmaster has commanded this server
> process to roll back the current transaction and exit, because another
> server process exited abnormally and possib
> ly corrupted shared memory.
> 2016-04-05 18:37:32 MSK  HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.
> 2016-04-05 18:37:32 MSK 127.0.0.1 platform_db@platform_db [unknown] %
> WARNING:  terminating connection because of crash of another server process
> 2016-04-05 18:37:32 MSK 127.0.0.1 platform_db@platform_db [unknown] %
> DETAIL:  The postmaster has commanded this server process to roll back the
> current transaction and exit, because another
> server process exited abnormally and possibly corrupted shared memory.
> 2016-04-05 18:37:32 MSK 127.0.0.1 platform_db@platform_db [unknown] %
> HINT:  In a moment you should be able to reconnect to the database and
> repeat your command.
> 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % WARNING:
>  terminating connection because of crash of another server process
> 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % DETAIL:  The
> postmaster has commanded this server process to roll back the current
> transaction and exit, because another server process
>  exited abnormally and possibly corrupted shared memory.
> 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % HINT:  In a moment
> you should be able to reconnect to the database and repeat your command.
> 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % WARNING:
>  terminating connection because of crash of another server process
> 2016-04-05 18:37:32 MSK ::1 zabbix@zabbix [unknown] % DETAIL:  The
> postmaster has commanded this server process to roll back the current
> transaction and exit, because another server process
>  exited abnormally and possibly corrupted shared memory.
> 2016-04-05 18:37:37 MSK  LOG:  statistics collector process (PID 1012) was
> terminated by signal 9: Killed
> 2016-04-05 18:37:37 MSK  LOG:  all server processes terminated;
> reinitializing
> 2016-04-05 18:37:39 MSK  LOG:  database system was interrupted; last known
> up at 2016-04-05 18:31:44 MSK
> 2016-04-05 18:37:39 MSK ::1 zabbix@zabbix [unknown] % FATAL:  the
> database system is in recovery mode
> 2016-04-05 18:37:41 MSK  LOG:  database system was not properly shut down;
> automatic recovery in progress
> 2016-04-05 18:37:41 MSK  LOG:  redo starts at 298/AB146828
> 2016-04-05 18:37:41 MSK ::1 zabbix@zabbix [unknown] % FATAL:  the
> database system is in recovery mode
> 2016-04-05 18:37:44 MSK [local] postgres@platform_db [unknown] % FATAL:
>  the database system is in recovery mode
> 2016-04-05 18:37:46 MSK  LOG:  invalid record length at 298/C07ACA78
> 2016-04-05 18:37:46 MSK  LOG:  redo done at 298/C07ACA50
> 2016-04-05 18:37:46 MSK  LOG:  last completed transaction was at log time
> 2016-04-05 18:37:28.455638+03
> 2016-04-05 18:37:48 MSK ::1 zabbix@zabbix [unknown] % FATAL:  the
> database system is in recovery mode
> 2016-04-05 18:38:08 MSK  LOG:  MultiXact member wraparound protections are
> now enabled
> 2016-04-05 18:38:08 MSK  LOG:  autovacuum launcher started
> 2016-04-05 18:38:08 MSK  LOG:  database system is ready to accept
> connections

In CentOS just:

> 2016-04-06 13:58:58 GMT [local] postgres@platform_db psql % LOG:  could
> not send data to client: Broken pipe
> 2016-04-06 13:58:58 GMT [local] postgres@platform_db psql % STATEMENT:
>  SELECT * FROM node_statuses ns INNER JOIN node_status_values nsv ON
> ns.node_id = nsv.node_id where ns.node_id = 1147;
> 2016-04-06 13:58:58 GMT [local] postgres@platform_db psql % FATAL:
>  connection to client lost


System log [ubuntu]:

> Apr  5 18:37:30 db_server kernel: [3905852.822122] postgres invoked
> oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
> Apr  5 18:37:30 db_server kernel: [3905852.822131] postgres cpuset=/
> mems_allowed=0-1
> Apr  5 18:37:30 db_server kernel: [3905852.822136] CPU: 3 PID: 9191 Comm:
> postgres Not tainted 3.13.0-77-generic #121-Ubuntu
> Apr  5 18:37:30 db_server kernel: [3905852.822139] Hardware name:
> Supermicro H8DMR-82/H8DMR-82, BIOS 080014  12/15/2006
> ...
> Apr  5 18:37:31 db_server kernel: [3905854.860083] Out of memory: Kill
> process 2847 (postgres) score 391 or sacrifice child
> Apr  5 18:37:31 db_server kernel: [3905854.860201] Killed process 2847
> (postgres) total-vm:6613344kB, anon-rss:34224kB, file-rss:6387036kB


Counts:

> platform_db=# SELECT count(*) FROM node_statuses ns where node_id = 1147;
>  count
> -------
>   6819
> (1 row)
> platform_db=# SELECT count(*) FROM node_status_values nsv where node_id =
> 1147;
>  count
> -------
>   6814
> (1 row)
> platform_db=# SELECT count(*) FROM node_statuses ns INNER JOIN
> node_status_values nsv ON ns.node_id = nsv.node_id where ns.node_id = 1147;
>
>   count
> ----------
>  46423776
> (1 row)


Query plan:

> Nested Loop  (cost=477.14..570040.76 rows=45261752 width=53) (actual
> time=3.223..252614.947 rows=46382904 loops=1)
>    ->  Bitmap Heap Scan on node_statuses ns  (cost=241.60..2336.38
> rows=6862 width=37) (actual time=1.724..21.248 rows=6813 loops=1)
>          Recheck Cond: (node_id = 1147)
>          Heap Blocks: exact=1397
>          ->  Bitmap Index Scan on "NODE_STATUSES_NODE_ID_FK"
>  (cost=0.00..239.89 rows=6862 width=0) (actual time=1.472..1.472 rows=6813
> loops=1)
>                Index Cond: (node_id = 1147)
>    ->  Materialize  (cost=235.54..1948.97 rows=6596 width=16) (actual
> time=0.002..12.092 rows=6808 loops=6813)
>          ->  Bitmap Heap Scan on node_status_values nsv
>  (cost=235.54..1915.99 rows=6596 width=16) (actual time=1.471..17.282
> rows=6808 loops=1)
>                Recheck Cond: (node_id = 1147)
>                Heap Blocks: exact=1357
>                ->  Bitmap Index Scan on "NODE_STATUS_VALUES_NODE_ID_FK"
>  (cost=0.00..233.89 rows=6596 width=0) (actual time=1.238..1.238 rows=6808
> loops=1)
>                      Index Cond: (node_id = 1147)
>  Planning time: 0.210 ms
>  Execution time: 334897.948 ms


Memory on the host:

>              total       used       free     shared    buffers     cached
> Mem:           15G        15G       453M       2.7G       195M        12G
> -/+ buffers/cache:       2.2G        13G
> Swap:           0B         0B         0B

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Pavel Suderevsky
Дата:
Сообщение: Re: PosgreSQL backend process crashed with signal 9
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Wrong result of
персональных данных