Discussion: psql 9.3 automatic recovery in progress

psql 9.3 automatic recovery in progress

From
Periko Support
Date:
I'm having some issues with my DB under Ubuntu 14.x.
PostgreSQL 9.3, Odoo 7.x.

This machine runs under KVM on a CentOS 6.x host.

It has a RAID1 array of SSD drives dedicated to this VM.

I've detected some unexpected shutdowns; see these lines:

2016-09-12 08:59:25 PDT ERROR:  missing FROM-clause entry for table
"rp" at character 73
2016-09-12 08:59:25 PDT STATEMENT:  select
pp.default_code,pc.product_code,pp.name_template,pc.product_name,rp.name
from product_product pp inner join product_customer_code pc on
pc.product_id=pp.id
2016-09-12 08:59:26 PDT LOG:  connection received: host=192.168.2.153 port=59335
2016-09-12 08:59:26 PDT LOG:  connection authorized: user=openerp
database=Mueblex
2016-09-12 09:00:01 PDT LOG:  connection received: host=::1 port=43536
2016-09-12 09:00:01 PDT LOG:  connection authorized: user=openerp
database=template1
2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
terminated by signal 9: Killed
2016-09-12 09:00:01 PDT DETAIL:  Failed process was running: select
pp.default_code,pc.product_code,pp.name_template,pc.product_name,rp.name
from product_product pp inner join product_customer_code pc on
pc.product_id=pp.id
2016-09-12 09:00:01 PDT LOG:  terminating any other active server processes
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT LOG:  archiver process (PID 989) exited with exit code 1
2016-09-12 09:00:01 PDT LOG:  all server processes terminated; reinitializing
2016-09-12 09:00:03 PDT LOG:  database system was interrupted; last
known up at 2016-09-12 08:59:27 PDT
2016-09-12 09:00:07 PDT LOG:  database system was not properly shut
down; automatic recovery in progress
2016-09-12 09:00:07 PDT LOG:  redo starts at 29A/AF000028
2016-09-12 09:00:07 PDT LOG:  record with zero length at 29A/BC001958
2016-09-12 09:00:07 PDT LOG:  redo done at 29A/BC001928
2016-09-12 09:00:07 PDT LOG:  last completed transaction was at log
time 2016-09-12 09:00:01.768271-07
2016-09-12 09:00:08 PDT LOG:  MultiXact member wraparound protections
are now enabled
2016-09-12 09:00:08 PDT LOG:  database system is ready to accept connections
2016-09-12 09:00:08 PDT LOG:  autovacuum launcher started
2016-09-12 09:00:15 PDT LOG:  connection received: host=127.0.0.1 port=45508

Later, another one:
2016-09-12 09:51:32 PDT ERROR:  duplicate key value violates unique
constraint "upc_scan_plant_uniq"
2016-09-12 09:51:32 PDT DETAIL:  Key (name, plant_id)=(2016-09-12, 6)
already exists.
2016-09-12 09:51:32 PDT STATEMENT:  insert into "upc_scan"
(id,"plant_id","state","name",create_uid,create_date,write_uid,write_date)
values (1438,6,'draft','2016-09-12',87,(now() at time zone
'UTC'),87,(now() at time zone 'UTC'))
2016-09-12 09:51:40 PDT ERROR:  duplicate key value violates unique
constraint "upc_scan_plant_uniq"
2016-09-12 09:51:40 PDT DETAIL:  Key (name, plant_id)=(2016-09-12, 6)
already exists.
2016-09-12 09:51:40 PDT STATEMENT:  insert into "upc_scan"
(id,"plant_id","state","name",create_uid,create_date,write_uid,write_date)
values (1439,6,'draft','2016-09-12',87,(now() at time zone
'UTC'),87,(now() at time zone 'UTC'))
2016-09-12 09:51:43 PDT ERROR:  duplicate key value violates unique
constraint "upc_scan_plant_uniq"
2016-09-12 09:51:43 PDT DETAIL:  Key (name, plant_id)=(2016-09-12, 6)
already exists.
2016-09-12 09:51:43 PDT STATEMENT:  insert into "upc_scan"
(id,"plant_id","state","name",create_uid,create_date,write_uid,write_date)
values (1440,6,'draft','2016-09-12',87,(now() at time zone
'UTC'),87,(now() at time zone 'UTC'))
2016-09-12 10:00:01 PDT LOG:  connection received: host=::1 port=43667
2016-09-12 10:00:01 PDT LOG:  connection authorized: user=openerp
database=template1
2016-09-12 10:00:01 PDT LOG:  server process (PID 30766) was
terminated by signal 9: Killed
2016-09-12 10:00:01 PDT DETAIL:  Failed process was running: select
pp.default_code,pc.product_code,pp.name_template,pc.product_name,rp.name
from product_product pp inner join product_customer_code pc on
pc.product_id=pp.id
2016-09-12 10:00:01 PDT LOG:  terminating any other active server processes
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT LOG:  archiver process (PID 29336) exited with
exit code 1
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT LOG:  all server processes terminated; reinitializing
2016-09-12 10:00:02 PDT LOG:  database system was interrupted; last
known up at 2016-09-12 09:58:45 PDT
2016-09-12 10:00:06 PDT LOG:  database system was not properly shut
down; automatic recovery in progress
2016-09-12 10:00:06 PDT LOG:  redo starts at 29C/DC000028
2016-09-12 10:00:06 PDT LOG:  unexpected pageaddr 29B/ED01A000 in log
segment 000000010000029C000000F2, offset 106496
2016-09-12 10:00:06 PDT LOG:  redo done at 29C/F2018D90
2016-09-12 10:00:06 PDT LOG:  last completed transaction was at log
time 2016-09-12 10:00:00.558978-07
2016-09-12 10:00:07 PDT LOG:  MultiXact member wraparound protections
are now enabled
2016-09-12 10:00:07 PDT LOG:  database system is ready to accept connections
2016-09-12 10:00:07 PDT LOG:  autovacuum launcher started
2016-09-12 10:00:11 PDT LOG:  connection received: host=127.0.0.1 port=45639

Another one:

2016-09-12 15:00:01 PDT LOG:  server process (PID 22030) was
terminated by signal 9: Killed
2016-09-12 15:00:01 PDT DETAIL:  Failed process was running: SELECT
"name", "model", "description", "month" FROM "etiquetas_temp"
2016-09-12 15:00:01 PDT LOG:  terminating any other active server processes
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT LOG:  archiver process (PID 3254) exited with
exit code 1
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:02 PDT LOG:  all server processes terminated; reinitializing
2016-09-12 15:00:02 PDT LOG:  database system was interrupted; last
known up at 2016-09-12 14:59:55 PDT
2016-09-12 15:00:06 PDT LOG:  database system was not properly shut
down; automatic recovery in progress
2016-09-12 15:00:07 PDT LOG:  redo starts at 2A8/69000028
2016-09-12 15:00:07 PDT LOG:  record with zero length at 2A8/7201B4F8
2016-09-12 15:00:07 PDT LOG:  redo done at 2A8/7201B4C8
2016-09-12 15:00:07 PDT LOG:  last completed transaction was at log
time 2016-09-12 15:00:01.664762-07
2016-09-12 15:00:08 PDT LOG:  MultiXact member wraparound protections
are now enabled
2016-09-12 15:00:08 PDT LOG:  database system is ready to accept connections
2016-09-12 15:00:08 PDT LOG:  autovacuum launcher started

From what I can see this is not because of server load; maximum idle is 79%.

free
             total       used       free     shared    buffers     cached
Mem:      82493268   82027060     466208    3157228     136084   77526460
-/+ buffers/cache:    4364516   78128752
Swap:      1000444       9540     990904

There is little swap usage; I will add more memory next week.

In your experience, does this look like a HW issue?

Thanks for your time!!!
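
One way to tell a hardware fault apart from an out-of-memory kill is to check
the kernel log around the time of each crash; a generic sketch (log file
locations vary between distributions):

  dmesg | grep -i -E 'killed process|out of memory'
  grep -i oom /var/log/kern.log      # Ubuntu; /var/log/messages on CentOS

If the OOM killer is responsible it logs the victim PID, which can be matched
against the PIDs in the PostgreSQL log above (23958, 30766, 22030).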


Re: psql 9.3 automatic recovery in progress

From
Periko Support
Date:
I want to add that my server load is normal; please see the attachment, thanks.




On Mon, Oct 10, 2016 at 9:24 AM, Periko Support <pheriko.support@gmail.com> wrote:
I got some issues with my DB under ubuntu 14.x.
PSQL 9.3, odoo 7.x.

This machine is under KVM with centos  6.x

It has a Raid1 with ssd drives only for this vm.

I detect some unexpected shutdows, see this lines:

2016-09-12 08:59:25 PDT ERROR:  missing FROM-clause entry for table
"rp" at character 73
2016-09-12 08:59:25 PDT STATEMENT:  select
pp.default_code,pc.product_code,pp.name_template,pc.product_name,rp.name
from product_product pp inner join product_customer_code pc on
pc.product_id=pp.id
2016-09-12 08:59:26 PDT LOG:  connection received: host=192.168.2.153 port=59335
2016-09-12 08:59:26 PDT LOG:  connection authorized: user=openerp
database=Mueblex
2016-09-12 09:00:01 PDT LOG:  connection received: host=::1 port=43536
2016-09-12 09:00:01 PDT LOG:  connection authorized: user=openerp
database=template1
2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
terminated by signal 9: Killed
2016-09-12 09:00:01 PDT DETAIL:  Failed process was running: select
pp.default_code,pc.product_code,pp.name_template,pc.product_name,rp.name
from product_product pp inner join product_customer_code pc on
pc.product_id=pp.id
2016-09-12 09:00:01 PDT LOG:  terminating any other active server processes
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 09:00:01 PDT LOG:  archiver process (PID 989) exited with exit code 1
2016-09-12 09:00:01 PDT LOG:  all server processes terminated; reinitializing
2016-09-12 09:00:03 PDT LOG:  database system was interrupted; last
known up at 2016-09-12 08:59:27 PDT
2016-09-12 09:00:07 PDT LOG:  database system was not properly shut
down; automatic recovery in progress
2016-09-12 09:00:07 PDT LOG:  redo starts at 29A/AF000028
2016-09-12 09:00:07 PDT LOG:  record with zero length at 29A/BC001958
2016-09-12 09:00:07 PDT LOG:  redo done at 29A/BC001928
2016-09-12 09:00:07 PDT LOG:  last completed transaction was at log
time 2016-09-12 09:00:01.768271-07
2016-09-12 09:00:08 PDT LOG:  MultiXact member wraparound protections
are now enabled
2016-09-12 09:00:08 PDT LOG:  database system is ready to accept connections
2016-09-12 09:00:08 PDT LOG:  autovacuum launcher started
2016-09-12 09:00:15 PDT LOG:  connection received: host=127.0.0.1 port=45508

Latter another one:
2016-09-12 09:51:32 PDT ERROR:  duplicate key value violates unique
constraint "upc_scan_plant_uniq"
2016-09-12 09:51:32 PDT DETAIL:  Key (name, plant_id)=(2016-09-12, 6)
already exists.
2016-09-12 09:51:32 PDT STATEMENT:  insert into "upc_scan"
(id,"plant_id","state","name",create_uid,create_date,write_uid,write_date)
values (1438,6,'draft','2016-09-12',87,(now() at time zone
'UTC'),87,(now() at time zone 'UTC'))
2016-09-12 09:51:40 PDT ERROR:  duplicate key value violates unique
constraint "upc_scan_plant_uniq"
2016-09-12 09:51:40 PDT DETAIL:  Key (name, plant_id)=(2016-09-12, 6)
already exists.
2016-09-12 09:51:40 PDT STATEMENT:  insert into "upc_scan"
(id,"plant_id","state","name",create_uid,create_date,write_uid,write_date)
values (1439,6,'draft','2016-09-12',87,(now() at time zone
'UTC'),87,(now() at time zone 'UTC'))
2016-09-12 09:51:43 PDT ERROR:  duplicate key value violates unique
constraint "upc_scan_plant_uniq"
2016-09-12 09:51:43 PDT DETAIL:  Key (name, plant_id)=(2016-09-12, 6)
already exists.
2016-09-12 09:51:43 PDT STATEMENT:  insert into "upc_scan"
(id,"plant_id","state","name",create_uid,create_date,write_uid,write_date)
values (1440,6,'draft','2016-09-12',87,(now() at time zone
'UTC'),87,(now() at time zone 'UTC'))
2016-09-12 10:00:01 PDT LOG:  connection received: host=::1 port=43667
2016-09-12 10:00:01 PDT LOG:  connection authorized: user=openerp
database=template1
2016-09-12 10:00:01 PDT LOG:  server process (PID 30766) was
terminated by signal 9: Killed
2016-09-12 10:00:01 PDT DETAIL:  Failed process was running: select
pp.default_code,pc.product_code,pp.name_template,pc.product_name,rp.name
from product_product pp inner join product_customer_code pc on
pc.product_id=pp.id
2016-09-12 10:00:01 PDT LOG:  terminating any other active server processes
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT LOG:  archiver process (PID 29336) exited with
exit code 1
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 10:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 10:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 10:00:01 PDT LOG:  all server processes terminated; reinitializing
2016-09-12 10:00:02 PDT LOG:  database system was interrupted; last
known up at 2016-09-12 09:58:45 PDT
2016-09-12 10:00:06 PDT LOG:  database system was not properly shut
down; automatic recovery in progress
2016-09-12 10:00:06 PDT LOG:  redo starts at 29C/DC000028
2016-09-12 10:00:06 PDT LOG:  unexpected pageaddr 29B/ED01A000 in log
segment 000000010000029C000000F2, offset 106496
2016-09-12 10:00:06 PDT LOG:  redo done at 29C/F2018D90
2016-09-12 10:00:06 PDT LOG:  last completed transaction was at log
time 2016-09-12 10:00:00.558978-07
2016-09-12 10:00:07 PDT LOG:  MultiXact member wraparound protections
are now enabled
2016-09-12 10:00:07 PDT LOG:  database system is ready to accept connections
2016-09-12 10:00:07 PDT LOG:  autovacuum launcher started
2016-09-12 10:00:11 PDT LOG:  connection received: host=127.0.0.1 port=45639

Other one:

2016-09-12 15:00:01 PDT LOG:  server process (PID 22030) was
terminated by signal 9: Killed
2016-09-12 15:00:01 PDT DETAIL:  Failed process was running: SELECT
"name", "model", "description", "month" FROM "etiquetas_temp"
2016-09-12 15:00:01 PDT LOG:  terminating any other active server processes
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT LOG:  archiver process (PID 3254) exited with
exit code 1
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-09-12 15:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-09-12 15:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-09-12 15:00:02 PDT LOG:  all server processes terminated; reinitializing
2016-09-12 15:00:02 PDT LOG:  database system was interrupted; last
known up at 2016-09-12 14:59:55 PDT
2016-09-12 15:00:06 PDT LOG:  database system was not properly shut
down; automatic recovery in progress
2016-09-12 15:00:07 PDT LOG:  redo starts at 2A8/69000028
2016-09-12 15:00:07 PDT LOG:  record with zero length at 2A8/7201B4F8
2016-09-12 15:00:07 PDT LOG:  redo done at 2A8/7201B4C8
2016-09-12 15:00:07 PDT LOG:  last completed transaction was at log
time 2016-09-12 15:00:01.664762-07
2016-09-12 15:00:08 PDT LOG:  MultiXact member wraparound protections
are now enabled
2016-09-12 15:00:08 PDT LOG:  database system is ready to accept connections
2016-09-12 15:00:08 PDT LOG:  autovacuum launcher started

From what I can see it's not due to server load — maximum idle is 79%.

free
             total       used       free     shared    buffers     cached
Mem:      82493268   82027060     466208    3157228     136084   77526460
-/+ buffers/cache:    4364516   78128752
Swap:      1000444       9540     990904

Little use of swap; I will add more memory next week.

In your experience, does this look like a HW issue?

Thanks for your time!!!

Attachments

Re: psql 9.3 automatic recovery in progress

From
Tom Lane
Date:
Periko Support <pheriko.support@gmail.com> writes:
> I got some issues with my DB under ubuntu 14.x.
> PSQL 9.3, odoo 7.x.

> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
> terminated by signal 9: Killed

Usually, SIGKILLs coming out of nowhere indicate that the Linux OOM killer
has decided to target some database process.  You need to do something to
reduce memory pressure and/or disable memory overcommit so that that
doesn't happen.

            regards, tom lane
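For reference: on Linux, disabling memory overcommit is done via sysctl. A minimal sketch of the relevant settings follows; the ratio below is only an example and should be sized to the machine:

# /etc/sysctl.conf  (apply with: sysctl -p)
vm.overcommit_memory = 2     # strict accounting, no heuristic overcommit
vm.overcommit_ratio = 80     # % of physical RAM (plus swap) available to commit

With vm.overcommit_memory = 2 the OOM killer should no longer SIGKILL Postgres backends; memory exhaustion shows up as ordinary out-of-memory errors instead.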


Re: psql 9.3 automatic recovery in progress

From
Periko Support
Date:
My current server has 82GB memory.

Default settings, except for the one parameter I changed: shared_buffers,
from 128MB to 6GB.

This server is dedicated to postgresql+odoo.

Is that the only parameter I can think of to reduce my memory utilization?

Thanks Tom.


On Mon, Oct 10, 2016 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Periko Support <pheriko.support@gmail.com> writes:
>> I got some issues with my DB under ubuntu 14.x.
>> PSQL 9.3, odoo 7.x.
>
>> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
>> terminated by signal 9: Killed
>
> Usually, SIGKILLs coming out of nowhere indicate that the Linux OOM killer
> has decided to target some database process.  You need to do something to
> reduce memory pressure and/or disable memory overcommit so that that
> doesn't happen.
>
>                         regards, tom lane


Re: psql 9.3 automatic recovery in progress

From
Periko Support
Date:
Or should I just add more memory to my server?

On Mon, Oct 10, 2016 at 11:05 AM, Periko Support
<pheriko.support@gmail.com> wrote:
> My current server has 82GB memory.
>
> Default settings but the only parameter I had chance is shared_buffers
> from 128MB to 6G.
>
> This server is dedicated to postgresql+odoo.
>
> Is the only parameter I can thing can reduce my memory utilization?
>
> Thanks Tom.
>
>
> On Mon, Oct 10, 2016 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Periko Support <pheriko.support@gmail.com> writes:
>>> I got some issues with my DB under ubuntu 14.x.
>>> PSQL 9.3, odoo 7.x.
>>
>>> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
>>> terminated by signal 9: Killed
>>
>> Usually, SIGKILLs coming out of nowhere indicate that the Linux OOM killer
>> has decided to target some database process.  You need to do something to
>> reduce memory pressure and/or disable memory overcommit so that that
>> doesn't happen.
>>
>>                         regards, tom lane


Re: [SPAM] psql 9.3 automatic recovery in progress

From
Moreno Andreo
Date:
Il 10/10/2016 18:24, Periko Support ha scritto:
> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
> terminated by signal 9: Killed

> 2016-09-12 10:00:01 PDT LOG:  server process (PID 30766) was
> terminated by signal 9: Killed

> 2016-09-12 15:00:01 PDT LOG:  server process (PID 22030) was
> terminated by signal 9: Killed
>
>
These datetimes could be suspect. Every crash (kill) happens at "00" minutes
and "01" seconds, which makes me ask: "Isn't there something like cron
running something that interferes with postgres?"

Cheers,
Moreno.




Re: [SPAM] psql 9.3 automatic recovery in progress

From
Melvin Davidson
Date:


On Mon, Oct 10, 2016 at 2:14 PM, Moreno Andreo <moreno.andreo@evolu-s.it> wrote:

Il 10/10/2016 18:24, Periko Support ha scritto:
2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
terminated by signal 9: Killed

2016-09-12 10:00:01 PDT LOG:  server process (PID 30766) was
terminated by signal 9: Killed

2016-09-12 15:00:01 PDT LOG:  server process (PID 22030) was
terminated by signal 9: Killed


These datetimes could be suspect. Every crash (kill) is done at "00"minutes and "01" minutes, that makes me ask "Isn't there something like cron running something that interfere with postgres?"

Cheers,
Moreno.


The general philosophy is to start by setting shared_buffers to 1/4 of system memory, so
shared_buffers should be 20480 MB

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.
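For reference: a quarter of 82GB is roughly 20GB, i.e. the 20480 MB mentioned above. A minimal sketch of the corresponding setting (it lives in postgresql.conf, and changing shared_buffers requires a server restart):

# postgresql.conf
shared_buffers = 20GB    # ~1/4 of the 82GB of RAM, up from the current 6GB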

Re: psql 9.3 automatic recovery in progress

From
Tom Lane
Date:
Periko Support <pheriko.support@gmail.com> writes:
> My current server has 82GB memory.

You said this was running inside a VM, though --- maybe the VM is
resource-constrained?

In any case, turning off memory overcommit would be a good idea if
you're not concerned about running anything but Postgres.

            regards, tom lane


Re: [SPAM] psql 9.3 automatic recovery in progress

From
Adrian Klaver
Date:
On 10/10/2016 11:14 AM, Moreno Andreo wrote:
>
> Il 10/10/2016 18:24, Periko Support ha scritto:
>> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
>> terminated by signal 9: Killed
>
>> 2016-09-12 10:00:01 PDT LOG:  server process (PID 30766) was
>> terminated by signal 9: Killed
>
>> 2016-09-12 15:00:01 PDT LOG:  server process (PID 22030) was
>> terminated by signal 9: Killed
>>
>>
> These datetimes could be suspect. Every crash (kill) is done at
> "00"minutes and "01" minutes, that makes me ask "Isn't there something
> like cron running something that interfere with postgres?"

While we're on the subject, the datetimes are almost a month old.

Does that mean this problem was just noticed or are the datetimes wrong?

>
> Cheers,
> Moreno.
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: psql 9.3 automatic recovery in progress

From
Periko Support
Date:
Andreo, you've got a good observation here.

I've got a script that runs every hour. Why?

Odoo has some issues with IDLE connections: if we don't check our current psql connections, after a while the system eats all the connections, a lot of them sit IDLE, and it stops answering users. So we created a script that runs every hour; this is it:

""" Script is used to kill database connection which are idle from last 15 minutes """
#!/usr/bin/env python
import psycopg2
import sys
import os
from os.path import join, expanduser
import subprocess, signal, psutil
import time

def get_conn():

    conn_string = "host='localhost' dbname='template1' user='openerp' password='s$p_p@r70'"

    try:
        # get a connection, if a connect cannot be made an exception will be raised here
        conn = psycopg2.connect(conn_string)
        cursor = conn.cursor()
#        print "successful Connection"
        return cursor
    except:
        exceptionType, exceptionValue, exceptionTraceback = sys.exc_info()
        sys.exit("Database connection failed!\n ->%s" % (exceptionValue))


def get_pid():

    SQL="select pid, datname, usename from pg_stat_activity where usename = 'openerp' AND query_start < current_timestamp - INTERVAL '15' MINUTE;"
    cursor = get_conn()
    cursor.execute(SQL)
    idle_record = cursor.fetchall()
    print "---------------------------------------------------------------------------------------------------"
    print "Date:",time.strftime("%d/%m/%Y")
    print "idle record list: ", idle_record
    print "---------------------------------------------------------------------------------------------------"
    for pid in idle_record:
        try:
#            print "process details",pid
#            os.system("kill -9 %s" % (int(pid[0]), ))
            os.kill(int(pid[0]), signal.SIGKILL)
        except OSError as ex:
            continue

get_pid()

I will change this so it doesn't run every hour and see the reaction.

It's an easy change. About Tom: our current KVM server is good for me, see the picture please:


free
             total       used       free     shared    buffers     cached
Mem:     181764228  136200312   45563916        468      69904     734652
-/+ buffers/cache:  135395756   46368472
Swap:       261948          0     261948

I got other vm but they are on other raid setup.

Tom, you mention that you recommend reducing memory pressure; do you mean lowering values like shared_buffers, or increasing memory?

Melvin, I tried that value before but my server cried; I will add more memory in a few weeks.

Any comments will be appreciated, thanks.

On Mon, Oct 10, 2016 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Periko Support <pheriko.support@gmail.com> writes:
> My current server has 82GB memory.

You said this was running inside a VM, though --- maybe the VM is
resource-constrained?

In any case, turning off memory overcommit would be a good idea if
you're not concerned about running anything but Postgres.

                        regards, tom lane

Attachments

Re: [SPAM] psql 9.3 automatic recovery in progress

From
Periko Support
Date:
I was on vacation, but the issue has the same behavior:

2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
crash of another server process
2016-10-10 07:50:09 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-10-10 07:50:09 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
crash of another server process
2016-10-10 07:50:09 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-10-10 07:50:09 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-10-10 07:50:09 PDT LOG:  archiver process (PID 13066) exited with
exit code 1
2016-10-10 07:50:10 PDT LOG:  all server processes terminated; reinitializing
2016-10-10 07:50:10 PDT LOG:  connection received: host=192.168.2.6 port=37700
2016-10-10 07:50:10 PDT LOG:  database system was interrupted; last
known up at 2016-10-10 07:49:15 PDT
2016-10-10 07:50:10 PDT FATAL:  the database system is in recovery mode
2016-10-10 07:50:10 PDT LOG:  connection received: host=192.168.2.6 port=37702
2016-10-10 07:50:10 PDT FATAL:  the database system is in recovery mode
2016-10-10 07:50:15 PDT LOG:  database system was not properly shut
down; automatic recovery in progress
2016-10-10 07:50:15 PDT LOG:  redo starts at 517/C9000028
2016-10-10 07:50:15 PDT LOG:  unexpected pageaddr 517/77000000 in log
segment 0000000100000517000000D2, offset 0
2016-10-10 07:50:15 PDT LOG:  redo done at 517/D10000C8
2016-10-10 07:50:15 PDT LOG:  last completed transaction was at log
time 2016-10-10 07:49:09.891669-07
2016-10-10 07:50:15 PDT LOG:  connection received: host=192.168.2.6 port=37704
2016-10-10 07:50:15 PDT FATAL:  the database system is in recovery mode
2016-10-10 07:50:15 PDT LOG:  connection received: host=192.168.2.6 port=37706
2016-10-10 07:50:15 PDT FATAL:  the database system is in recovery mode
2016-10-10 07:50:16 PDT LOG:  MultiXact member wraparound protections
are now enabled
2016-10-10 07:50:16 PDT LOG:  database system is ready to accept connections
2016-10-10 07:50:16 PDT LOG:  autovacuum launcher started

2016-10-10 09:00:01 PDT LOG:  archiver process (PID 14004) exited with
exit code 1
2016-10-10 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-10-10 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-10-10 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-10-10 09:00:01 PDT WARNING:  terminating connection because of
crash of another server process
2016-10-10 09:00:01 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2016-10-10 09:00:01 PDT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2016-10-10 09:00:01 PDT LOG:  all server processes terminated; reinitializing
2016-10-10 09:00:02 PDT LOG:  database system was interrupted; last
known up at 2016-10-10 08:59:33 PDT
2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35950
2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35951
2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35952
2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35953
2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35954
2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35955
2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:05 PDT LOG:  connection received: host=192.168.2.6 port=39380
2016-10-10 09:00:05 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:05 PDT LOG:  connection received: host=192.168.2.6 port=39382
2016-10-10 09:00:05 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:07 PDT LOG:  database system was not properly shut
down; automatic recovery in progress
2016-10-10 09:00:07 PDT LOG:  redo starts at 51A/82000028
2016-10-10 09:00:07 PDT LOG:  record with zero length at 51A/8E0126B0
2016-10-10 09:00:07 PDT LOG:  redo done at 51A/8E012680
2016-10-10 09:00:07 PDT LOG:  last completed transaction was at log
time 2016-10-10 09:00:01.142505-07
2016-10-10 09:00:08 PDT LOG:  connection received: host=127.0.0.1 port=35956
2016-10-10 09:00:08 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:08 PDT LOG:  connection received: host=127.0.0.1 port=35957
2016-10-10 09:00:08 PDT FATAL:  the database system is in recovery mode
2016-10-10 09:00:08 PDT LOG:  MultiXact member wraparound protections
are now enabled
2016-10-10 09:00:08 PDT LOG:  database system is ready to accept connections
2016-10-10 09:00:08 PDT LOG:  autovacuum launcher started

Thanks.

On Mon, Oct 10, 2016 at 11:32 AM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
> On 10/10/2016 11:14 AM, Moreno Andreo wrote:
>>
>>
>> Il 10/10/2016 18:24, Periko Support ha scritto:
>>>
>>> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
>>> terminated by signal 9: Killed
>>
>>
>>> 2016-09-12 10:00:01 PDT LOG:  server process (PID 30766) was
>>> terminated by signal 9: Killed
>>
>>
>>> 2016-09-12 15:00:01 PDT LOG:  server process (PID 22030) was
>>> terminated by signal 9: Killed
>>>
>>>
>> These datetimes could be suspect. Every crash (kill) is done at
>> "00"minutes and "01" minutes, that makes me ask "Isn't there something
>> like cron running something that interfere with postgres?"
>
>
> While we on the subject, the datetimes are almost a month old.
>
> Does that mean this problem was just noticed or are the datetimes wrong?
>
>>
>> Cheers,
>> Moreno.
>>
>>
>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
>
>


Re: psql 9.3 automatic recovery in progress

From
Pavel Stehule
Date:


2016-10-10 21:12 GMT+02:00 Periko Support <pheriko.support@gmail.com>:
Andreo u got a good observation here.

I got a script that run every hour why?

Odoo got some issues with IDLE connections, if we don't check our current psql connections after a while the system eat all connections and a lot of them are IDLE and stop answering users, we create a script that runs every hour, this is:

""" Script is used to kill database connection which are idle from last 15 minutes """
#!/usr/bin/env python
import psycopg2
import sys
import os
from os.path import join, expanduser
import subprocess, signal, psutil
import time

def get_conn():

    conn_string = "host='localhost' dbname='template1' user='openerp' password='s$p_p@r70'"

    try:
        # get a connection, if a connect cannot be made an exception will be raised here
        conn = psycopg2.connect(conn_string)
        cursor = conn.cursor()
#        print "successful Connection"
        return cursor
    except:
        exceptionType, exceptionValue, exceptionTraceback = sys.exc_info()
        sys.exit("Database connection failed!\n ->%s" % (exceptionValue))


def get_pid():

    SQL="select pid, datname, usename from pg_stat_activity where usename = 'openerp' AND query_start < current_timestamp - INTERVAL '15' MINUTE;"
    cursor = get_conn()
    cursor.execute(SQL)
    idle_record = cursor.fetchall()
    print "---------------------------------------------------------------------------------------------------"
    print "Date:",time.strftime("%d/%m/%Y")
    print "idle record list: ", idle_record
    print "---------------------------------------------------------------------------------------------------"
    for pid in idle_record:
        try:
#            print "process details",pid
#            os.system("kill -9 %s" % (int(pid[0]), ))
            os.kill(int(pid[0]), signal.SIGKILL)
        except OSError as ex:
            continue

get_pid()

I will move this to run not every hour and see the reaction.

Is a easy move, about Tim, our current KVM server is good for me, see picture please:


free
             total       used       free     shared    buffers     cached
Mem:     181764228  136200312   45563916        468      69904     734652
-/+ buffers/cache:  135395756   46368472
Swap:       261948          0     261948

I got other vm but they are on other raid setup.

Tim u mention that u recommend reduce memory pressure, u mean to lower down my values like shared_buffers or increase memory?

Try to decrease the lifetime of odoo sessions - then memory will be returned back to the system. Set limit_memory_soft lower in the odoo config - I found some manuals on the net with wrong settings.

The odoo sessions should be refreshed more often.

Regards

Pavel

 


Melvin I try that value before but my server cry, I will add more memory in a few weeks.

Any comment I will appreciated, thanks.

On Mon, Oct 10, 2016 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Periko Support <pheriko.support@gmail.com> writes:
> My current server has 82GB memory.

You said this was running inside a VM, though --- maybe the VM is
resource-constrained?

In any case, turning off memory overcommit would be a good idea if
you're not concerned about running anything but Postgres.

                        regards, tom lane


Attachments

Re: psql 9.3 automatic recovery in progress

From
Alban Hertroys
Date:
> On 10 Oct 2016, at 21:12, Periko Support <pheriko.support@gmail.com> wrote:
>
>     for pid in idle_record:
>         try:
> #            print "process details",pid
> #            os.system("kill -9 %s" % (int(pid[0]), ))
>             os.kill(int(pid[0]), signal.SIGKILL)
>         except OSError as ex:
>             continue

That query returns PostgreSQL backends and you're sending them SIGKILL. Not a recommended practice as far as I know.
Shouldn't you rather be sending those kill signals to the clients connecting to the db?
Worse, apparently at some time in the past (a month ago matching those logs, perhaps?) it used to send kill -9! That's
absolutely a very bad idea.

While on the topic, there is a PG function to cancel a backend query from within PG:
https://www.postgresql.org/docs/9.5/static/functions-admin.html
I think that's the best way to go about this, and best of all, you can combine that with your select statement.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
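For anyone wanting to follow that suggestion, here is a minimal sketch (not the vendor's script) of combining the existing SELECT with pg_terminate_backend(), so idle 'openerp' backends are closed from inside PostgreSQL instead of being SIGKILLed from the shell. It assumes the same connection string as the hourly cron job (password elided) and uses the state/state_change columns available since 9.2, so only genuinely idle sessions are touched:

#!/usr/bin/env python
# Sketch only: terminate idle 'openerp' backends via pg_terminate_backend()
# instead of sending OS signals to the server processes.
import psycopg2

conn = psycopg2.connect("host='localhost' dbname='template1' user='openerp' password='...'")
conn.autocommit = True
cur = conn.cursor()
cur.execute("""
    SELECT pid, usename, datname, pg_terminate_backend(pid)
      FROM pg_stat_activity
     WHERE usename = 'openerp'
       AND state = 'idle'
       AND now() - state_change > interval '15 minutes'
       AND pid <> pg_backend_pid()
""")
for pid, usename, datname, terminated in cur.fetchall():
    print "terminated:", pid, usename, datname, terminated
cur.close()
conn.close()

pg_cancel_backend(pid) is the gentler variant if the goal is only to cancel a running query rather than close the connection.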



Re: psql 9.3 automatic recovery in progress

From
Alban Hertroys
Date:
> On 10 Oct 2016, at 21:28, Alban Hertroys <haramrae@gmail.com> wrote:
>
>
>> On 10 Oct 2016, at 21:12, Periko Support <pheriko.support@gmail.com> wrote:
>>
>>    for pid in idle_record:
>>        try:
>> #            print "process details",pid
>> #            os.system("kill -9 %s" % (int(pid[0]), ))
>>            os.kill(int(pid[0]), signal.SIGKILL)
>>        except OSError as ex:
>>            continue
>
> That query returns PostgreSQL backends and you're sending them SIGKILL. Not a recommended practice as far as I know.
> Shouldn't you rather be sending those kill signals to the clients connecting to the db?
> Worse, apparently at some time in the past (a month ago matching those logs, perhaps?) it used to send kill -9! That's
> absolutely a very bad idea.
>
> While on the topic, there is a PG function to cancel a backend query from within PG:
https://www.postgresql.org/docs/9.5/static/functions-admin.html
> I think that's the best way to go about this, and best of all, you can combine that with your select statement.

Another idea struck me; if that script is under version control, you can check when that change was committed. If it
isn't, perhaps you should. My current favourite is Hg (aka Mercurial), which happens to be written in Python, just like
your script.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: psql 9.3 automatic recovery in progress

From
Periko Support
Date:
That script was from a vendor called 'allianzgrp.com'.

It was their solution.

Then I have a lot of work to do here.

On Mon, Oct 10, 2016 at 12:32 PM, Alban Hertroys <haramrae@gmail.com> wrote:
>
>> On 10 Oct 2016, at 21:28, Alban Hertroys <haramrae@gmail.com> wrote:
>>
>>
>>> On 10 Oct 2016, at 21:12, Periko Support <pheriko.support@gmail.com> wrote:
>>>
>>>    for pid in idle_record:
>>>        try:
>>> #            print "process details",pid
>>> #            os.system("kill -9 %s" % (int(pid[0]), ))
>>>            os.kill(int(pid[0]), signal.SIGKILL)
>>>        except OSError as ex:
>>>            continue
>>
>> That query returns PostgreSQL backends and you're sending them SIGKILL. Not a recommended practice as far as I know.
>> Shouldn't you rather be sending those kill signals to the clients connecting to the db?
>> Worse, apparently at some time in the past (a month ago matching those logs, perhaps?) it used to send kill -9! That's
>> absolutely a very bad idea.
>>
>> While on the topic, there is a PG function to cancel a backend query from within PG:
https://www.postgresql.org/docs/9.5/static/functions-admin.html
>> I think that's the best way to go about this, and best of all, you can combine that with your select statement.
>
> Another idea struck me; if that script is under version control, you can check when that change was committed. If it
> isn't, perhaps you should. My current favourite is Hg (aka Mercurial), which happens to be written in Python, just like
> your script.
>
> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>


Re: psql 9.3 automatic recovery in progress

From
Periko Support
Date:
For the lifetime of an odoo session, can you point me to where I can manage that setting?

The configuration /etc/openerp-server.conf doesn't have any parameter for that.

That must be in an odoo file...?

Thanks.

On Mon, Oct 10, 2016 at 12:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2016-10-10 21:12 GMT+02:00 Periko Support <pheriko.support@gmail.com>:
Andreo u got a good observation here.

I got a script that run every hour why?

Odoo got some issues with IDLE connections, if we don't check our current psql connections after a while the system eat all connections and a lot of them are IDLE and stop answering users, we create a script that runs every hour, this is:

""" Script is used to kill database connection which are idle from last 15 minutes """
#!/usr/bin/env python
import psycopg2
import sys
import os
from os.path import join, expanduser
import subprocess, signal, psutil
import time

def get_conn():

    conn_string = "host='localhost' dbname='template1' user='openerp' password='s$p_p@r70'"

    try:
        # get a connection, if a connect cannot be made an exception will be raised here
        conn = psycopg2.connect(conn_string)
        cursor = conn.cursor()
#        print "successful Connection"
        return cursor
    except:
        exceptionType, exceptionValue, exceptionTraceback = sys.exc_info()
        sys.exit("Database connection failed!\n ->%s" % (exceptionValue))


def get_pid():

    SQL="select pid, datname, usename from pg_stat_activity where usename = 'openerp' AND query_start < current_timestamp - INTERVAL '15' MINUTE;"
    cursor = get_conn()
    cursor.execute(SQL)
    idle_record = cursor.fetchall()
    print "---------------------------------------------------------------------------------------------------"
    print "Date:",time.strftime("%d/%m/%Y")
    print "idle record list: ", idle_record
    print "---------------------------------------------------------------------------------------------------"
    for pid in idle_record:
        try:
#            print "process details",pid
#            os.system("kill -9 %s" % (int(pid[0]), ))
            os.kill(int(pid[0]), signal.SIGKILL)
        except OSError as ex:
            continue

get_pid()

I will move this to run not every hour and see the reaction.

Is a easy move, about Tim, our current KVM server is good for me, see picture please:


free
             total       used       free     shared    buffers     cached
Mem:     181764228  136200312   45563916        468      69904     734652
-/+ buffers/cache:  135395756   46368472
Swap:       261948          0     261948

I got other vm but they are on other raid setup.

Tim u mention that u recommend reduce memory pressure, u mean to lower down my values like shared_buffers or increase memory?

try to decrease lifetime of odoo sessions - then memory will be returned back to system - set limit_memory_soft less in odoo config - I found some manuals on net with wrong settings on net.

the odoo sessions should be refreshed more often.

Regards

Pavel

 


Melvin I try that value before but my server cry, I will add more memory in a few weeks.

Any comment I will appreciated, thanks.

On Mon, Oct 10, 2016 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Periko Support <pheriko.support@gmail.com> writes:
> My current server has 82GB memory.

You said this was running inside a VM, though --- maybe the VM is
resource-constrained?

In any case, turning off memory overcommit would be a good idea if
you're not concerned about running anything but Postgres.

                        regards, tom lane



Attachments

Re: [SPAM] psql 9.3 automatic recovery in progress

From
Adrian Klaver
Date:
On 10/10/2016 12:18 PM, Periko Support wrote:
> I was on vacation, but the issue have the same behavior:

Actually no. Before you had:

2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
terminated by signal 9: Killed

Now you have:

2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
crash of another server process
2016-10-10 07:50:09 PDT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.

Which corresponds to this from your subsequent post:

#            os.system("kill -9 %s" % (int(pid[0]), ))
             os.kill(int(pid[0]), signal.SIGKILL)

>
> 2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
> crash of another server process
> 2016-10-10 07:50:09 PDT DETAIL:  The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2016-10-10 07:50:09 PDT HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.
> 2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
> crash of another server process
> 2016-10-10 07:50:09 PDT DETAIL:  The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2016-10-10 07:50:09 PDT HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.
> 2016-10-10 07:50:09 PDT LOG:  archiver process (PID 13066) exited with
> exit code 1
> 2016-10-10 07:50:10 PDT LOG:  all server processes terminated; reinitializing
> 2016-10-10 07:50:10 PDT LOG:  connection received: host=192.168.2.6 port=37700
> 2016-10-10 07:50:10 PDT LOG:  database system was interrupted; last
> known up at 2016-10-10 07:49:15 PDT
> 2016-10-10 07:50:10 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 07:50:10 PDT LOG:  connection received: host=192.168.2.6 port=37702
> 2016-10-10 07:50:10 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 07:50:15 PDT LOG:  database system was not properly shut
> down; automatic recovery in progress
> 2016-10-10 07:50:15 PDT LOG:  redo starts at 517/C9000028
> 2016-10-10 07:50:15 PDT LOG:  unexpected pageaddr 517/77000000 in log
> segment 0000000100000517000000D2, offset 0
> 2016-10-10 07:50:15 PDT LOG:  redo done at 517/D10000C8
> 2016-10-10 07:50:15 PDT LOG:  last completed transaction was at log
> time 2016-10-10 07:49:09.891669-07
> 2016-10-10 07:50:15 PDT LOG:  connection received: host=192.168.2.6 port=37704
> 2016-10-10 07:50:15 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 07:50:15 PDT LOG:  connection received: host=192.168.2.6 port=37706
> 2016-10-10 07:50:15 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 07:50:16 PDT LOG:  MultiXact member wraparound protections
> are now enabled
> 2016-10-10 07:50:16 PDT LOG:  database system is ready to accept connections
> 2016-10-10 07:50:16 PDT LOG:  autovacuum launcher started
>
> 2016-10-10 09:00:01 PDT LOG:  archiver process (PID 14004) exited with
> exit code 1
> 2016-10-10 09:00:01 PDT WARNING:  terminating connection because of
> crash of another server process
> 2016-10-10 09:00:01 PDT DETAIL:  The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2016-10-10 09:00:01 PDT HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.
> 2016-10-10 09:00:01 PDT WARNING:  terminating connection because of
> crash of another server process
> 2016-10-10 09:00:01 PDT DETAIL:  The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
> 2016-10-10 09:00:01 PDT HINT:  In a moment you should be able to
> reconnect to the database and repeat your command.
> 2016-10-10 09:00:01 PDT LOG:  all server processes terminated; reinitializing
> 2016-10-10 09:00:02 PDT LOG:  database system was interrupted; last
> known up at 2016-10-10 08:59:33 PDT
> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35950
> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35951
> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35952
> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35953
> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35954
> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1 port=35955
> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:05 PDT LOG:  connection received: host=192.168.2.6 port=39380
> 2016-10-10 09:00:05 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:05 PDT LOG:  connection received: host=192.168.2.6 port=39382
> 2016-10-10 09:00:05 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:07 PDT LOG:  database system was not properly shut
> down; automatic recovery in progress
> 2016-10-10 09:00:07 PDT LOG:  redo starts at 51A/82000028
> 2016-10-10 09:00:07 PDT LOG:  record with zero length at 51A/8E0126B0
> 2016-10-10 09:00:07 PDT LOG:  redo done at 51A/8E012680
> 2016-10-10 09:00:07 PDT LOG:  last completed transaction was at log
> time 2016-10-10 09:00:01.142505-07
> 2016-10-10 09:00:08 PDT LOG:  connection received: host=127.0.0.1 port=35956
> 2016-10-10 09:00:08 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:08 PDT LOG:  connection received: host=127.0.0.1 port=35957
> 2016-10-10 09:00:08 PDT FATAL:  the database system is in recovery mode
> 2016-10-10 09:00:08 PDT LOG:  MultiXact member wraparound protections
> are now enabled
> 2016-10-10 09:00:08 PDT LOG:  database system is ready to accept connections
> 2016-10-10 09:00:08 PDT LOG:  autovacuum launcher started
>
> Thanks.
>
> On Mon, Oct 10, 2016 at 11:32 AM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>> On 10/10/2016 11:14 AM, Moreno Andreo wrote:
>>>
>>>
>>> Il 10/10/2016 18:24, Periko Support ha scritto:
>>>>
>>>> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
>>>> terminated by signal 9: Killed
>>>
>>>
>>>> 2016-09-12 10:00:01 PDT LOG:  server process (PID 30766) was
>>>> terminated by signal 9: Killed
>>>
>>>
>>>> 2016-09-12 15:00:01 PDT LOG:  server process (PID 22030) was
>>>> terminated by signal 9: Killed
>>>>
>>>>
>>> These datetimes could be suspect. Every crash (kill) is done at
>>> "00"minutes and "01" minutes, that makes me ask "Isn't there something
>>> like cron running something that interfere with postgres?"
>>
>>
>> While we on the subject, the datetimes are almost a month old.
>>
>> Does that mean this problem was just noticed or are the datetimes wrong?
>>
>>>
>>> Cheers,
>>> Moreno.
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>>
>>
>>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: psql 9.3 automatic recovery in progress

From
Alban Hertroys
Date:
> On 10 Oct 2016, at 21:43, Periko Support <pheriko.support@gmail.com> wrote:
>
> For the life time in odoo session, can u point me where I can manage that setting?
>
> The configuration /etc/openerp-server.conf doesn't have any parameter for that.
>
> That must be in a odoo file...?
>
> Thanks.
>
> On Mon, Oct 10, 2016 at 12:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>
> 2016-10-10 21:12 GMT+02:00 Periko Support <pheriko.support@gmail.com>:
> Andreo u got a good observation here.
>
> I got a script that run every hour why?
>
> Odoo got some issues with IDLE connections, if we don't check our current psql connections after a while the system
> eat all connections and a lot of them are IDLE and stop answering users, we create a script that runs every hour, this
> is:

That's all part of Odoo (formerly known as OpenERP), isn't it? Did you contact them about this behaviour yet? Might
just be that they're familiar with the problem and have a solution for it.

I suspect the Python script you're running was implemented as a rather rough workaround by people from allianzgrp who
knew just enough to be harmful. (Kill -9 on a database process, jeez! Keyboards should have an electroshock feature for
people like that…)

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: [SPAM] psql 9.3 automatic recovery in progress

From
Periko Support
Date:
Adrian

2016-10-10 12:00:01 PDT LOG:  connection authorized: user=openerp
database=template1
2016-10-10 12:00:01 PDT LOG:  server process (PID 30394) was
terminated by signal 9: Killed
2016-10-10 12:00:01 PDT DETAIL:  Failed process was running: SELECT
"name", "model", "description", "month" FROM "etiquetas_temp"
2016-10-10 12:00:01 PDT LOG:  terminating any other active server processes

I will do some changes to my server and see if I can fix the issue.

Thanks for your comment, and to all of you for your great help.


On Mon, Oct 10, 2016 at 2:03 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
> On 10/10/2016 12:18 PM, Periko Support wrote:
>>
>> I was on vacation, but the issue have the same behavior:
>
>
> Actually no. Before you had:
>
> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
> terminated by signal 9: Killed
>
> Now you have:
>
> 2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
> crash of another server process
> 2016-10-10 07:50:09 PDT DETAIL:  The postmaster has commanded this
> server process to roll back the current transaction and exit, because
> another server process exited abnormally and possibly corrupted shared
> memory.
>
> Which corresponds to this from your subsequent post:
>
> #            os.system("kill -9 %s" % (int(pid[0]), ))
>             os.kill(int(pid[0]), signal.SIGKILL)
>
>>
>> 2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
>> crash of another server process
>> 2016-10-10 07:50:09 PDT DETAIL:  The postmaster has commanded this
>> server process to roll back the current transaction and exit, because
>> another server process exited abnormally and possibly corrupted shared
>> memory.
>> 2016-10-10 07:50:09 PDT HINT:  In a moment you should be able to
>> reconnect to the database and repeat your command.
>> 2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
>> crash of another server process
>> 2016-10-10 07:50:09 PDT DETAIL:  The postmaster has commanded this
>> server process to roll back the current transaction and exit, because
>> another server process exited abnormally and possibly corrupted shared
>> memory.
>> 2016-10-10 07:50:09 PDT HINT:  In a moment you should be able to
>> reconnect to the database and repeat your command.
>> 2016-10-10 07:50:09 PDT LOG:  archiver process (PID 13066) exited with
>> exit code 1
>> 2016-10-10 07:50:10 PDT LOG:  all server processes terminated;
>> reinitializing
>> 2016-10-10 07:50:10 PDT LOG:  connection received: host=192.168.2.6
>> port=37700
>> 2016-10-10 07:50:10 PDT LOG:  database system was interrupted; last
>> known up at 2016-10-10 07:49:15 PDT
>> 2016-10-10 07:50:10 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 07:50:10 PDT LOG:  connection received: host=192.168.2.6
>> port=37702
>> 2016-10-10 07:50:10 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 07:50:15 PDT LOG:  database system was not properly shut
>> down; automatic recovery in progress
>> 2016-10-10 07:50:15 PDT LOG:  redo starts at 517/C9000028
>> 2016-10-10 07:50:15 PDT LOG:  unexpected pageaddr 517/77000000 in log
>> segment 0000000100000517000000D2, offset 0
>> 2016-10-10 07:50:15 PDT LOG:  redo done at 517/D10000C8
>> 2016-10-10 07:50:15 PDT LOG:  last completed transaction was at log
>> time 2016-10-10 07:49:09.891669-07
>> 2016-10-10 07:50:15 PDT LOG:  connection received: host=192.168.2.6
>> port=37704
>> 2016-10-10 07:50:15 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 07:50:15 PDT LOG:  connection received: host=192.168.2.6
>> port=37706
>> 2016-10-10 07:50:15 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 07:50:16 PDT LOG:  MultiXact member wraparound protections
>> are now enabled
>> 2016-10-10 07:50:16 PDT LOG:  database system is ready to accept
>> connections
>> 2016-10-10 07:50:16 PDT LOG:  autovacuum launcher started
>>
>> 2016-10-10 09:00:01 PDT LOG:  archiver process (PID 14004) exited with
>> exit code 1
>> 2016-10-10 09:00:01 PDT WARNING:  terminating connection because of
>> crash of another server process
>> 2016-10-10 09:00:01 PDT DETAIL:  The postmaster has commanded this
>> server process to roll back the current transaction and exit, because
>> another server process exited abnormally and possibly corrupted shared
>> memory.
>> 2016-10-10 09:00:01 PDT HINT:  In a moment you should be able to
>> reconnect to the database and repeat your command.
>> 2016-10-10 09:00:01 PDT WARNING:  terminating connection because of
>> crash of another server process
>> 2016-10-10 09:00:01 PDT DETAIL:  The postmaster has commanded this
>> server process to roll back the current transaction and exit, because
>> another server process exited abnormally and possibly corrupted shared
>> memory.
>> 2016-10-10 09:00:01 PDT HINT:  In a moment you should be able to
>> reconnect to the database and repeat your command.
>> 2016-10-10 09:00:01 PDT LOG:  all server processes terminated;
>> reinitializing
>> 2016-10-10 09:00:02 PDT LOG:  database system was interrupted; last
>> known up at 2016-10-10 08:59:33 PDT
>> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1
>> port=35950
>> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1
>> port=35951
>> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1
>> port=35952
>> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1
>> port=35953
>> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1
>> port=35954
>> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:03 PDT LOG:  connection received: host=127.0.0.1
>> port=35955
>> 2016-10-10 09:00:03 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:05 PDT LOG:  connection received: host=192.168.2.6
>> port=39380
>> 2016-10-10 09:00:05 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:05 PDT LOG:  connection received: host=192.168.2.6
>> port=39382
>> 2016-10-10 09:00:05 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:07 PDT LOG:  database system was not properly shut
>> down; automatic recovery in progress
>> 2016-10-10 09:00:07 PDT LOG:  redo starts at 51A/82000028
>> 2016-10-10 09:00:07 PDT LOG:  record with zero length at 51A/8E0126B0
>> 2016-10-10 09:00:07 PDT LOG:  redo done at 51A/8E012680
>> 2016-10-10 09:00:07 PDT LOG:  last completed transaction was at log
>> time 2016-10-10 09:00:01.142505-07
>> 2016-10-10 09:00:08 PDT LOG:  connection received: host=127.0.0.1
>> port=35956
>> 2016-10-10 09:00:08 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:08 PDT LOG:  connection received: host=127.0.0.1
>> port=35957
>> 2016-10-10 09:00:08 PDT FATAL:  the database system is in recovery mode
>> 2016-10-10 09:00:08 PDT LOG:  MultiXact member wraparound protections
>> are now enabled
>> 2016-10-10 09:00:08 PDT LOG:  database system is ready to accept
>> connections
>> 2016-10-10 09:00:08 PDT LOG:  autovacuum launcher started
>>
>> Thanks.
>>
>> On Mon, Oct 10, 2016 at 11:32 AM, Adrian Klaver
>> <adrian.klaver@aklaver.com> wrote:
>>>
>>> On 10/10/2016 11:14 AM, Moreno Andreo wrote:
>>>>
>>>>
>>>>
>>>> Il 10/10/2016 18:24, Periko Support ha scritto:
>>>>>
>>>>>
>>>>> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
>>>>> terminated by signal 9: Killed
>>>>
>>>>
>>>>
>>>>> 2016-09-12 10:00:01 PDT LOG:  server process (PID 30766) was
>>>>> terminated by signal 9: Killed
>>>>
>>>>
>>>>
>>>>> 2016-09-12 15:00:01 PDT LOG:  server process (PID 22030) was
>>>>> terminated by signal 9: Killed
>>>>>
>>>>>
>>>> These datetimes could be suspect. Every crash (kill) is done at
>>>> "00"minutes and "01" minutes, that makes me ask "Isn't there something
>>>> like cron running something that interfere with postgres?"
>>>
>>>
>>>
>>> While we on the subject, the datetimes are almost a month old.
>>>
>>> Does that mean this problem was just noticed or are the datetimes wrong?
>>>
>>>>
>>>> Cheers,
>>>> Moreno.
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Adrian Klaver
>>> adrian.klaver@aklaver.com
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com


Re: [SPAM] psql 9.3 automatic recovery in progress

From
Tom Lane
Date:
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 10/10/2016 12:18 PM, Periko Support wrote:
>> I was on vacation, but the issue have the same behavior:

> Actually no. Before you had:

> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
> terminated by signal 9: Killed

> Now you have:

> 2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
> crash of another server process

Most likely it *is* the same thing but the OP trimmed the second log
excerpt too much.  The "crash of another server process" complaints
suggest strongly that there was already another problem and this
is just part of the postmaster's kill-all-children-and-restart
recovery procedure.

Now, if there really is nothing before this in the log, another possible
theory is that something decided to send the child processes a SIGQUIT
signal, which would cause them to believe that the postmaster had told
them to commit hara-kiri.  I only bring this up because we were already
shown a script sending random SIGKILLs ... so random SIGQUITs wouldn't be
too hard to credit either.  But the subsequent log entries don't quite
square with that idea; if the postmaster weren't already expecting the
children to die, it would have reacted differently.

            regards, tom lane


Re: [SPAM] psql 9.3 automatic recovery in progress

From
Scott Mead
Date:


On Mon, Oct 10, 2016 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 10/10/2016 12:18 PM, Periko Support wrote:
>> I was on vacation, but the issue have the same behavior:

> Actually no. Before you had:

> 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
> terminated by signal 9: Killed

> Now you have:

> 2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
> crash of another server process

Most likely it *is* the same thing but the OP trimmed the second log
excerpt too much.  The "crash of another server process" complaints
suggest strongly that there was already another problem and this
is just part of the postmaster's kill-all-children-and-restart
recovery procedure.

Now, if there really is nothing before this in the log, another possible
theory is that something decided to send the child processes a SIGQUIT
signal, which would cause them to believe that the postmaster had told
them to commit hara-kiri.  I only bring this up because we were already
shown a script sending random SIGKILLs ... so random SIGQUITs wouldn't be
too hard to credit either.  But the subsequent log entries don't quite
square with that idea; if the postmaster weren't already expecting the
children to die, it would have reacted differently.


The better solution is to do this in one query and more safely:

select pid, usename, datname, pg_terminate_backend(pid) 
   FROM pg_stat_activity 
  WHERE usename = 'openerp' 
       AND  now() - query_start > '15 minutes'::interval;

   This will use the built-in pg_terminate_backend() for you in one shot.

 --Scott


 

                        regards, tom lane





--
--
Scott Mead
Sr. Architect
OpenSCG
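One refinement worth noting: on 9.3 the query can also filter on state = 'idle' and exclude pg_backend_pid(), so a long-running but still active query doesn't get terminated along with the genuinely idle sessions; the sketch posted after Alban's message above does exactly that.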

Re: [SPAM] psql 9.3 automatic recovery in progress

From
Periko Support
Date:
Scott, your script is very clean; I'm testing it, thanks.

On Mon, Oct 10, 2016 at 3:28 PM, Scott Mead <scottm@openscg.com> wrote:
>
>
> On Mon, Oct 10, 2016 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Adrian Klaver <adrian.klaver@aklaver.com> writes:
>> > On 10/10/2016 12:18 PM, Periko Support wrote:
>> >> I was on vacation, but the issue have the same behavior:
>>
>> > Actually no. Before you had:
>>
>> > 2016-09-12 09:00:01 PDT LOG:  server process (PID 23958) was
>> > terminated by signal 9: Killed
>>
>> > Now you have:
>>
>> > 2016-10-10 07:50:09 PDT WARNING:  terminating connection because of
>> > crash of another server process
>>
>> Most likely it *is* the same thing but the OP trimmed the second log
>> excerpt too much.  The "crash of another server process" complaints
>> suggest strongly that there was already another problem and this
>> is just part of the postmaster's kill-all-children-and-restart
>> recovery procedure.
>>
>> Now, if there really is nothing before this in the log, another possible
>> theory is that something decided to send the child processes a SIGQUIT
>> signal, which would cause them to believe that the postmaster had told
>> them to commit hara-kiri.  I only bring this up because we were already
>> shown a script sending random SIGKILLs ... so random SIGQUITs wouldn't be
>> too hard to credit either.  But the subsequent log entries don't quite
>> square with that idea; if the postmaster weren't already expecting the
>> children to die, it would have reacted differently.
>
>
>
> The better solution is to do this in one query and more safely:
>
> select pid, usename, datname, pg_terminate_backend(pid)
>    FROM pg_stat_activity
>   WHERE usename = 'openerp'
>        AND  now() - query_start > '15 minutes'::interval;
>
>    This will use the builtin 'pg_terminate_backend for you in one shot.
>
>  --Scott
>
>
>
>>
>>
>>                         regards, tom lane
>>
>>
>
>
>
>
> --
> --
> Scott Mead
> Sr. Architect
> OpenSCG
> http://openscg.com


Re: psql 9.3 automatic recovery in progress

From
Pavel Stehule
Date:


2016-10-10 21:43 GMT+02:00 Periko Support <pheriko.support@gmail.com>:
For the life time in odoo session, can u point me where I can manage that setting?

The configuration /etc/openerp-server.conf doesn't have any parameter for that.

That must be in a odoo file...?
Regards

Pavel

 
Thanks.

On Mon, Oct 10, 2016 at 12:25 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2016-10-10 21:12 GMT+02:00 Periko Support <pheriko.support@gmail.com>:
Andreo u got a good observation here.

I got a script that run every hour why?

Odoo got some issues with IDLE connections, if we don't check our current psql connections after a while the system eat all connections and a lot of them are IDLE and stop answering users, we create a script that runs every hour, this is:

""" Script is used to kill database connection which are idle from last 15 minutes """
#!/usr/bin/env python
import psycopg2
import sys
import os
from os.path import join, expanduser
import subprocess, signal, psutil
import time

def get_conn():

    conn_string = "host='localhost' dbname='template1' user='openerp' password='s$p_p@r70'"

    try:
        # get a connection, if a connect cannot be made an exception will be raised here
        conn = psycopg2.connect(conn_string)
        cursor = conn.cursor()
#        print "successful Connection"
        return cursor
    except:
        exceptionType, exceptionValue, exceptionTraceback = sys.exc_info()
        sys.exit("Database connection failed!\n ->%s" % (exceptionValue))


def get_pid():

    SQL="select pid, datname, usename from pg_stat_activity where usename = 'openerp' AND query_start < current_timestamp - INTERVAL '15' MINUTE;"
    cursor = get_conn()
    cursor.execute(SQL)
    idle_record = cursor.fetchall()
    print "---------------------------------------------------------------------------------------------------"
    print "Date:",time.strftime("%d/%m/%Y")
    print "idle record list: ", idle_record
    print "---------------------------------------------------------------------------------------------------"
    for pid in idle_record:
        try:
#            print "process details",pid
#            os.system("kill -9 %s" % (int(pid[0]), ))
            os.kill(int(pid[0]), signal.SIGKILL)
        except OSError as ex:
            continue

get_pid()

I will move this to run not every hour and see the reaction.

Is a easy move, about Tim, our current KVM server is good for me, see picture please:


free
             total       used       free     shared    buffers     cached
Mem:     181764228  136200312   45563916        468      69904     734652
-/+ buffers/cache:  135395756   46368472
Swap:       261948          0     261948

I got other vm but they are on other raid setup.

Tim u mention that u recommend reduce memory pressure, u mean to lower down my values like shared_buffers or increase memory?

try to decrease lifetime of odoo sessions - then memory will be returned back to system - set limit_memory_soft less in odoo config - I found some manuals on net with wrong settings on net.

the odoo sessions should be refreshed more often.

Regards

Pavel

 


Melvin I try that value before but my server cry, I will add more memory in a few weeks.

Any comment I will appreciated, thanks.

On Mon, Oct 10, 2016 at 11:22 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Periko Support <pheriko.support@gmail.com> writes:
> My current server has 82GB memory.

You said this was running inside a VM, though --- maybe the VM is
resource-constrained?

In any case, turning off memory overcommit would be a good idea if
you're not concerned about running anything but Postgres.

                        regards, tom lane




Attachments

Re: psql 9.3 automatic recovery in progress

From
Periko Support
Date:
Just to inform you that my log yesterday didn't show any 'recovery', which was the main goal of this post.

Looks like the script was causing the issue; I tested it and changed it to the one from Scott.

On odoo I switched user sessions from the default value of 1 week (:-/) to 1 hour.

I found that option in the file /opt/openerp/web/addons/web/http.py
(Google helped).

def session_gc(session_store):
    if random.random() < 0.001:  # garbage-collect sessions on roughly 0.1% of requests
        # sessions were kept for one week by default (not sure why they chose that value):
        # last_week = time.time() - 60*60*24*7
        last_week = time.time() - 3600   # changed to 1 hour
        for fname in os.listdir(session_store.path):
            path = os.path.join(session_store.path, fname)
            try:
                if os.path.getmtime(path) < last_week:
                    os.unlink(path)
            except OSError:
                pass

One other thing: my server was using a little swap (about 5 MB). Checking the logs and the system, the task that was forcing the server into swap (not much, but I don't want it to use any) turned out to be the nightly backup process. I added 20+ GB of memory, and today swap is at 0 :-).

I will continue monitoring the system; I still have other goals to finish, but this error was a big one.

Thanks all for your help and contribution.

On Mon, Oct 10, 2016 at 6:50 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:


2016-10-10 21:43 GMT+02:00 Periko Support <pheriko.support@gmail.com>:
For the Odoo session lifetime, can you point me to where I can manage that setting?

The configuration file /etc/openerp-server.conf doesn't have any parameter for that.

It must be in an Odoo file...?
Regards

Pavel

 
Thanks.
