Обсуждение: semop hanging - Postgres 9.4.4
Hello, Since couple of days on my Postgres cluster i've seen few times a hanged process. It looks like one of postgres workers stuckon simple INSERT query. PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND 12173 postgres 20 0 1372M 345M run 166:18 24.47% 99.90% postgres: db01 db01 10.0.0.101(58713) INSERT It utilizes 100% of one CPU and after 1-2 hours it starts to utilize memory. After another 1-2 hours OOM-Killer have to takeaction. At the same time other postgres workers are properly processing queries. There's no error on syslog/dmesg. Strace: semop(58261738, {{12, -1, 0}}, 1) = 0 semop(58261738, {{12, -1, 0}}, 1) = 0 semop(58261738, {{12, -1, 0}}, 1) = 0 semop(58851580, {{0, 1, 0}}, 1) = 0 semop(58261738, {{12, -1, 0}}, 1) = 0 semop(58261738, {{12, -1, 0}}, 1) = 0 semop(58261738, {{12, -1, 0}}, 1) = 0 semop(57802972, {{1, 1, 0}}, 1) = 0 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) semop(58261738, {{12, -1, 0}}, 1) = 0 semop(58261738, {{12, -1, 0}}, 1) = 0 semop(58261738, {{12, -1, 0}}, 1) = 0 Is this related to incorrect semaphores configuration? Here's my current configuration: sql01:~# ipcs -l ------ Shared Memory Limits -------- max number of segments = 4096 max seg size (kbytes) = 18014398509465599 max total shared memory (kbytes) = 18446744073642442748 min seg size (bytes) = 1 ------ Semaphore Limits -------- max number of arrays = 2048 max semaphores per array = 250 max semaphores system wide = 512000 max ops per semop call = 100 semaphore max value = 32767 ------ Messages Limits -------- max queues system wide = 32000 max size of message (bytes) = 8192 default max size of queue (bytes) = 16384 sql01:~# ipcs -u ------ Shared Memory Status -------- segments allocated 1 pages allocated 1 pages resident 1 pages swapped 0 Swap performance: 0 attempts 0 successes ------ Semaphore Status -------- used arrays = 257 allocated semaphores = 4369 ------ Messages Status -------- allocated queues = 0 used headers = 0 used space = 0 bytes Some values from postgresql.conf: max_connections = 4096 effective_cache_size = 12GB work_mem = 16MB shared_buffers = 1GB postgres=# select version(); version ------------------------------------------------------------------------------------------------------ PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit I would appreciate any help. Best Regards, Marcin
On Wed, Dec 16, 2015 at 1:04 AM, Michał Nowak <minowack@wp.pl> wrote:
Hello,
Since couple of days on my Postgres cluster i've seen few times a hanged process. It looks like one of postgres workers stuck on simple INSERT query.
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
12173 postgres 20 0 1372M 345M run 166:18 24.47% 99.90% postgres: db01 db01 10.0.0.101(58713) INSERT
It utilizes 100% of one CPU and after 1-2 hours it starts to utilize memory. After another 1-2 hours OOM-Killer have to take action. At the same time other postgres workers are properly processing queries.
There's no error on syslog/dmesg. Strace:
semop(58261738, {{12, -1, 0}}, 1) = 0
semop(58261738, {{12, -1, 0}}, 1) = 0
semop(58261738, {{12, -1, 0}}, 1) = 0
semop(58851580, {{0, 1, 0}}, 1) = 0
semop(58261738, {{12, -1, 0}}, 1) = 0
semop(58261738, {{12, -1, 0}}, 1) = 0
semop(58261738, {{12, -1, 0}}, 1) = 0
semop(57802972, {{1, 1, 0}}, 1) = 0
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
semop(58261738, {{12, -1, 0}}, 1) = 0
semop(58261738, {{12, -1, 0}}, 1) = 0
semop(58261738, {{12, -1, 0}}, 1) = 0
Is this related to incorrect semaphores configuration?
Here's my current configuration:
sql01:~# ipcs -l
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18446744073642442748
min seg size (bytes) = 1
------ Semaphore Limits --------
max number of arrays = 2048
max semaphores per array = 250
max semaphores system wide = 512000
max ops per semop call = 100
semaphore max value = 32767
------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384
sql01:~# ipcs -u
------ Shared Memory Status --------
segments allocated 1
pages allocated 1
pages resident 1
pages swapped 0
Swap performance: 0 attempts 0 successes
------ Semaphore Status --------
used arrays = 257
allocated semaphores = 4369
------ Messages Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes
Some values from postgresql.conf:
max_connections = 4096
effective_cache_size = 12GB
work_mem = 16MB
shared_buffers = 1GB
postgres=# select version();
version
------------------------------------------------------------------------------------------------------
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
I would appreciate any help.
Best Regards,
Marcin
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Hi Miachal,
May I know what is the RAM of server and how much shmmax is configured ?
You can check if any Zombi or defunct process are running on the server.
I want to know if any maintenance activity performed same time?
Regards,
Sagar Jadhav
Database Administrator
Database Administrator
Hello, > There is no zombi nor defunct process. Now i have 3 stucked processes. When i try to pg_terminate_backend nothing happens: strace: semop(57802972, {{9, -1, 0}}, 1) = 0 semop(57802972, {{9, -1, 0}}, 1) = 0 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) semop(57802972, {{9, -1, 0}}, 1) = 0 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4431, si_uid=105} --- rt_sigreturn() = 139718030872480 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) semop(57802972, {{9, -1, 0}}, 1) = 0 semop(57802972, {{9, -1, 0}}, 1) = 0 select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout) Best Regards, Michal
On Tue, Dec 15, 2015 at 1:34 PM, Michał Nowak <minowack@wp.pl> wrote: > postgres=# select version(); > version > ------------------------------------------------------------------------------------------------------ > PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit There is a bug fix in 9.4.5 which can cause a deadlock on lightweight locking. http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bab959906911c97437f410a03b0346e6dd28d528 http://www.postgresql.org/support/versioning/ From a subsequent post: > perf top: > > 17.04% postgres [.] _bt_moveright > 13.39% postgres [.] LWLockAcquire > 8.82% postgres [.] LWLockRelease > 8.11% postgres [.] _bt_checkpage The perf top output suggests the possibility of a corrupted index, causing an endless loop. Can you get a few stack traces from "stuck" backends? -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hello, Thanks for all replies. > The perf top output suggests the possibility of a corrupted index, > causing an endless loop. Yes! Rebuilding indexes solves problem for this table. :) But after 20h problem is back on another table. First table has about 2m records, second has only 200k. > There is a bug fix in 9.4.5 which can cause a deadlock on > lightweight locking. I've just upgraded to 9.4.5 and hope that helps. I'll keep you posted. Best Regards, Michal
Hello, > I've just upgraded to 9.4.5 and hope that helps. I'll keep you posted. After upgrade everything works fine for almost 4 days. Thank you all for help. Best Regards, Michal