Обсуждение: semop hanging - Postgres 9.4.4

Поиск
Список
Период
Сортировка

semop hanging - Postgres 9.4.4

От
Michał Nowak
Дата:
Hello,

Since couple of days on my Postgres cluster i've seen few times a hanged process. It looks like one of postgres workers
stuckon simple INSERT query. 

  PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
12173 postgres  20    0 1372M  345M run   166:18 24.47% 99.90% postgres: db01 db01 10.0.0.101(58713) INSERT


It utilizes 100% of one CPU and after 1-2 hours it starts to utilize memory. After another 1-2 hours OOM-Killer have to
takeaction. At the same time other postgres workers are properly processing queries.  

There's no error on syslog/dmesg. Strace:


semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58851580, {{0, 1, 0}}, 1)         = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(57802972, {{1, 1, 0}}, 1)         = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0


Is this related to incorrect semaphores configuration?



Here's my current configuration:

sql01:~# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18446744073642442748
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 2048
max semaphores per array = 250
max semaphores system wide = 512000
max ops per semop call = 100
semaphore max value = 32767

------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384


sql01:~# ipcs -u

------ Shared Memory Status --------
segments allocated 1
pages allocated 1
pages resident  1
pages swapped   0
Swap performance: 0 attempts     0 successes

------ Semaphore Status --------
used arrays = 257
allocated semaphores = 4369

------ Messages Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes


Some values from postgresql.conf:
max_connections = 4096
effective_cache_size = 12GB
work_mem = 16MB
shared_buffers = 1GB


postgres=# select version();
                                               version
------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit


I would appreciate any help.

Best Regards,

Marcin





Re: semop hanging - Postgres 9.4.4

От
Shreeyansh Dba
Дата:

On Wed, Dec 16, 2015 at 1:04 AM, Michał Nowak <minowack@wp.pl> wrote:
Hello,

Since couple of days on my Postgres cluster i've seen few times a hanged process. It looks like one of postgres workers stuck on simple INSERT query.

  PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
12173 postgres  20    0 1372M  345M run   166:18 24.47% 99.90% postgres: db01 db01 10.0.0.101(58713) INSERT


It utilizes 100% of one CPU and after 1-2 hours it starts to utilize memory. After another 1-2 hours OOM-Killer have to take action. At the same time other postgres workers are properly processing queries.

There's no error on syslog/dmesg. Strace:


semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58851580, {{0, 1, 0}}, 1)         = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(57802972, {{1, 1, 0}}, 1)         = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0
semop(58261738, {{12, -1, 0}}, 1)       = 0


Is this related to incorrect semaphores configuration?



Here's my current configuration:

sql01:~# ipcs -l

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 18014398509465599
max total shared memory (kbytes) = 18446744073642442748
min seg size (bytes) = 1

------ Semaphore Limits --------
max number of arrays = 2048
max semaphores per array = 250
max semaphores system wide = 512000
max ops per semop call = 100
semaphore max value = 32767

------ Messages Limits --------
max queues system wide = 32000
max size of message (bytes) = 8192
default max size of queue (bytes) = 16384


sql01:~# ipcs -u

------ Shared Memory Status --------
segments allocated 1
pages allocated 1
pages resident  1
pages swapped   0
Swap performance: 0 attempts     0 successes

------ Semaphore Status --------
used arrays = 257
allocated semaphores = 4369

------ Messages Status --------
allocated queues = 0
used headers = 0
used space = 0 bytes


Some values from postgresql.conf:
max_connections = 4096
effective_cache_size = 12GB
work_mem = 16MB
shared_buffers = 1GB


postgres=# select version();
                                               version
------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit


I would appreciate any help.

Best Regards,

Marcin





--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



Hi Miachal,

May I know what is the RAM of server and how much shmmax is configured ?
You can check if any  Zombi or defunct process are running on the server.
I want to know if any maintenance activity performed same time?

Regards,
Sagar Jadhav
Database Administrator

Re: semop hanging - Postgres 9.4.4

От
Michał Nowak
Дата:
Hello,

> There is no zombi nor defunct process. Now i have 3 stucked processes.

When i try to pg_terminate_backend nothing happens:

strace:

semop(57802972, {{9, -1, 0}}, 1)        = 0
semop(57802972, {{9, -1, 0}}, 1)        = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(57802972, {{9, -1, 0}}, 1)        = 0
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=4431, si_uid=105} ---
rt_sigreturn()                          = 139718030872480
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
semop(57802972, {{9, -1, 0}}, 1)        = 0
semop(57802972, {{9, -1, 0}}, 1)        = 0
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000})  = 0 (Timeout)


Best Regards,
Michal




Re: semop hanging - Postgres 9.4.4

От
Kevin Grittner
Дата:
On Tue, Dec 15, 2015 at 1:34 PM, Michał Nowak <minowack@wp.pl> wrote:

> postgres=# select version();
>                                                version
> ------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit

There is a bug fix in 9.4.5 which can cause a deadlock on
lightweight locking.

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bab959906911c97437f410a03b0346e6dd28d528

http://www.postgresql.org/support/versioning/

From a subsequent post:

> perf top:
>
>   17.04%  postgres                 [.] _bt_moveright
>   13.39%  postgres                 [.] LWLockAcquire
 >   8.82%  postgres                 [.] LWLockRelease
 >   8.11%  postgres                 [.] _bt_checkpage

The perf top output suggests the possibility of a corrupted index,
causing an endless loop.

Can you get a few stack traces from "stuck" backends?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: semop hanging - Postgres 9.4.4

От
Michał Nowak
Дата:
Hello,

Thanks for all replies.


> The perf top output suggests the possibility of a corrupted index,
> causing an endless loop.


Yes! Rebuilding indexes solves problem for this table. :)

But after 20h problem is back on another table. First table has about 2m records, second has only 200k.



> There is a bug fix in 9.4.5 which can cause a deadlock on
> lightweight locking.


I've just upgraded to 9.4.5 and hope that helps. I'll keep you posted.


Best Regards,
Michal




Re: semop hanging - Postgres 9.4.4

От
Michał Nowak
Дата:
Hello,

> I've just upgraded to 9.4.5 and hope that helps. I'll keep you posted.

After upgrade everything works fine for almost 4 days.

Thank you all for help.


Best Regards,
Michal