Re: Stalls on PGSemaphoreLock

From: Takashi Horikawa
Subject: Re: Stalls on PGSemaphoreLock
Msg-id: 73FA3881462C614096F815F75628AFCD0192D27F@BPXM01GP.gisp.nec.co.jp
In reply to: Stalls on PGSemaphoreLock  (Matthew Spilich)
List: pgsql-performance

On Mar 25, 2014, at 8:46 AM, Matthew Spilich wrote:
> Has any on the forum seen something similar?

I think I reported a similar phenomenon in my SIGMOD 2013 paper (Latch-free
data structures for DBMS: design, implementation, and evaluation,
<http://dl.acm.org/citation.cfm?id=2463720>).

> ----- 47245 -----
> 0x00000037392eb197 in semop () from /lib64/libc.so.6
> #0  0x00000037392eb197 in semop () from /lib64/libc.so.6
> #1  0x00000000005e0c87 in PGSemaphoreLock ()
> #2  0x000000000061e3af in LWLockAcquire ()
> #3  0x000000000060aa0f in ReadBuffer_common ()
> #4  0x000000000060b2e4 in ReadBufferExtended ()
...

> ----- 47257 -----
> 0x00000037392eb197 in semop () from /lib64/libc.so.6
> #0  0x00000037392eb197 in semop () from /lib64/libc.so.6
> #1  0x00000000005e0c87 in PGSemaphoreLock ()
> #2  0x000000000061e3af in LWLockAcquire ()
> #3  0x000000000060aa0f in ReadBuffer_common ()
> #4  0x000000000060b2e4 in ReadBufferExtended ()
...

These stack traces indicate heavy contention on the LWLocks for buffers.
In a similar situation, I also observed heavy contention on the spinlocks
that ensure mutual exclusion of LWLock status data. That contention caused
a sudden increase in CPU utilization, which is consistent with the
following description:
> At the time of the event, we see a spike in system CPU and load average,
> but we do not see a corresponding spike in disk reads or writes which would
> indicate IO load.
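
To make the spinlock point concrete, here is a minimal sketch of an LWLock-like structure whose status fields are guarded by a spinlock. It is a toy model under stated assumptions, not the PostgreSQL implementation: ToyLWLock, TOY_LWLOCK_INIT, toy_try_acquire and toy_release are hypothetical names, and the real code queues a waiting backend and puts it to sleep on a semaphore rather than simply returning false.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a PostgreSQL symbol. */
typedef struct ToyLWLock {
    atomic_flag mutex;     /* spinlock guarding the status fields below */
    bool        exclusive; /* true while one holder has exclusive access */
    int         shared;    /* number of current shared holders */
} ToyLWLock;

#define TOY_LWLOCK_INIT { ATOMIC_FLAG_INIT, false, 0 }

/* Busy-wait until the flag is ours; this loop is what burns CPU
 * when many processes inspect the same lock at once. */
static void spin_acquire(atomic_flag *f)
{
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ; /* spin */
}

static void spin_release(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

/* Try to take the lock. Even a failed attempt has to take the spinlock
 * to read the status. On failure, a real implementation would enqueue
 * the caller and sleep (the PGSemaphoreLock frames in the traces);
 * this sketch just reports the failure. */
static bool toy_try_acquire(ToyLWLock *l, bool want_exclusive)
{
    bool ok;

    spin_acquire(&l->mutex);
    if (want_exclusive) {
        ok = !l->exclusive && l->shared == 0;
        if (ok)
            l->exclusive = true;
    } else {
        ok = !l->exclusive;
        if (ok)
            l->shared++;
    }
    spin_release(&l->mutex);
    return ok;
}

/* Release a lock previously obtained with toy_try_acquire(). */
static void toy_release(ToyLWLock *l, bool was_exclusive)
{
    spin_acquire(&l->mutex);
    if (was_exclusive)
        l->exclusive = false;
    else
        l->shared--;
    spin_release(&l->mutex);
}
```

The point of the sketch is that every acquire and release, successful or not, passes through the same spinlock, so a hot LWLock turns into a hot spinlock as well.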

If the cause of the problem is the same as what I observed, a possible
immediate countermeasure is to increase the value of 'NUM_BUFFER_PARTITIONS',
defined in src/include/storage/lwlock.h, from 16 to, for example, 128 or 256,
and rebuild the binary.
# Using the latch-free buffer manager proposed in my paper would take a long
time, since it is not incorporated upstream.
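
As a rough illustration of why more partitions can help: as far as I understand it, a buffer lookup is assigned to a partition lock by taking its hash code modulo NUM_BUFFER_PARTITIONS, so more partitions spread the same lookups over more locks. The sketch below models only that mapping. toy_hash is a hypothetical stand-in for the real buffer-tag hash, and partition_for and busiest_partition_load are illustrative helpers, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_PARTITIONS 256

/* Hypothetical integer mixer standing in for the real buffer-tag hash. */
static uint32_t toy_hash(uint32_t block_no)
{
    uint32_t h = block_no;

    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return h;
}

/* Assumed mapping: partition index = hash code modulo partition count. */
static uint32_t partition_for(uint32_t hashcode, uint32_t num_partitions)
{
    return hashcode % num_partitions;
}

/* For n_lookups distinct blocks, return how many of them fall on the
 * most heavily used partition, i.e. the worst-case pile-up on one lock. */
static uint32_t busiest_partition_load(uint32_t n_lookups,
                                       uint32_t num_partitions)
{
    uint32_t counts[TOY_MAX_PARTITIONS] = {0};
    uint32_t max = 0;

    assert(num_partitions > 0 && num_partitions <= TOY_MAX_PARTITIONS);

    for (uint32_t i = 0; i < n_lookups; i++) {
        uint32_t p = partition_for(toy_hash(i), num_partitions);

        counts[p]++;
        if (counts[p] > max)
            max = counts[p];
    }
    return max;
}
```

Comparing busiest_partition_load(1024, 16) with busiest_partition_load(1024, 128) shows the effect: with 16 partitions at least 64 of the 1024 lookups must share one lock, whereas 128 partitions can never concentrate more lookups on a single lock than 16 do, because each of the 16 partitions is split into 8 smaller ones. This says nothing about real workloads, where access skew matters.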

--
Takashi Horikawa, Ph.D.,
Knowledge Discovery Research Laboratories,
NEC Corporation.

