Re: stuck spin lock with many concurrent users

From: Tom Lane
Subject: Re: stuck spin lock with many concurrent users
Date:
Msg-id: 25284.994186865@sss.pgh.pa.us
In response to: Re: stuck spin lock with many concurrent users  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List: pgsql-hackers

Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I added some code to HandleDeadLock to measure how long the
> LockLockTable and DeadLockCheck calls take. The following are the
> results from running pgbench -c 1000 (it failed with a stuck spin lock
> error). "real time" shows how long they actually ran (using
> gettimeofday). "user time" and "system time" are measured by calling
> getrusage. The time unit is milliseconds.

>  LockLockTable: real time

>  min |  max   |        avg        
> -----+--------+-------------------
>    0 | 867873 | 152874.9015151515

>  LockLockTable: user time

>  min | max |     avg      
> -----+-----+--------------
>    0 |  30 | 1.2121212121

>  LockLockTable: system time

>  min | max  |      avg       
> -----+------+----------------
>    0 | 2140 | 366.5909090909


>  DeadLockCheck: real time

>  min |  max  |       avg       
> -----+-------+-----------------
>    0 | 87671 | 3463.6996197719

>  DeadLockCheck: user time

>  min | max |      avg      
> -----+-----+---------------
>    0 | 330 | 14.2205323194

>  DeadLockCheck: system time

>  min | max |     avg      
> -----+-----+--------------
>    0 | 100 | 2.5095057034

Hm.  It doesn't seem that DeadLockCheck is taking very much of the time.
I have to suppose that the problem is (once again) our inefficient
spinlock code.
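
For readers who want to reproduce the kind of measurement quoted above, here is a minimal sketch in Python rather than the backend's C. It is not the instrumentation Tatsuo added; the helper name `measure` is hypothetical, `time.time()` stands in for gettimeofday, and `resource.getrusage` is the Unix-only Python wrapper for getrusage. It shows that a call which mostly sleeps reports a large real time and almost no user or system time, which is the pattern in the LockLockTable figures.

```python
import resource
import time


def measure(func):
    """Run func() and return (real_ms, user_ms, system_ms).

    Hypothetical helper: real time comes from the wall clock,
    user and system time from getrusage for this process.
    """
    r0 = resource.getrusage(resource.RUSAGE_SELF)
    t0 = time.time()
    func()
    t1 = time.time()
    r1 = resource.getrusage(resource.RUSAGE_SELF)
    real_ms = (t1 - t0) * 1000.0
    user_ms = (r1.ru_utime - r0.ru_utime) * 1000.0
    system_ms = (r1.ru_stime - r0.ru_stime) * 1000.0
    return real_ms, user_ms, system_ms


# A call that only sleeps: real time is large, CPU time is close to zero.
real_ms, user_ms, system_ms = measure(lambda: time.sleep(0.2))
print("real %.1f ms, user %.1f ms, system %.1f ms" % (real_ms, user_ms, system_ms))
```

When real time far exceeds user plus system time, the process spent the interval off the CPU (sleeping or waiting) rather than computing.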

If you think about it, on a typical platform where processes waiting for
a time delay are released at a clock tick, what's going to be happening
is that a whole lot of spinblocked processes will all be awoken in the
same clock tick interrupt.  The first one of these that gets to run will
acquire the spinlock, if it's free, and the rest will go back to sleep
and try again at the next tick.  This could be highly unfair depending
on just how the kernel's scheduler works --- for example, one could
easily believe that the waiters might be awoken in process-number order,
in which case backends with high process numbers might never get to
acquire the spinlock, or at least would have such low probability of
winning that they are prone to "stuck spinlock" timeout.
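
The scenario above can be sketched as a toy simulation. This is not PostgreSQL's spinlock code, and every name in it (`simulate`, `wake_order`, `max_wait_ticks`) is made up for illustration. It assumes the extreme case: every backend wants the lock at every tick, all waiters wake on the same tick, and whichever runs first holds the lock for that tick. The "pid" order models the speculated process-number wakeup order; "random" models a scheduler with no such bias.

```python
import random


def simulate(num_backends, ticks, wake_order, max_wait_ticks, seed=0):
    """Return the set of backends that would hit a 'stuck spinlock' timeout.

    Toy model: at each clock tick all waiters wake up, the first one to
    run takes the lock, and the rest go back to sleep until the next tick.
    A backend counts as stuck once it has lost max_wait_ticks ticks in a row.
    """
    rng = random.Random(seed)
    waited = [0] * num_backends  # consecutive ticks each backend has lost
    stuck = set()
    for _ in range(ticks):
        order = list(range(num_backends))
        if wake_order == "random":
            rng.shuffle(order)
        # With "pid" order the list stays sorted, so backend 0 runs first.
        winner = order[0]
        for backend in range(num_backends):
            if backend == winner:
                waited[backend] = 0
            else:
                waited[backend] += 1
                if waited[backend] >= max_wait_ticks:
                    stuck.add(backend)
    return stuck


starved_pid = simulate(10, 1000, "pid", 200)
starved_random = simulate(10, 1000, "random", 200)
print(len(starved_pid), len(starved_random))
```

In the "pid" case every backend except the lowest-numbered one times out, because it never gets to run first. With a random wakeup order each backend wins about one tick in ten, so a run of 200 straight losses should be very unlikely. Real schedulers fall somewhere between these two extremes.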

We really need to look at replacing the spinlock mechanism with
something more efficient.
        regards, tom lane

