Re: Is the unfair lwlock behavior intended?

From Alexander Korotkov
Subject Re: Is the unfair lwlock behavior intended?
Date
Msg-id CAPpHfdtOCPvvL_irrES+M5YZ+jZR8bUSQ7cz39ObjEuOaDsgsw@mail.gmail.com
In response to Is the unfair lwlock behavior intended?  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
List pgsql-hackers
Hi!

On Tue, May 24, 2016 at 9:03 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
I encountered strange behavior of lightweight locks in PostgreSQL 9.2.  It appears to apply to 9.6 too, as far as I can tell from examining the code.  Could you tell me whether the behavior is intended or needs a fix?

Simply put, the unfair behavior is that waiters for exclusive mode are overtaken by share-mode lockers who arrive later.


PROBLEM
====================

Under a heavy read/write workload on a big machine with dozens of CPUs and hundreds of GBs of RAM, psql sometimes took more than 30 seconds to connect to the database (and actually, it failed to connect due to our connect_timeout setting.)  The backend corresponding to the psql was waiting to acquire exclusive mode lock on ProcArrayLock.  Some other backends took more than 10 seconds to commit their transactions, waiting for exclusive mode lock on ProcArrayLock.

At that time, many backend processes (I forgot the number) were acquiring and releasing share mode lock on ProcArrayLock, most of which were from TransactionIsInProgress().


CAUSE
====================

Going into the 9.2 code, I realized that those who request share mode don't pay attention to the wait queue.  That is, if some processes hold a share mode lock and someone is waiting for exclusive mode in the wait queue, other processes that come later can acquire share mode, overtaking those who are already waiting.  If many processes repeatedly request share mode, the waiters can't get exclusive mode for a long time.

Is this intentional, or should we make the later share-lockers wait if someone is in the wait queue?

I've already observed this behavior, see [1].  I think there is currently no consensus on how to fix it.  For instance, Andres expressed the opinion that this shouldn't be fixed on the LWLock side [2].
FYI, I'm planning to pick up work on the CSN patch [3] for 10.0.  CSN should fix various scalability issues, including high ProcArrayLock contention.
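
To make the overtaking described in the quoted CAUSE section concrete, here is a toy discrete-time sketch in Python.  It is not PostgreSQL's LWLock code: the function name writer_wait_ticks, the tick model, and all parameters are assumptions made up for illustration.  Readers keep re-requesting share mode; with fair=False a share request ignores a waiting exclusive requester, and with fair=True later share requests wait behind it.

```python
def writer_wait_ticks(fair, n_readers=3, hold_ticks=3,
                      writer_arrival=2, max_ticks=1000):
    """Toy model (hypothetical, not lwlock.c): return how many ticks the
    exclusive requester waits, or None if it never gets the lock."""
    release_at = {}          # reader id -> tick at which its share lock ends
    writer_waiting = False

    for tick in range(max_ticks):
        # Readers whose hold time has elapsed release their share lock.
        for reader in [r for r, t in release_at.items() if t <= tick]:
            del release_at[reader]

        # The exclusive requester needs the lock with no share holders.
        if tick >= writer_arrival:
            if not release_at:
                return tick - writer_arrival
            writer_waiting = True

        # Each idle reader (re)requests share mode; reader i starts at tick i.
        for reader in range(n_readers):
            if reader in release_at or tick < reader:
                continue
            if fair and writer_waiting:
                continue     # fair policy: stay behind the waiting writer
            release_at[reader] = tick + hold_ticks

    return None              # starved for the whole simulated period


if __name__ == "__main__":
    print("unfair:", writer_wait_ticks(fair=False))
    print("fair:  ", writer_wait_ticks(fair=True))
```

With these assumed parameters the staggered readers always overlap, so the unfair run never grants exclusive mode within the simulated period, while the fair run grants it as soon as the readers that already held the lock have released it.  The sketch only shows the queueing effect; it says nothing about the throughput cost of making share-lockers wait.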

References.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 
