Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Konstantin Knizhnik
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Msg-id: 00dae0d4-1d65-b0d5-355c-d27d757aa18d@postgrespro.ru
In reply to: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager  (Andres Freund <andres@anarazel.de>)
Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager  (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers

On 26.04.2018 09:10, Masahiko Sawada wrote:
> On Thu, Apr 26, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> Never mind. There was a lot of items especially at the last CommitFest.
>>>
>>>> In terms of moving forward, I'd still like to hear what
>>>> Andres has to say about the comments I made on March 1st.
>>> Yeah, agreed.
>> $ ping -n andres.freund
>> Request timeout for icmp_seq 0
>> Request timeout for icmp_seq 1
>> Request timeout for icmp_seq 2
>> Request timeout for icmp_seq 3
>> Request timeout for icmp_seq 4
>> ^C
>> --- andres.freund ping statistics ---
>> 6 packets transmitted, 0 packets received, 100.0% packet loss
>>
>> Meanwhile, https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru
>> shows that this patch has some benefits for other cases, which is a
>> point in favor IMHO.
> Thank you for sharing. That's good to know.
>
> Andres pointed out the performance degradation due to hash collision
> when multiple loading. I think the point is that it happens at where
> users don't know.  Therefore even if we make N_RELEXTLOCK_ENTS
> configurable parameter, since users don't know the hash collision they
> don't know when they should tune it.
>
> So it's just an idea but how about adding an SQL-callable function
> that returns the estimated number of lock waiters of the given
> relation? Since user knows how many processes are loading to the
> relation, if a returned value by the function is greater than the
> expected value user  can know hash collision and will be able to start
> to consider to increase N_RELEXTLOCK_ENTS.
>
> Regards,
>
> --
> Masahiko Sawada
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
>
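
The hash collision concern quoted above can be illustrated with a toy model. This is only a sketch under stated assumptions: the slot count is a hypothetical stand-in for N_RELEXTLOCK_ENTS, the modulo mapping is a stand-in for whatever hash the patch uses, and the function names are invented for illustration. It shows how a per-relation waiter estimate could exceed the number of processes a user knows are loading into that relation.

```python
# Toy model only: not the patch's code. The slot count and the hash
# function are stand-ins chosen for illustration.
from collections import defaultdict

N_SLOTS = 1024  # hypothetical stand-in for N_RELEXTLOCK_ENTS


def slot_for(relation_oid, n_slots=N_SLOTS):
    """Map a relation OID to a lock slot (plain modulo as a stand-in hash)."""
    return relation_oid % n_slots


def waiters_per_slot(loaders, n_slots=N_SLOTS):
    """Sum the loading processes of every relation that lands in the same slot.

    `loaders` maps relation OID -> number of processes loading into it.
    """
    totals = defaultdict(int)
    for relation_oid, n_processes in loaders.items():
        totals[slot_for(relation_oid, n_slots)] += n_processes
    return dict(totals)


# Two unrelated relations with 8 loaders each; their OIDs differ by exactly
# N_SLOTS, so they collide in this toy mapping.
loaders = {16384: 8, 16384 + N_SLOTS: 8}
per_slot = waiters_per_slot(loaders)
observed = per_slot[slot_for(16384)]
print(observed)  # larger than the 8 loaders the user expects for one relation
```

In this model a user who runs 8 loaders against one relation but sees a larger number for its slot would have a hint that another relation shares the slot.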
We at PostgresPro ran into relation extension lock contention at 
two more customers and tried this patch (v13) to address it.
Unfortunately, replacing the heavyweight lock with an lwlock could not completely 
eliminate the contention; now most backends are blocked on a condition 
variable:

0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6
#0  0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x00000000007024ee in WaitEventSetWait ()
#2  0x0000000000718fa6 in ConditionVariableSleep ()
#3  0x000000000071954d in RelExtLockAcquire ()
#4  0x00000000004ba99d in RelationGetBufferForTuple ()
#5  0x00000000004b3f18 in heap_insert ()
#6  0x00000000006109c8 in ExecInsert ()
#7  0x0000000000611a49 in ExecModifyTable ()
#8  0x00000000005ef97a in standard_ExecutorRun ()
#9  0x000000000072440a in ProcessQuery ()
#10 0x0000000000724631 in PortalRunMulti ()
#11 0x00000000007250ec in PortalRun ()
#12 0x0000000000721287 in exec_simple_query ()
#13 0x0000000000722532 in PostgresMain ()
#14 0x000000000047a9eb in ServerLoop ()
#15 0x00000000006b9fe9 in PostmasterMain ()
#16 0x000000000047b431 in main ()

Obviously there is nothing surprising here: if a lot of processes try to 
acquire the same exclusive lock, then high contention is expected.
I just want to note that this patch cannot completely eliminate the 
problem with a large number of concurrent inserts into the same table.

The second problem we observed was even more critical: if a backend is granted 
the relation extension lock and then hits an error before releasing it,
aborting the current transaction does not release the lock (unlike a 
heavyweight lock), and the relation stays locked.
So the database is effectively stalled and the server has to be restarted.
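
The second problem can be modeled with a small toy example. This is only a sketch, not PostgreSQL code: ToyLock, ToyTransaction and extend_with_error are hypothetical names, and the model assumes the cause is simply that abort-time cleanup does not know about the new lock, whereas heavyweight locks are tracked and released on abort.

```python
# Toy model only: hypothetical names, not PostgreSQL internals.
class ToyLock:
    """An exclusive lock that remembers which backend holds it."""

    def __init__(self):
        self.holder = None

    def acquire(self, backend):
        if self.holder is not None:
            raise RuntimeError("would block: lock held by %s" % self.holder)
        self.holder = backend

    def release(self):
        self.holder = None


class ToyTransaction:
    """Releases on abort only the locks it was told to track."""

    def __init__(self, backend):
        self.backend = backend
        self.tracked = []

    def acquire_tracked(self, lock):
        # Heavyweight-lock style: abort cleanup knows about this lock.
        lock.acquire(self.backend)
        self.tracked.append(lock)

    def acquire_untracked(self, lock):
        # Assumed behavior of the new lock: abort cleanup never sees it.
        lock.acquire(self.backend)

    def abort(self):
        for lock in self.tracked:
            lock.release()
        self.tracked.clear()


def extend_with_error(txn, lock, tracked):
    """Take the lock, fail before the explicit release, then abort."""
    if tracked:
        txn.acquire_tracked(lock)
    else:
        txn.acquire_untracked(lock)
    try:
        raise OSError("simulated error while extending the relation")
    except OSError:
        txn.abort()


tracked_lock, untracked_lock = ToyLock(), ToyLock()
extend_with_error(ToyTransaction("backend-1"), tracked_lock, tracked=True)
extend_with_error(ToyTransaction("backend-1"), untracked_lock, tracked=False)
print(tracked_lock.holder)    # None: released by abort
print(untracked_lock.holder)  # backend-1: still held after abort
```

In this model every later backend that needs untracked_lock would wait forever, which matches the stall we saw.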

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


