Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
From: Masahiko Sawada
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date:
Msg-id: CAD21AoA7rvsxLuWD47m7647G6ie+SDpJY0kHeNqv+w1dnV1bzw@mail.gmail.com
In response to: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
On Mon, Jun 4, 2018 at 10:47 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
>
>
> On 26.04.2018 09:10, Masahiko Sawada wrote:
>>
>> On Thu, Apr 26, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>>>
>>> On Tue, Apr 10, 2018 at 9:08 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>>> wrote:
>>>>
>>>> Never mind. There were a lot of items, especially at the last CommitFest.
>>>>
>>>>> In terms of moving forward, I'd still like to hear what
>>>>> Andres has to say about the comments I made on March 1st.
>>>>
>>>> Yeah, agreed.
>>>
>>> $ ping -n andres.freund
>>> Request timeout for icmp_seq 0
>>> Request timeout for icmp_seq 1
>>> Request timeout for icmp_seq 2
>>> Request timeout for icmp_seq 3
>>> Request timeout for icmp_seq 4
>>> ^C
>>> --- andres.freund ping statistics ---
>>> 6 packets transmitted, 0 packets received, 100.0% packet loss
>>>
>>> Meanwhile,
>>> https://www.postgresql.org/message-id/4c171ffe-e3ee-acc5-9066-a40d52bc5ae9@postgrespro.ru
>>> shows that this patch has some benefits for other cases, which is a
>>> point in favor IMHO.
>>
>> Thank you for sharing. That's good to know.
>>
>> Andres pointed out the performance degradation due to hash collisions
>> when multiple loads run concurrently. I think the point is that it
>> happens where users can't see it. Therefore, even if we make
>> N_RELEXTLOCK_ENTS a configurable parameter, users who aren't aware of
>> the hash collisions won't know when they should tune it.
>>
>> So it's just an idea, but how about adding an SQL-callable function
>> that returns the estimated number of lock waiters for a given
>> relation? Since the user knows how many processes are loading into the
>> relation, a returned value greater than expected would reveal a hash
>> collision and suggest increasing N_RELEXTLOCK_ENTS.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>>
> We at PostgresPro ran into the relation extension lock contention
> problem at two more customers and tried to use this patch (v13) to
> address the issue. Unfortunately, replacing the heavyweight lock with
> an lwlock couldn't completely eliminate the contention; now most
> backends are blocked on a condition variable:
>
> 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6
> #0 0x00007fb03a318903 in __epoll_wait_nocancel () from /lib64/libc.so.6
> #1 0x00000000007024ee in WaitEventSetWait ()
> #2 0x0000000000718fa6 in ConditionVariableSleep ()
> #3 0x000000000071954d in RelExtLockAcquire ()
> #4 0x00000000004ba99d in RelationGetBufferForTuple ()
> #5 0x00000000004b3f18 in heap_insert ()
> #6 0x00000000006109c8 in ExecInsert ()
> #7 0x0000000000611a49 in ExecModifyTable ()
> #8 0x00000000005ef97a in standard_ExecutorRun ()
> #9 0x000000000072440a in ProcessQuery ()
> #10 0x0000000000724631 in PortalRunMulti ()
> #11 0x00000000007250ec in PortalRun ()
> #12 0x0000000000721287 in exec_simple_query ()
> #13 0x0000000000722532 in PostgresMain ()
> #14 0x000000000047a9eb in ServerLoop ()
> #15 0x00000000006b9fe9 in PostmasterMain ()
> #16 0x000000000047b431 in main ()
>
> Obviously there is nothing surprising here: if a lot of processes try
> to acquire the same exclusive lock, high contention is expected. I just
> want to note that this patch is not able to completely eliminate the
> problem with a large number of concurrent inserts into the same table.
>
> The second problem we observed was even more critical: if a backend is
> granted the relation extension lock and then gets some error before
> releasing it, abort of the current transaction doesn't release the lock
> (unlike a heavyweight lock) and the relation stays locked. So the
> database is effectively stalled and the server has to be restarted.
>

Thank you for reporting. Regarding the second problem, I tried to
reproduce that bug with the latest version of the patch (v13) but could
not. When a transaction aborts, we call
ResourceOwnerRelease()->ResourceOwnerReleaseInternal()->ProcReleaseLocks()->RelExtLockCleanup()
and clear any relext lock bits we are either holding or waiting for. If
we raised an error after adding a relext lock bit but before
incrementing its holding count, the relext lock would remain held, but I
couldn't find any code that raises an error between those two steps.
Could you please share concrete reproduction steps for the database
stall, if possible?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
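For readers following the collision discussion upthread, here is a
minimal C sketch of why two unrelated relations can contend under this
design. It is not code from the patch: the slot count, the tag layout,
and the hash function are assumptions for illustration only. The point
is that relation extension locks live in a fixed-size shared array and a
relation is mapped to a slot by hashing its (database OID, relation OID)
tag, so two hot relations that hash to the same slot share one lock word
and the resulting contention is invisible to the user.

```c
#include <stdint.h>

#define N_RELEXTLOCK_ENTS 1024     /* number of shared lock slots (assumed) */

typedef struct RelExtLockTag
{
    uint32_t dbid;                 /* database OID */
    uint32_t relid;                /* relation OID */
} RelExtLockTag;

/*
 * Map a relation to a lock slot.  A simple multiplicative mix is used here;
 * the real patch uses PostgreSQL's hash support functions.  Any two relations
 * whose tags hash to the same value modulo N_RELEXTLOCK_ENTS will block each
 * other when extending, even though they are logically unrelated.
 */
static uint32_t
relext_lock_slot(RelExtLockTag tag)
{
    uint32_t h = tag.dbid * 0x9E3779B1u ^ tag.relid * 0x85EBCA77u;

    return h % N_RELEXTLOCK_ENTS;
}
```

This is also why a waiter-count monitoring function would help: a waiter
count far above the number of processes known to be loading a relation
would point at a slot collision rather than genuine contention.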
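To make the abort-time cleanup argument above concrete, here is a hedged
C sketch of the reasoning; the names are illustrative and are not the
patch's actual identifiers. Each backend remembers which slot it holds
or is waiting for, and transaction abort reaches the cleanup routine via
the resource-release path, so a slot could only be left locked if an
error were raised after the shared bit is set but before the backend
records it locally.

```c
#include <stdbool.h>

typedef struct RelExtLockLocalState
{
    int  slot;          /* index into the shared slot array, or -1 */
    bool held;          /* true once the exclusive bit is ours */
    bool waiting;       /* true while sleeping on the slot's condvar */
} RelExtLockLocalState;

static RelExtLockLocalState held_relextlock = { -1, false, false };

/* Called on transaction abort, analogous to RelExtLockCleanup(). */
static void
relext_lock_cleanup(void)
{
    if (held_relextlock.slot < 0)
        return;                     /* nothing acquired, nothing to do */

    if (held_relextlock.held)
    {
        /* clear the slot's exclusive bit and wake any waiters (elided) */
    }
    else if (held_relextlock.waiting)
    {
        /* drop ourselves from the slot's waiter count (elided) */
    }

    held_relextlock.slot = -1;
    held_relextlock.held = false;
    held_relextlock.waiting = false;
}
```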