Re: LWLock deadlock and gdb advice

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: LWLock deadlock and gdb advice
Дата
Msg-id CAMkU=1wS8R7rz67jYEZoc4P_rXp0frbZerdP1gcCbE=w8fpMvw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: LWLock deadlock and gdb advice  (Andres Freund <andres@anarazel.de>)
Ответы Re: LWLock deadlock and gdb advice  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Thu, Jul 30, 2015 at 8:22 PM, Andres Freund <andres@anarazel.de> wrote:
On 2015-07-30 09:03:01 -0700, Jeff Janes wrote:
> On Wed, Jul 29, 2015 at 6:10 AM, Andres Freund <andres@anarazel.de> wrote:
> > What do you think about something roughly like the attached?
> >
>
> I've not evaluated the code, but applying it does solve the problem I was
> seeing.

Cool, thanks for testing! How long did you run the test and how long did
it, on average, previously take for the issue to occur?

I had run it for 24 hours, while it usually took less than 8 hours to look up before.  I did see it within a few minutes one time when I checked out a new HEAD and then forgot to re-apply your or Heikki's patch.

But now I've got the same situation again, after 15 hours, with your patch.  This is probably all down to luck.  The only differences that I can think of is that I advanced the base to e8e86fbc8b3619da54c, and turned on JJ_vac and set log_autovacuum_min_duration=0.


checkpointer, 31516
(gdb) bt
#0  0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x00000000006481df in PGSemaphoreLock (sema=0x7f84ccd3c5f8) at pg_sema.c:387
#2  0x00000000006a5754 in LWLockAcquireCommon (l=0x7f84c42edf80, valptr=0x7f84c42edfa8, val=18446744073709551615) at lwlock.c:1041
#3  LWLockAcquireWithVar (l=0x7f84c42edf80, valptr=0x7f84c42edfa8, val=18446744073709551615) at lwlock.c:915
#4  0x00000000004cbd86 in WALInsertLockAcquireExclusive () at xlog.c:1445
#5  0x00000000004d4195 in CreateCheckPoint (flags=64) at xlog.c:8181
#6  0x000000000064ea53 in CheckpointerMain () at checkpointer.c:493
#7  0x00000000004e328a in AuxiliaryProcessMain (argc=2, argv=0x7fff6eed1f50) at bootstrap.c:428
#8  0x00000000006563f7 in StartChildProcess (type=CheckpointerProcess) at postmaster.c:5167
#9  0x000000000065884a in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2670

(gdb) f 2
(gdb) p *(lock->owner)
$1 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x7f84ccd27c88, sem = {semId = 1480359975, semNum = 12}, waitStatus = 0, procLatch = {is_set = 1,
    is_shared = 1 '\001', owner_pid = 31540}, lxid = 3209, pid = 31540, pgprocno = 92, backendId = 12, databaseId = 16384, roleId = 10,
  recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 2 '\002', lwWaitLink = {prev = 0x7f84c42edf10, next = 0x7f84ccd38e60},
  waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, next = 0x0},
  myProcLocks = {{prev = 0x7f84ccd38bb8, next = 0x7f84ccd38bb8}, {prev = 0x7f84ccd38bc8, next = 0x7f84ccd38bc8}, {prev = 0x7f84ccd38bd8,
      next = 0x7f84ccd38bd8}, {prev = 0x7f84ccd38be8, next = 0x7f84ccd38be8}, {prev = 0x7f84ccd38bf8, next = 0x7f84ccd38bf8}, {prev = 0x7f84ccd38c08,
      next = 0x7f84ccd38c08}, {prev = 0x7f84ccd38c18, next = 0x7f84ccd38c18}, {prev = 0x7f84ccd38c28, next = 0x7f84ccd38c28}, {prev = 0x7f84ccd38c38,
      next = 0x7f84ccd38c38}, {prev = 0x7f84ccd38c48, next = 0x7f84ccd38c48}, {prev = 0x7f84ccd38c58, next = 0x7f84ccd38c58}, {prev = 0x7f84ccd38c68,
      next = 0x7f84ccd38c68}, {prev = 0x7f84ccd38c78, next = 0x7f84ccd38c78}, {prev = 0x7f84ccd38c88, next = 0x7f84ccd38c88}, {prev = 0x7f84ccb0ceb0,
      next = 0x7f84ccb0ceb0}, {prev = 0x7f84ccd38ca8, next = 0x7f84ccd38ca8}}, subxids = {xids = {0 <repeats 64 times>}}, backendLock = 0x7f84c42e9800,
  fpLockBits = 160528697655296, fpRelId = {0, 0, 0, 0, 0, 0, 0, 0, 0, 2679, 2610, 16413, 16411, 16415, 16414, 16408}, fpVXIDLock = 1 '\001',
  fpLocalTransactionId = 3209}


backend 31540:

#0  0x0000003dcb6eaf27 in semop () from /lib64/libc.so.6
#1  0x00000000006481df in PGSemaphoreLock (sema=0x7f84ccd38b38) at pg_sema.c:387
#2  0x00000000006a53d4 in LWLockWaitForVar (lock=0x7f84c42edf00, valptr=0x7f84c42edf28, oldval=0, newval=0x7fff6eeceaf8) at lwlock.c:1438
#3  0x00000000004d3276 in WaitXLogInsertionsToFinish (upto=183341449216) at xlog.c:1576
#4  0x00000000004d3b33 in AdvanceXLInsertBuffer (upto=183345636352, opportunistic=<value optimized out>) at xlog.c:1888
#5  0x00000000004d8df2 in GetXLogBuffer (ptr=183345636352) at xlog.c:1669
#6  0x00000000004d8fea in CopyXLogRecordToWAL (rdata=0xbd2410, fpw_lsn=<value optimized out>) at xlog.c:1276
#7  XLogInsertRecord (rdata=0xbd2410, fpw_lsn=<value optimized out>) at xlog.c:1008
#8  0x00000000004dc72c in XLogInsert (rmid=13 '\r', info=32 ' ') at xloginsert.c:453
#9  0x0000000000478f41 in ginPlaceToPage (btree=0x7fff6eecee20, stack=0x1be11e0, insertdata=<value optimized out>, updateblkno=<value optimized out>,
    childbuf=0, buildStats=0x0) at ginbtree.c:418
#10 0x00000000004798a3 in ginInsertValue (btree=0x7fff6eecee20, stack=0x1be11e0, insertdata=0x7fff6eeceea0, buildStats=0x0) at ginbtree.c:748
#11 0x00000000004734aa in ginEntryInsert (ginstate=0x7fff6eecf200, attnum=39408, key=1, category=<value optimized out>, items=0x1c81c20, nitem=54,
    buildStats=0x0) at gininsert.c:234
#12 0x000000000047edbb in ginInsertCleanup (ginstate=0x7fff6eecf200, vac_delay=<value optimized out>, stats=0x0) at ginfast.c:843
#13 0x000000000047fae6 in ginHeapTupleFastInsert (ginstate=0x7fff6eecf200, collector=<value optimized out>) at ginfast.c:436
#14 0x000000000047386a in gininsert (fcinfo=<value optimized out>) at gininsert.c:531
...

(gdb) f 2
(gdb) p *(lock->owner)
$1 = {links = {prev = 0x0, next = 0x0}, procgloballist = 0x0, sem = {semId = 1480425513, semNum = 0}, waitStatus = 0, procLatch = {is_set = 0,
    is_shared = 1 '\001', owner_pid = 31516}, lxid = 0, pid = 31516, pgprocno = 112, backendId = -1, databaseId = 0, roleId = 0,
  recoveryConflictPending = 0 '\000', lwWaiting = 1 '\001', lwWaitMode = 0 '\000', lwWaitLink = {prev = 0x7f84ccd3cc10, next = 0x7f84ccd3a5e0},
  waitLock = 0x0, waitProcLock = 0x0, waitLockMode = 0, heldLocks = 0, waitLSN = 0, syncRepState = 0, syncRepLinks = {prev = 0x0, next = 0x0},
  myProcLocks = {{prev = 0x7f84ccd3c678, next = 0x7f84ccd3c678}, {prev = 0x7f84ccd3c688, next = 0x7f84ccd3c688}, {prev = 0x7f84ccd3c698,
      next = 0x7f84ccd3c698}, {prev = 0x7f84ccd3c6a8, next = 0x7f84ccd3c6a8}, {prev = 0x7f84ccd3c6b8, next = 0x7f84ccd3c6b8}, {prev = 0x7f84ccd3c6c8,
      next = 0x7f84ccd3c6c8}, {prev = 0x7f84ccd3c6d8, next = 0x7f84ccd3c6d8}, {prev = 0x7f84ccd3c6e8, next = 0x7f84ccd3c6e8}, {prev = 0x7f84ccd3c6f8,
      next = 0x7f84ccd3c6f8}, {prev = 0x7f84ccd3c708, next = 0x7f84ccd3c708}, {prev = 0x7f84ccd3c718, next = 0x7f84ccd3c718}, {prev = 0x7f84ccd3c728,
      next = 0x7f84ccd3c728}, {prev = 0x7f84ccd3c738, next = 0x7f84ccd3c738}, {prev = 0x7f84ccd3c748, next = 0x7f84ccd3c748}, {prev = 0x7f84ccd3c758,
      next = 0x7f84ccd3c758}, {prev = 0x7f84ccd3c768, next = 0x7f84ccd3c768}}, subxids = {xids = {0 <repeats 64 times>}}, backendLock = 0x7f84c42e9d00,
  fpLockBits = 0, fpRelId = {0 <repeats 16 times>}, fpVXIDLock = 0 '\000', fpLocalTransactionId = 0}


Cheers,


Jeff

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jesper Pedersen
Дата:
Сообщение: RequestAddinLWLocks(int n)
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Re: Using quicksort and a merge step to significantly improve on tuplesort's single run "external sort"