Re: LWLock deadlock and gdb advice

From Amit Langote
Subject Re: LWLock deadlock and gdb advice
Date
Msg-id 55B5C23B.40906@lab.ntt.co.jp
In response to Re: LWLock deadlock and gdb advice  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On 2015-07-16 PM 04:03, Jeff Janes wrote:
> On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> 
>>
>> Both. Here's the patch.
>>
>> Previously, LWLockAcquireWithVar set the variable associated with the lock
>> atomically with acquiring it. Before the lwlock-scalability changes, that
>> was straightforward because you held the spinlock anyway, but it's a lot
>> harder/expensive now. So I changed the way acquiring a lock with a variable
>> works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that
>> the current lock holder has updated the variable. The LWLockAcquireWithVar
>> function is gone - you now just use LWLockAcquire(), which always clears
>> the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if
>> you want to set the variable immediately. LWLockWaitForVar() always waits
>> if the flag is not set, i.e. it will not return regardless of the
>> variable's value, if the current lock-holder has not updated it yet.
>>
>>
> I ran this for a while without casserts and it seems to work.  But with
> casserts, I get failures in the autovac process on the GIN index.
> 
> I don't see how this is related to the LWLock issue, but I didn't see it
> without your patch.  Perhaps the system just didn't survive long enough to
> uncover it without the patch (although it shows up pretty quickly).  It
> could just be an overzealous Assert, since the casserts off didn't show
> problems.
> 
> bt and bt full are shown below.
> 
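
To make the quoted description of the patch easier to follow, here is a toy,
single-process model of the flag protocol Heikki describes: acquiring clears a
"variable set" flag, the holder may publish a value afterwards, and a waiter
keeps waiting until the flag is set and the value differs from the one it
already knows. All names here (ToyLock, TOY_FLAG_HELD, TOY_FLAG_VAR_SET,
toy_*) are hypothetical and this is not the code from the patch or from
lwlock.c; it is only a sketch, assuming C11 atomics and a non-blocking
"must I wait?" predicate in place of real sleeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy flags; not PostgreSQL's real bit layout. */
#define TOY_FLAG_HELD    (1u << 0)   /* some holder owns the lock */
#define TOY_FLAG_VAR_SET (1u << 1)   /* the holder has published the variable */

typedef struct ToyLock
{
    atomic_uint      state;  /* combination of TOY_FLAG_* bits */
    _Atomic uint64_t var;    /* value associated with the lock */
} ToyLock;

static void toy_init(ToyLock *lock)
{
    atomic_init(&lock->state, 0u);
    atomic_init(&lock->var, 0);
}

/* Try to take the lock; on success the VAR_SET flag is always cleared. */
static bool toy_try_acquire(ToyLock *lock)
{
    unsigned expected = atomic_load(&lock->state);

    do
    {
        if (expected & TOY_FLAG_HELD)
            return false;       /* someone else holds it */
    } while (!atomic_compare_exchange_weak(&lock->state, &expected,
                                           TOY_FLAG_HELD));
    return true;
}

/* Holder publishes a value, then raises VAR_SET so waiters may look at it. */
static void toy_update_var(ToyLock *lock, uint64_t val)
{
    assert(atomic_load(&lock->state) & TOY_FLAG_HELD);
    atomic_store(&lock->var, val);
    atomic_fetch_or(&lock->state, TOY_FLAG_VAR_SET);
}

static void toy_release(ToyLock *lock)
{
    atomic_store(&lock->state, 0u);
}

/*
 * Returns true if a waiter that last saw 'oldval' would have to keep waiting.
 * When it returns false because the value changed, *newval receives it.
 */
static bool toy_must_wait(ToyLock *lock, uint64_t oldval, uint64_t *newval)
{
    unsigned s = atomic_load(&lock->state);

    if (!(s & TOY_FLAG_HELD))
        return false;           /* lock is free: nothing to wait for */
    if (!(s & TOY_FLAG_VAR_SET))
        return true;            /* holder has not updated the variable yet */

    uint64_t v = atomic_load(&lock->var);
    if (v != oldval)
    {
        *newval = v;
        return false;           /* value moved on: waiter can proceed */
    }
    return true;
}
```

In this model a waiter calling toy_must_wait() right after another process's
toy_try_acquire() is told to wait whatever the variable currently contains,
which is the behaviour the quoted text attributes to LWLockWaitForVar().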

I got a similar assert failure but with a btree index
(pg_attribute_relid_attnum_index). The backtrace looks like Jeff's:

(gdb) bt
#0  0x0000003969632625 in raise () from /lib64/libc.so.6
#1  0x0000003969633e05 in abort () from /lib64/libc.so.6
#2  0x000000000092eb9e in ExceptionalCondition (conditionName=0x9c2220
"!(((PageHeader) (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))",
errorType=0x9c0c41 "FailedAssertion", fileName=0x9c0c10 "nbtree.c",
lineNumber=903) at assert.c:54
#3  0x00000000004e02d8 in btvacuumpage (vstate=0x7fff2c7655f0, blkno=9,
orig_blkno=9) at nbtree.c:903
#4  0x00000000004e0067 in btvacuumscan (info=0x7fff2c765cd0,
stats=0x279f7d0, callback=0x668f6d <lazy_tid_reaped>,
callback_state=0x279e338, cycleid=49190) at nbtree.c:821
#5  0x00000000004dfdde in btbulkdelete (fcinfo=0x7fff2c7657d0) at nbtree.c:676
#6  0x0000000000939769 in FunctionCall4Coll (flinfo=0x7fff2c765bb0,
collation=0, arg1=140733939342544, arg2=0, arg3=6721389, arg4=41542456) at
fmgr.c:1375
#7  0x00000000004d2a01 in index_bulk_delete (info=0x7fff2c765cd0,
stats=0x0, callback=0x668f6d <lazy_tid_reaped>, callback_state=0x279e338)
at indexam.c:690
#8  0x000000000066861d in lazy_vacuum_index (indrel=0x7fd40ab846f0,
stats=0x279e770, vacrelstats=0x279e338) at vacuumlazy.c:1367
#9  0x00000000006678a8 in lazy_scan_heap (onerel=0x274ec90,
vacrelstats=0x279e338, Irel=0x279e790, nindexes=2, scan_all=0 '\000') at
vacuumlazy.c:1098
#10 0x00000000006660f7 in lazy_vacuum_rel (onerel=0x274ec90, options=99,
params=0x27bdc88, bstrategy=0x27bdd18) at vacuumlazy.c:244
#11 0x0000000000665c1a in vacuum_rel (relid=1249, relation=0x7fff2c7662a0,
options=99, params=0x27bdc88) at vacuum.c:1388
#12 0x00000000006643ce in vacuum (options=99, relation=0x7fff2c7662a0,
relid=1249, params=0x27bdc88, va_cols=0x0, bstrategy=0x27bdd18,
isTopLevel=1 '\001') at vacuum.c:293
#13 0x000000000075d23c in autovacuum_do_vac_analyze (tab=0x27bdc80,
bstrategy=0x27bdd18) at autovacuum.c:2807
#14 0x000000000075c632 in do_autovacuum () at autovacuum.c:2328
#15 0x000000000075b457 in AutoVacWorkerMain (argc=0, argv=0x0) at
autovacuum.c:1647
#16 0x000000000075b0a5 in StartAutoVacWorker () at autovacuum.c:1454
#17 0x000000000076f9cc in StartAutovacuumWorker () at postmaster.c:5261
#18 0x000000000076f28a in sigusr1_handler (postgres_signal_arg=10) at
postmaster.c:4918
#19 <signal handler called>
#20 0x00000039696e1353 in __select_nocancel () from /lib64/libc.so.6
#21 0x000000000076ace7 in ServerLoop () at postmaster.c:1582
#22 0x000000000076a538 in PostmasterMain (argc=3, argv=0x26f9330) at
postmaster.c:1263
#23 0x00000000006c1c2e in main (argc=3, argv=0x26f9330) at main.c:223
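
For reference, the condition in frame #2 requires that the page header's
pd_special offset is at least the size of the fixed header, that is,
offsetof(PageHeaderData, pd_linp). The sketch below restates that check on a
simplified stand-in for the header. ToyPageHeaderData, TOY_BLCKSZ and the
toy_* helpers are hypothetical names for illustration, not PostgreSQL's
definitions. It shows that an all-zero page fails such a check while an
initialized one passes; whether a zeroed page is what autovacuum hit here is
not established by the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192          /* assumed block size for the sketch */

/* Simplified stand-in for a page header; field names follow the assertion. */
typedef struct ToyPageHeaderData
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    uint16_t pd_flags;
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;         /* offset to start of special space */
    uint16_t pd_pagesize_version;
    uint32_t pd_prune_xid;
    uint32_t pd_linp[];          /* line pointer array starts here */
} ToyPageHeaderData;

#define TOY_HEADER_SIZE offsetof(ToyPageHeaderData, pd_linp)

/* Same shape as the failed condition: pd_special >= offsetof(..., pd_linp). */
static bool toy_special_pointer_ok(const unsigned char *page)
{
    ToyPageHeaderData hdr;

    /* Copy the fixed header out to avoid alignment and aliasing concerns. */
    memcpy(&hdr, page, TOY_HEADER_SIZE);
    return (size_t) hdr.pd_special >= TOY_HEADER_SIZE;
}

/* Initialize an empty page with 'special_size' bytes reserved at the end. */
static void toy_page_init(unsigned char *page, uint16_t special_size)
{
    ToyPageHeaderData hdr;

    memset(page, 0, TOY_BLCKSZ);
    memset(&hdr, 0, sizeof(hdr));
    hdr.pd_lower = (uint16_t) TOY_HEADER_SIZE;
    hdr.pd_upper = (uint16_t) (TOY_BLCKSZ - special_size);
    hdr.pd_special = hdr.pd_upper;
    memcpy(page, &hdr, TOY_HEADER_SIZE);
}
```

With a zero-filled buffer, pd_special is 0 and toy_special_pointer_ok()
returns false; after toy_page_init(page, 16) it returns true.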

Thanks,
Amit



