Re: LWLock deadlock and gdb advice

From: Jeff Janes
Subject: Re: LWLock deadlock and gdb advice
Msg-id: CAMkU=1zUc=h0oCZntaJaqqW7gxxVxCWsYq8DD2t7oHgsgVEsgA@mail.gmail.com
In response to: Re: LWLock deadlock and gdb advice  (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
On Thu, Jul 16, 2015 at 12:03 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Jul 15, 2015 at 8:44 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Both. Here's the patch.

Previously, LWLockAcquireWithVar set the variable associated with the lock atomically with acquiring it. Before the lwlock-scalability changes, that was straightforward because you held the spinlock anyway, but it's a lot harder and more expensive now. So I changed the way acquiring a lock with a variable works. There is now a separate flag, LW_FLAG_VAR_SET, which indicates that the current lock holder has updated the variable. The LWLockAcquireWithVar function is gone: you now just use LWLockAcquire(), which always clears the LW_FLAG_VAR_SET flag, and you can call LWLockUpdateVar() after that if you want to set the variable immediately. LWLockWaitForVar() always waits if the flag is not set, i.e. it will not return, regardless of the variable's value, until the current lock holder has updated it.
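
To make the flag protocol described above concrete, here is a rough single-threaded model in C. It is only a sketch: the ToyLock struct and the toy_* functions are hypothetical names invented for illustration, it does no atomics or sleeping, and it is not the code from the patch or from lwlock.c. It only captures the rule that acquiring clears the flag and that a waiter cannot trust the variable until the holder has set it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy, single-threaded model; NOT PostgreSQL's LWLock implementation. */
typedef struct ToyLock
{
    bool     held;     /* is the lock currently held? */
    bool     var_set;  /* stands in for LW_FLAG_VAR_SET */
    uint64_t var;      /* the variable associated with the lock */
} ToyLock;

/* Acquiring always clears the flag, whatever value the variable holds. */
static void toy_acquire(ToyLock *l)
{
    l->held = true;
    l->var_set = false;
}

/* Only the holder publishes a value; doing so sets the flag. */
static void toy_update_var(ToyLock *l, uint64_t v)
{
    assert(l->held);
    l->var = v;
    l->var_set = true;
}

static void toy_release(ToyLock *l)
{
    l->held = false;
    l->var_set = false;
}

/* Would a waiter that last saw `oldval` have to keep waiting? */
static bool toy_must_wait(const ToyLock *l, uint64_t oldval)
{
    if (!l->held)
        return false;          /* lock is free: nothing to wait for */
    if (!l->var_set)
        return true;           /* holder has not published a value yet */
    return l->var == oldval;   /* wait only while the value is unchanged */
}
```

In this model, a waiter that calls toy_must_wait() right after toy_acquire() is told to wait even if a stale value left in var differs from oldval, which is the behaviour the last sentence of the description asks for.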


I ran this for a while without casserts and it seems to work.  But with casserts, I get failures in the autovac process on the GIN index.

I don't see how this is related to the LWLock issue, but I didn't see it without your patch.  Perhaps the system just didn't survive long enough to uncover it without the patch (although it shows up pretty quickly).  It could just be an overzealous Assert, since the build with casserts off didn't show problems.

bt and bt full are shown below.

Cheers, 

Jeff

#0  0x0000003dcb632625 in raise () from /lib64/libc.so.6
#1  0x0000003dcb633e05 in abort () from /lib64/libc.so.6
#2  0x0000000000930b7a in ExceptionalCondition (
    conditionName=0x9a1440 "!(((PageHeader) (page))->pd_special >= (__builtin_offsetof (PageHeaderData, pd_linp)))", errorType=0x9a12bc "FailedAssertion",
    fileName=0x9a12b0 "ginvacuum.c", lineNumber=713) at assert.c:54
#3  0x00000000004947cf in ginvacuumcleanup (fcinfo=0x7fffee073a90) at ginvacuum.c:713

It now looks like this *is* unrelated to the LWLock issue.  The assert that it is tripping over was added just recently (302ac7f27197855afa8c), so I had not been testing with it in place until now.  It looks like it is finding all-zero pages (index extended, but then a crash before the page was initialized?) and it doesn't like them.

(gdb) f 3
(gdb) p *(char[8192]*)(page)
$11 = '\000' <repeats 8191 times>

Presumably before this assert, such pages would just be permanently orphaned.
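
As a rough illustration of why an all-zero page trips a check of this shape, here is a small C sketch. ToyPageHeader is a simplified, hypothetical stand-in and not PostgreSQL's PageHeaderData, and the toy_* helpers are invented for this example. The 8192-byte page size is taken from the gdb output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192   /* page size, as in the gdb output above */

/* Simplified stand-in for a page header; NOT PostgreSQL's PageHeaderData. */
typedef struct ToyPageHeader
{
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;  /* offset of the special area within the page */
    uint16_t pd_linp[];   /* line pointers start right after the header */
} ToyPageHeader;

/* True if every byte of the page is zero. */
static bool toy_page_is_all_zero(const unsigned char *page)
{
    assert(page != NULL);
    for (size_t i = 0; i < TOY_BLCKSZ; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

/* The asserted condition, restated for the toy header layout. */
static bool toy_special_offset_is_sane(const unsigned char *page)
{
    uint16_t special;

    assert(page != NULL);
    /* memcpy avoids alignment and aliasing problems on a raw byte buffer */
    memcpy(&special, page + offsetof(ToyPageHeader, pd_special),
           sizeof special);
    return special >= offsetof(ToyPageHeader, pd_linp);
}
```

On a zero-filled page the special offset reads as 0, which is smaller than the offset of the line-pointer array, so the second check fails. A vacuum pass that wanted to tolerate such pages would presumably need to test for the all-zero case before looking at the header, but that is a guess at a fix and not something this thread settles.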

Cheers,

Jeff
