Re: the s_lock_stuck on perform_spin_delay

From	Andy Fan
Subject	Re: the s_lock_stuck on perform_spin_delay
Date	January 9, 2024 02:01:59
Msg-id	87sf37mdqq.fsf@163.com
In reply to	Re: the s_lock_stuck on perform_spin_delay (Robert Haas <robertmhaas@gmail.com>)
Replies	Re: the s_lock_stuck on perform_spin_delay
List	pgsql-hackers

Hi,
Robert Haas <robertmhaas@gmail.com> writes:

> On Sun, Jan 7, 2024 at 9:52 PM Andy Fan <zhihuifan1213@163.com> wrote:
>> > I think we should add cassert-only infrastructure tracking whether we
>> > currently hold spinlocks, are in a signal handler and perhaps a few other
>> > states. That'd allow us to add assertions like:
>> ..
>> > - no lwlocks or ... while in signal handlers
>>
>> I *wish* lwlocks should *not* be held while in signal handlers, since
>> that gave me a direction for a low-frequency internal bug where a
>> backend acquires an LWLock that it has already acquired. However, when I
>> read more documentation and code, I came to think this should not be a
>> problem.
>
> It's not safe to acquire an LWLock in a signal handler unless we know
> that the code that the signal handler is interrupting can't already be
> doing that. Consider this code from LWLockAcquire:

Thanks for the explanation! I can follow the situation you describe
here, and then I realized I asked a bad question because I didn't clarify
which "signal handlers" I was referring to. Sorry about that!

In your statement, I guess you are talking about Linux signal handlers.
However, I *assumed* such handlers do something as simple as setting
'GlobalVariable = true'. If my assumption is right, I think that case
does not need special care. For example:

spin_or_lwlock_acquire();
... (a Linux signal handler may be invoked here, no matter what ... is)
spin_or_lwlock_release();

Since Linux signal handlers are pretty simple, control always comes back
to 'spin_or_lwlock_release()' anyway. (However, my assumption may be
wrong; thanks for highlighting this, it is helpful for debugging my
internal bug!)
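To make the assumption above concrete, here is a minimal POSIX sketch (not PostgreSQL code; `interrupt_pending`, `locks_held` and `critical_section_with_signal()` are hypothetical names). The handler only sets a `volatile sig_atomic_t` flag, so once it returns, execution resumes exactly where it was interrupted and the release is still reached. It relies on the C standard guarantee that `raise()` does not return until the handler has returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag, similar in spirit to PostgreSQL's InterruptPending. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Hypothetical lock state: a counter standing in for a spinlock or LWLock. */
static int locks_held = 0;

/* Async-signal-safe handler: it only records that the signal arrived. */
static void handle_sigusr1(int signo)
{
    (void) signo;
    interrupt_pending = 1;
}

/* Returns true if the release after the "critical section" was reached. */
static bool critical_section_with_signal(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigusr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return false;

    locks_held++;      /* spin_or_lwlock_acquire() */
    raise(SIGUSR1);    /* the handler runs "in the middle" and returns */
    locks_held--;      /* spin_or_lwlock_release() is still reached */

    assert(interrupt_pending == 1);
    return locks_held == 0;
}
```

This only holds for handlers that do nothing but set flags; a handler that itself tried to take the lock, as Robert describes, could find it already held by the very code it interrupted.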

The "signal handler" I was referring to is CHECK_FOR_INTERRUPTS(), the
point where interrupts flagged by the real signal handlers are actually
processed. From there, spinlocks and LWLocks behave quite differently.

spin_lock_acquire();
CHECK_FOR_INTERRUPTS();
spin_lock_release();

Since CHECK_FOR_INTERRUPTS() usually ends up in the ERROR machinery,
which jumps away (via longjmp) to the error handler, control never gets
back to 'spin_lock_release()', and the spinlock leaks! So
CHECK_FOR_INTERRUPTS() is where my patch asserts that no *spinlock* is
held, and I understood that Andres was talking about the same thing. (Of
course, I could also add the "Assert no spinlock is held" check to every
Linux signal handler.)
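The leak can be sketched in plain C with `setjmp()`/`longjmp()` standing in for `ereport(ERROR)`, which in PostgreSQL jumps via `siglongjmp()` to the handler registered in `PG_exception_stack`. All names below are hypothetical stand-ins, not PostgreSQL functions. The point is only that neither release after the jump runs, and that error cleanup can reset the LWLocks (PostgreSQL does this with `LWLockReleaseAll()`) because they are tracked, while nothing resets the untracked spinlock.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf error_context;          /* stand-in for PG_exception_stack */
static bool spinlock_held = false;     /* hypothetical spinlock state */
static int  lwlocks_held = 0;          /* hypothetical count of held LWLocks */
static bool interrupt_pending = true;  /* pretend a cancel request arrived */

/* Stand-in for ereport(ERROR): jump back to the error handler. */
static void raise_error(void)
{
    longjmp(error_context, 1);
}

/* Stand-in for CHECK_FOR_INTERRUPTS(): may not return normally. */
static void check_for_interrupts(void)
{
    if (interrupt_pending)
    {
        interrupt_pending = false;
        raise_error();
    }
}

/* Stand-in for error recovery: LWLocks are tracked, so they can be
 * released here; the spinlock is not tracked, so nothing releases it. */
static void error_cleanup(void)
{
    lwlocks_held = 0;   /* analogous to LWLockReleaseAll() */
}

/* Runs one "query"; returns false if it ended in the error path. */
static bool run_query(void)
{
    if (setjmp(error_context) != 0)
    {
        assert(!interrupt_pending);   /* the interrupt was consumed */
        error_cleanup();
        return false;
    }

    lwlocks_held++;           /* LWLockAcquire() */
    spinlock_held = true;     /* SpinLockAcquire() */
    check_for_interrupts();   /* jumps away: neither release below runs */
    spinlock_held = false;    /* SpinLockRelease(), never reached */
    lwlocks_held--;           /* LWLockRelease(), never reached */
    return true;
}
```

After `run_query()` returns through the error path, `lwlocks_held` is back to 0 but `spinlock_held` is still true, which is the leak described above.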

Based on the above, the question in my previous post was whether we
should do the same ('Assert no LWLock is held') for *LWLocks* in
CHECK_FOR_INTERRUPTS(), since LWLocks are released during error recovery
(e.g. by LWLockReleaseAll()) no matter where CHECK_FOR_INTERRUPTS()
jumps to.
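A minimal sketch of the assertion discussed here, assuming cassert-only bookkeeping in the spirit of Andres's proposal: a counter of held spinlocks, checked at the top of a `CHECK_FOR_INTERRUPTS()` stand-in. `slock_sketch`, `spin_lock_acquire()`, `spin_lock_release()`, `holding_spinlock()` and `check_for_interrupts()` are hypothetical (PostgreSQL's real spinlock type is `slock_t`); in a real build the counter would presumably be compiled only under `USE_ASSERT_CHECKING`. Whether the same check should cover LWLocks is exactly the open question above, so the sketch covers spinlocks only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical spinlock type; no real atomics, single-threaded sketch. */
typedef struct
{
    bool locked;
} slock_sketch;

/* Hypothetical cassert-only bookkeeping: number of spinlocks held. */
static int spinlocks_held = 0;

static void spin_lock_acquire(slock_sketch *lock)
{
    assert(!lock->locked);   /* re-acquiring would self-deadlock */
    lock->locked = true;
    spinlocks_held++;
}

static void spin_lock_release(slock_sketch *lock)
{
    assert(lock->locked);
    lock->locked = false;
    spinlocks_held--;
}

static bool holding_spinlock(void)
{
    return spinlocks_held > 0;
}

/* Stand-in for CHECK_FOR_INTERRUPTS(): anything below may raise an error
 * and jump away, so no spinlock may be held when we get here. */
static void check_for_interrupts(void)
{
    assert(!holding_spinlock());
    /* ... process pending interrupts here ... */
}
```

Calling `check_for_interrupts()` between `spin_lock_acquire()` and `spin_lock_release()` trips the assertion, which catches the leaking pattern above in assert-enabled builds instead of leaving a stuck spinlock.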

--
Best Regards
Andy Fan
