Re: the s_lock_stuck on perform_spin_delay

Поиск
Список
Период
Сортировка
От Matthias van de Meent
Тема Re: the s_lock_stuck on perform_spin_delay
Дата
Msg-id CAEze2WggP-2Dhocmdhp-LxBzic=MXRgGA_tmv1G_9n-PDt2MQg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: the s_lock_stuck on perform_spin_delay  (Andy Fan <zhihuifan1213@163.com>)
Ответы Re: the s_lock_stuck on perform_spin_delay  (Robert Haas <robertmhaas@gmail.com>)
Re: the s_lock_stuck on perform_spin_delay  (Andy Fan <zhihuifan1213@163.com>)
Список pgsql-hackers
On Wed, 10 Jan 2024 at 02:44, Andy Fan <zhihuifan1213@163.com> wrote:
> Hi,
>
> I want to know if Andres or you have plan
> to do some code review. I don't expect this would happen very soon, just
> want to make sure this will not happen that both of you think the other
> one will do, but actually none of them does it in fact. a commit fest
> [1] has been added for this.


> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -5419,6 +5419,7 @@ LockBufHdr(BufferDesc *desc)
>         perform_spin_delay(&delayStatus);
>     }
>     finish_spin_delay(&delayStatus);
> +    START_SPIN_LOCK();
>     return old_buf_state | BM_LOCKED;
> }

I think that we need to 'arm' the checks just before we lock the spin
lock, and 'disarm' the checks just after we unlock the spin lock,
rather than after and before, respectively. That way, we won't have a
chance of false negatives: with your current patch it is possible that
an interrupt fires between the acquisition of the lock and the code in
START_SPIN_LOCK() marking the thread as holding a spin lock, which
would cause any check in that signal handler to incorrectly read that
we don't hold any spin locks.

> +++ b/src/backend/storage/lmgr/lock.c
> @@ -776,6 +776,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
>     bool        found_conflict;
>     bool        log_lock = false;
>
> +    Assert(SpinLockCount == 0);
> +

I'm not 100% sure on the policy of this, but theoretically you could
use LockAquireExtended(dontWait=true) while holding a spin lock, as
that would not have an unknown duration. Then again, this function
also does elog/ereport, which would cause issues, still, so this code
may be the better option.

> +    elog(PANIC, "stuck spinlock detected at %s, %s:%d after waiting for %u ms",
> +         func, file, line, delay_ms);

pg_usleep doesn't actually guarantee that we'll wait for exactly that
duration; depending on signals received while spinning and/or OS
scheduling decisions it may be off by orders of magnitude.

> +++ b/src/common/scram-common.c

This is unrelated to the main patchset.

> +++ b/src/include/storage/spin.h

Minor: I think these changes could better be included in miscadmin, or
at least the definition for SpinLockCount should be moved there: The
spin lock system itself shouldn't be needed in places where we need to
make sure that we don't hold any spinlocks, and miscadmin.h already
holds things related to "System interrupt and critical section
handling", which seems quite related.

Kind regards,

Matthias van de Meent



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nazir Bilal Yavuz
Дата:
Сообщение: Re: Show WAL write and fsync stats in pg_stat_io
Следующее
От: vignesh C
Дата:
Сообщение: Re: A failure in t/038_save_logical_slots_shutdown.pl