Re: the s_lock_stuck on perform_spin_delay

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: the s_lock_stuck on perform_spin_delay
Дата
Msg-id 20240104230628.j2wqqck3wrgol6ym@awork3.anarazel.de
обсуждение исходный текст
Ответ на the s_lock_stuck on perform_spin_delay  (Andy Fan <zhihuifan1213@163.com>)
Ответы Re: the s_lock_stuck on perform_spin_delay  (Andy Fan <zhihuifan1213@163.com>)
Re: the s_lock_stuck on perform_spin_delay  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2024-01-04 14:59:06 +0800, Andy Fan wrote:
> My question is if someone doesn't obey the rule by mistake (everyone
> can make mistake), shall we PANIC on a production environment? IMO I
> think it can be a WARNING on a production environment and be a stuck
> when 'ifdef USE_ASSERT_CHECKING'.
>
> [...]
>
> I notice this issue actually because of the patch "Cache relation
> sizes?" from Thomas Munro [1], where the latest patch[2] still have the
> following code.
> +        sr = smgr_alloc_sr();  <-- HERE a spin lock is hold
> +
> +        /* Upgrade to exclusive lock so we can create a mapping. */
> +        LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex
>   operation is needed. it may take a long time.
>
> Our internal testing system found more misuses on our own PG version.

> I think a experienced engineer like Thomas can make this mistake and the
> patch was reviewed by 3 peoples, the bug is still there. It is not easy
> to say just don't do it.

I don't follow this argument - just ignoring the problem, which emitting a
WARNING basically is, doesn't reduce the impact of the bug, it *increases* the
impact, because now the system will not recover from the bug without explicit
operator intervention.  During that time the system might be essentially
unresponsive, because all backends end up contending for some spinlock, which
makes investigating such issues very hard.


I think we should add infrastructure to detect bugs like this during
development, but not PANICing when this happens in production seems completely
non-viable.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Melanie Plageman
Дата:
Сообщение: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Adding facility for injection points (or probe points?) for more advanced tests