the s_lock_stuck on perform_spin_delay

From: Andy Fan
Subject: the s_lock_stuck on perform_spin_delay
Msg-id: 87plyhvama.fsf@163.com
Replies: Re: the s_lock_stuck on perform_spin_delay  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Re: the s_lock_stuck on perform_spin_delay  (Robert Haas <robertmhaas@gmail.com>)
Re: the s_lock_stuck on perform_spin_delay  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi,

from src/backend/storage/lmgr/README:

"""
Spinlocks.  These are intended for *very* short-term locks.  If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call (or even a call to a nontrivial subroutine), don't use a
spinlock. Spinlocks are primarily used as infrastructure for lightweight
locks.
"""

I totally agree with this, and IIUC a spinlock is usually used together
with the following functions:

#define init_local_spin_delay(status) ..
void perform_spin_delay(SpinDelayStatus *status);
void finish_spin_delay(SpinDelayStatus *status);
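
To make the usual calling pattern concrete, here is a minimal standalone
sketch. Everything prefixed demo_/DEMO_ is a hypothetical stand-in that I
made up so the example compiles outside the PostgreSQL tree; it is not the
real s_lock.h code, and the sleep is replaced by a counter.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for SpinDelayStatus, reduced to two counters. */
typedef struct
{
    int spins;   /* failed attempts since the last (simulated) sleep */
    int delays;  /* number of (simulated) sleeps so far */
} DemoSpinDelayStatus;

#define DEMO_SPINS_PER_DELAY 100   /* assumed value, for illustration only */

static void
demo_spin_init(DemoSpinDelayStatus *status)
{
    status->spins = 0;
    status->delays = 0;
}

/* Called after each failed attempt to take the lock. */
static void
demo_spin_perform(DemoSpinDelayStatus *status)
{
    if (++(status->spins) >= DEMO_SPINS_PER_DELAY)
    {
        ++(status->delays);   /* the real function sleeps at this point */
        status->spins = 0;
    }
}

/* Called once the lock has been obtained; nothing to do in this sketch. */
static void
demo_spin_finish(DemoSpinDelayStatus *status)
{
    (void) status;
}

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Acquire the demo lock and return how many delays were needed. */
static int
demo_spin_acquire(void)
{
    DemoSpinDelayStatus status;

    demo_spin_init(&status);
    while (atomic_flag_test_and_set(&demo_lock))
        demo_spin_perform(&status);
    demo_spin_finish(&status);
    return status.delays;
}

static void
demo_spin_release(void)
{
    atomic_flag_clear(&demo_lock);
}
```

The caller is expected to hold the lock only for a few instructions
between demo_spin_acquire() and demo_spin_release(); anything longer makes
other waiters go through the delay path discussed below.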

In perform_spin_delay, we have the following code:

void
perform_spin_delay(SpinDelayStatus *status)
{
    /* Block the process every spins_per_delay tries */
    if (++(status->spins) >= spins_per_delay)
    {
        if (++(status->delays) > NUM_DELAYS)
            s_lock_stuck(status->file, status->line, status->func);

s_lock_stuck will PANIC the entire system.

My question is: if someone breaks this rule by mistake (everyone can
make mistakes), should we PANIC in a production environment? IMO it
could be a WARNING in production and remain a PANIC only under
'#ifdef USE_ASSERT_CHECKING'.
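
A rough standalone sketch of that idea follows. It is not the attached
prototype: demo_report_stuck is a hypothetical name, fprintf/abort stand in
for elog(WARNING)/elog(PANIC), and the return-value convention is my own
assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical replacement for the stuck-spinlock report.
 * Returns 1 if the caller should keep waiting for the lock.
 */
static int
demo_report_stuck(const char *file, int line, const char *func)
{
#ifdef USE_ASSERT_CHECKING
    /* Assert-enabled builds: fail hard so the misuse is caught in testing. */
    fprintf(stderr, "PANIC: stuck spinlock detected at %s, %s:%d\n",
            func, file, line);
    abort();
#else
    /* Production builds: only warn, and let the backend keep waiting. */
    fprintf(stderr, "WARNING: spinlock held too long at %s, %s:%d\n",
            func, file, line);
    return 1;
#endif
}
```

In this sketch the caller would presumably also reset its delay counter
after a warning, so that the message repeats periodically instead of on
every subsequent delay.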

People may think a spinlock consumes too much CPU, but that is not true
in the case discussed here, since perform_spin_delay calls pg_usleep,
and MAX_DELAY_USEC is 1 second while MIN_DELAY_USEC is 0.001 second.
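
For a rough sense of scale, the sketch below bounds the total time spent
sleeping before s_lock_stuck is reached. The two delay limits are the ones
quoted above; DEMO_NUM_DELAYS = 1000 is my reading of NUM_DELAYS in
s_lock.c and is not stated in this mail, so treat it as an assumption.

```c
#define DEMO_NUM_DELAYS      1000        /* assumed value of NUM_DELAYS */
#define DEMO_MIN_DELAY_USEC  1000LL      /* 0.001 second */
#define DEMO_MAX_DELAY_USEC  1000000LL   /* 1 second */

/* Lower bound: every sleep uses the minimum delay. */
static long long
demo_min_total_sleep_usec(void)
{
    return DEMO_NUM_DELAYS * DEMO_MIN_DELAY_USEC;
}

/* Upper bound: every sleep uses the maximum delay. */
static long long
demo_max_total_sleep_usec(void)
{
    return DEMO_NUM_DELAYS * DEMO_MAX_DELAY_USEC;
}
```

Under these assumptions the backend sleeps for somewhere between 1 second
and 1000 seconds in total before the PANIC, with the real figure in
between because the delay varies between the two limits.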

I actually noticed this issue because of the patch "Cache relation
sizes?" from Thomas Munro [1], where the latest patch [2] still has the
following code:
+        sr = smgr_alloc_sr();  <-- HERE a spinlock is held
+
+        /* Upgrade to exclusive lock so we can create a mapping. */
+        LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex
  operation is needed; it may take a long time.

Our internal testing system found more such misuses in our own PG version.

If an experienced engineer like Thomas can make this mistake, and the
bug is still there after the patch was reviewed by 3 people, then it is
not enough to just say "don't do it".

The attached code shows the prototype I have in mind. Any feedback is welcome.

[1]
https://www.postgresql.org/message-id/CA%2BhUKGJg%2BgqCs0dgo94L%3D1J9pDp5hKkotji9A05k2nhYQhF4%2Bw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/attachment/123659/v5-0001-WIP-Track-relation-sizes-in-shared-memory.patch  

-- 
Best Regards
Andy Fan

