Re: Atomic operations within spinlocks

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Atomic operations within spinlocks
Дата
Msg-id 20200609060847.vd2c67ih4bz6vyww@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Atomic operations within spinlocks  (Andres Freund <andres@anarazel.de>)
Ответы global barrier & atomics in signal handlers (Re: Atomic operationswithin spinlocks)  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On 2020-06-05 17:19:26 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-06-04 19:33:02 -0700, Andres Freund wrote:
> > But it looks like that code is currently buggy (and looks like it always
> > has been), because we don't look at the nested argument when
> > initializing the semaphore.  So we currently allocate too many
> > semaphores, without benefiting from them :(.
>
> I wrote a patch for this, and when I got around to to testing it, I
> found that our tests currently don't pass when using both
> --disable-spinlocks and --disable-atomics. Turns out to not be related
> to the issue above, but the global barrier support added in 13.
>
> That *reads* two 64 bit atomics in a signal handler. Which is normally
> fine, but not at all cool when atomics (or just 64 bit atomics) are
> backed by spinlocks. Because we can "self interrupt" while already
> holding the spinlock.
>
> It looks to me that that's a danger whenever 64bit atomics are backed by
> spinlocks, not just when both --disable-spinlocks and --disable-atomics
> are used. But I suspect that it's really hard to hit the tiny window of
> danger when those options aren't used. While we have buildfarm animals
> testing each of those separately, we don't have one that tests both
> together...
>
> I'm not really sure what to do about that issue. The easisest thing
> would probably be to change the barrier generation to 32bit (which
> doesn't have to use locks for reads in any situation).  I tested doing
> that, and it fixes the hangs for me.
>
>
> Randomly noticed while looking at the code:
>     uint64        flagbit = UINT64CONST(1) << (uint64) type;
>
> that shouldn't be 64bit, right?

Attached is a series of patches addressing these issues, of varying
quality:

1) This fixes the above mentioned issue in the global barrier code by
   using 32bit atomics. That might be fine, or it might not. I just
   included it here because otherwise the tests cannot be run fully.


2) Fixes spinlock emulation when more than INT_MAX spinlocks are
   initialized in the lifetime of a single backend

3) Add spinlock tests to normal regression tests.
   - Currently as part of test_atomic_ops. Probably not worth having a
     separate SQL function?
   - Currently contains a test for 1) that's run when the spinlock
     emulation is used. Probably too slow to actually indclude? Takes 15s
     on my computer... OTOH, it's just with --disable-spinlocks...
   - Could probably remove the current spinlock tests after this. The
     only thing they additionally test is a stuck spinlock. Since
     they're not run currently, they don't seem worth much?

4) Fix the potential for deadlocks when using atomics while holding a
   spinlock, add tests for that.

Any comments?

Greetings,

Andres Freund

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Add -Wold-style-definition to CFLAGS?
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Read access for pg_monitor to pg_replication_origin_status view