On 2020-06-05 17:19:26 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-06-04 19:33:02 -0700, Andres Freund wrote:
> > But it looks like that code is currently buggy (and looks like it always
> > has been), because we don't look at the nested argument when
> > initializing the semaphore. So we currently allocate too many
> > semaphores, without benefiting from them :(.
>
> I wrote a patch for this, and when I got around to to testing it, I
> found that our tests currently don't pass when using both
> --disable-spinlocks and --disable-atomics. Turns out to not be related
> to the issue above, but the global barrier support added in 13.
>
> That *reads* two 64 bit atomics in a signal handler. Which is normally
> fine, but not at all cool when atomics (or just 64 bit atomics) are
> backed by spinlocks. Because we can "self interrupt" while already
> holding the spinlock.
>
> It looks to me that that's a danger whenever 64bit atomics are backed by
> spinlocks, not just when both --disable-spinlocks and --disable-atomics
> are used. But I suspect that it's really hard to hit the tiny window of
> danger when those options aren't used. While we have buildfarm animals
> testing each of those separately, we don't have one that tests both
> together...
>
> I'm not really sure what to do about that issue. The easisest thing
> would probably be to change the barrier generation to 32bit (which
> doesn't have to use locks for reads in any situation). I tested doing
> that, and it fixes the hangs for me.
>
>
> Randomly noticed while looking at the code:
> uint64 flagbit = UINT64CONST(1) << (uint64) type;
>
> that shouldn't be 64bit, right?
Attached is a series of patches addressing these issues, of varying
quality:
1) This fixes the above mentioned issue in the global barrier code by
using 32bit atomics. That might be fine, or it might not. I just
included it here because otherwise the tests cannot be run fully.
2) Fixes spinlock emulation when more than INT_MAX spinlocks are
initialized in the lifetime of a single backend
3) Add spinlock tests to normal regression tests.
- Currently as part of test_atomic_ops. Probably not worth having a
separate SQL function?
- Currently contains a test for 1) that's run when the spinlock
emulation is used. Probably too slow to actually indclude? Takes 15s
on my computer... OTOH, it's just with --disable-spinlocks...
- Could probably remove the current spinlock tests after this. The
only thing they additionally test is a stuck spinlock. Since
they're not run currently, they don't seem worth much?
4) Fix the potential for deadlocks when using atomics while holding a
spinlock, add tests for that.
Any comments?
Greetings,
Andres Freund