Thread: elog(DEBUG2 in SpinLocked section.


elog(DEBUG2 in SpinLocked section.

From
Kyotaro Horiguchi
Date:
Hello.

I noticed that UpdateSpillStats calls "elog(DEBUG2" within a
SpinLockAcquire section on MyWalSnd.  The lock doesn't protect rb, and
in the first place rb cannot be modified while the function is
running.

It should be out of the lock section.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 8098f066e5884128df04ecb94bcbf960d55b0c93 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga.ntt@gmail.com>
Date: Tue, 2 Jun 2020 16:10:14 +0900
Subject: [PATCH] Move an elog out of spin-lock section

The variable to be shown by the elog is not protected by the spin lock
and the elog call unnecessarily prolongs the lock section. Move it out
of the lock section.
---
 src/backend/replication/walsender.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 86847cbb54..2364cbfc61 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -3685,17 +3685,15 @@ UpdateSpillStats(LogicalDecodingContext *ctx)
 {
     ReorderBuffer *rb = ctx->reorder;
 
-    SpinLockAcquire(&MyWalSnd->mutex);
-
-    MyWalSnd->spillTxns = rb->spillTxns;
-    MyWalSnd->spillCount = rb->spillCount;
-    MyWalSnd->spillBytes = rb->spillBytes;
-
     elog(DEBUG2, "UpdateSpillStats: updating stats %p %lld %lld %lld",
          rb,
          (long long) rb->spillTxns,
          (long long) rb->spillCount,
          (long long) rb->spillBytes);
 
+    SpinLockAcquire(&MyWalSnd->mutex);
+    MyWalSnd->spillTxns = rb->spillTxns;
+    MyWalSnd->spillCount = rb->spillCount;
+    MyWalSnd->spillBytes = rb->spillBytes;
     SpinLockRelease(&MyWalSnd->mutex);
 }
-- 
2.18.2


Re: elog(DEBUG2 in SpinLocked section.

From
Fujii Masao
Date:

On 2020/06/02 16:15, Kyotaro Horiguchi wrote:
> Hello.
> 
> I noticed that UpdateSpillStats calls "elog(DEBUG2" within a
> SpinLockAcquire section on MyWalSnd.  The lock doesn't protect rb, and
> in the first place rb cannot be modified while the function is
> running.
> 
> It should be out of the lock section.

Thanks for the patch! It looks good to me.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: elog(DEBUG2 in SpinLocked section.

From
Amit Kapila
Date:
On Tue, Jun 2, 2020 at 2:05 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2020/06/02 16:15, Kyotaro Horiguchi wrote:
> > Hello.
> >
> > I noticed that UpdateSpillStats calls "elog(DEBUG2" within a
> > SpinLockAcquire section on MyWalSnd.  The lock doesn't protect rb, and
> > in the first place rb cannot be modified while the function is
> > running.
> >
> > It should be out of the lock section.

Right.

>
> Thanks for the patch! It looks good to me.
>

The patch looks good to me as well.  I will push this unless Fujii-San
wants to do it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: elog(DEBUG2 in SpinLocked section.

From
Fujii Masao
Date:

On 2020/06/02 17:42, Amit Kapila wrote:
> On Tue, Jun 2, 2020 at 2:05 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> On 2020/06/02 16:15, Kyotaro Horiguchi wrote:
>>> Hello.
>>>
>>> I noticed that UpdateSpillStats calls "elog(DEBUG2" within a
>>> SpinLockAcquire section on MyWalSnd.  The lock doesn't protect rb, and
>>> in the first place rb cannot be modified while the function is
>>> running.
>>>
>>> It should be out of the lock section.
> 
> Right.
> 
>>
>> Thanks for the patch! It looks good to me.
>>
> 
> The patch looks good to me as well.  I will push this unless Fujii-San
> wants to do it.

Thanks! I pushed the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: elog(DEBUG2 in SpinLocked section.

From
Kyotaro Horiguchi
Date:
At Tue, 2 Jun 2020 19:24:16 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> Thanks! I pushed the patch.

Thanks to all!

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: elog(DEBUG2 in SpinLocked section.

From
Michael Paquier
Date:
On Wed, Jun 03, 2020 at 09:18:19AM +0900, Kyotaro Horiguchi wrote:
> Thanks to all!

Indeed, this was incorrect.  And you may not have noticed, but we have
a second instance of that in LogicalIncreaseRestartDecodingForSlot()
that goes down to 9.4 and b89e151.  I used a dirty-still-efficient
hack to detect that, and that's the only instance I have spotted.

I am not sure if that's worth worrying about a back-patch, but we should
really address that at least on HEAD.  Attached is an extra patch to
close the loop.
--
Michael

Attachments

Re: elog(DEBUG2 in SpinLocked section.

From
Amit Kapila
Date:
On Wed, Jun 3, 2020 at 8:35 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Jun 03, 2020 at 09:18:19AM +0900, Kyotaro Horiguchi wrote:
> > Thanks to all!
>
> Indeed, this was incorrect.
>

Do you mean to say correct?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Indeed, this was incorrect.  And you may not have noticed, but we have
> a second instance of that in LogicalIncreaseRestartDecodingForSlot()
> that goes down to 9.4 and b89e151.  I used a dirty-still-efficient
> hack to detect that, and that's the only instance I have spotted.

Ugh, that is just horrid.  I experimented with the attached patch
but it did not find any other problems.  Still, that only proves
something about code paths that are taken during check-world, and
we know that our test coverage is not very good :-(.

Should we think about adding automated detection of this type of
mistake?  I don't like the attached as-is because of the #include
footprint expansion, but maybe we can find a better way.

> I am not sure if that's worth worrying about a back-patch, but we should
> really address that at least on HEAD.

It's actually worse in the back branches, because elog() did not have
a good short-circuit path like ereport() does.  +1 for back-patch.
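To illustrate that short-circuit, here is a minimal compilable sketch (the macro and names are illustrative, not the real elog.h definitions): the level test happens before any message arguments are evaluated, so a disabled DEBUG message costs only a comparison.

```c
/* Sketch of ereport()-style short-circuiting.  A disabled message level
 * skips argument evaluation entirely, which is what keeps an accidental
 * debug message inside a spinlock cheap on branches that have this test.
 * Names here are illustrative, not PostgreSQL's real elog.h. */
#include <assert.h>
#include <stdio.h>

static int min_message_level = 19;  /* e.g. everything below WARNING is off */
static int format_evaluations;      /* counts how often args were evaluated */

static const char *expensive_detail(void)
{
    format_evaluations++;           /* stands in for formatting work */
    return "spill stats";
}

#define EREPORT_SKETCH(elevel, detail_expr) \
    do { \
        if ((elevel) >= min_message_level)  /* args not evaluated when off */ \
            printf("LOG: %s\n", (detail_expr)); \
    } while (0)
```

With this shape, `EREPORT_SKETCH(10, expensive_detail())` never calls `expensive_detail()` at all; an old-style variadic `elog()` call would evaluate its arguments unconditionally.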

            regards, tom lane

diff --git a/src/backend/storage/lmgr/spin.c b/src/backend/storage/lmgr/spin.c
index 4d2a4c6641..f95f83d039 100644
--- a/src/backend/storage/lmgr/spin.c
+++ b/src/backend/storage/lmgr/spin.c
@@ -27,6 +27,8 @@
 #include "storage/spin.h"
 
 
+volatile slock_t *held_spinlock = NULL;
+
 #ifndef HAVE_SPINLOCKS
 PGSemaphore *SpinlockSemaArray;
 #endif
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 14fa127ab1..247d9a8089 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -28,6 +28,10 @@
 #include "datatype/timestamp.h" /* for TimestampTz */
 #include "pgtime.h"                /* for pg_time_t */
 
+#ifndef FRONTEND
+#include "storage/spin.h"
+#endif
+
 
 #define InvalidPid                (-1)
 
@@ -98,6 +102,7 @@ extern void ProcessInterrupts(void);
 
 #define CHECK_FOR_INTERRUPTS() \
 do { \
+    NotHoldingSpinLock(); \
     if (InterruptPending) \
         ProcessInterrupts(); \
 } while(0)
@@ -105,6 +110,7 @@ do { \
 
 #define CHECK_FOR_INTERRUPTS() \
 do { \
+    NotHoldingSpinLock(); \
     if (UNBLOCKED_SIGNAL_QUEUE()) \
         pgwin32_dispatch_queued_signals(); \
     if (InterruptPending) \
@@ -113,7 +119,11 @@ do { \
 #endif                            /* WIN32 */
 
 
-#define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)
+#define HOLD_INTERRUPTS() \
+do { \
+    NotHoldingSpinLock(); \
+    InterruptHoldoffCount++; \
+} while(0)
 
 #define RESUME_INTERRUPTS() \
 do { \
diff --git a/src/include/storage/spin.h b/src/include/storage/spin.h
index 5ad25d0f6f..6d806620a1 100644
--- a/src/include/storage/spin.h
+++ b/src/include/storage/spin.h
@@ -56,12 +56,26 @@
 #include "storage/pg_sema.h"
 #endif
 
+extern volatile slock_t *held_spinlock;
+
+#define NotHoldingSpinLock() \
+    Assert(held_spinlock == NULL)
 
 #define SpinLockInit(lock)    S_INIT_LOCK(lock)
 
-#define SpinLockAcquire(lock) S_LOCK(lock)
+#define SpinLockAcquire(lock) \
+    do { \
+        Assert(held_spinlock == NULL); \
+        S_LOCK(lock); \
+        held_spinlock = (lock); \
+    } while (0)
 
-#define SpinLockRelease(lock) S_UNLOCK(lock)
+#define SpinLockRelease(lock) \
+    do { \
+        Assert(held_spinlock == (lock)); \
+        S_UNLOCK(lock); \
+        held_spinlock = NULL; \
+    } while (0)
 
 #define SpinLockFree(lock)    S_LOCK_FREE(lock)
 
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 1e09ee0541..902996fed4 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -16,6 +16,10 @@
 
 #include <setjmp.h>
 
+#ifndef FRONTEND
+#include "storage/spin.h"
+#endif
+
 /* Error level codes */
 #define DEBUG5        10            /* Debugging messages, in categories of
                                  * decreasing detail. */
@@ -124,6 +128,7 @@
 #define ereport_domain(elevel, domain, ...)    \
     do { \
         pg_prevent_errno_in_scope(); \
+        NotHoldingSpinLock(); \
         if (errstart(elevel, domain)) \
             __VA_ARGS__, errfinish(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
         if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
@@ -134,6 +139,7 @@
     do { \
         const int elevel_ = (elevel); \
         pg_prevent_errno_in_scope(); \
+        NotHoldingSpinLock(); \
         if (errstart(elevel_, domain)) \
             __VA_ARGS__, errfinish(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
         if (elevel_ >= ERROR) \

Re: elog(DEBUG2 in SpinLocked section.

From
Michael Paquier
Date:
On Wed, Jun 03, 2020 at 08:52:08AM +0530, Amit Kapila wrote:
> Do you mean to say correct?

Nope, I really meant that the code before caa3c42 is incorrect, and I
am glad that it got fixed.  Sorry if that sounded confusing.
--
Michael

Attachments

Re: elog(DEBUG2 in SpinLocked section.

From
Michael Paquier
Date:
On Wed, Jun 03, 2020 at 12:36:34AM -0400, Tom Lane wrote:
> Ugh, that is just horrid.  I experimented with the attached patch
> but it did not find any other problems.

Oh.  I can see the same "ifndef FRONTEND" logic all around the place
as I did on my local branch :)

> Still, that only proves something about code paths that are taken
> during check-world, and we know that our test coverage is not very
> good :-(.

Yeah.  Not perfect, but still, we are getting better at it over the years.
I am fine with taking care of a backpatch, but I'll first wait a bit to
see if others have any comments.
--
Michael

Attachments

Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
I wrote:
> Ugh, that is just horrid.  I experimented with the attached patch
> but it did not find any other problems.

It occurred to me to add NotHoldingSpinLock() into palloc and
friends, and look what I found in copy_replication_slot:

            SpinLockAcquire(&s->mutex);
            src_islogical = SlotIsLogical(s);
            src_restart_lsn = s->data.restart_lsn;
            temporary = s->data.persistency == RS_TEMPORARY;
            plugin = logical_slot ? pstrdup(NameStr(s->data.plugin)) : NULL;
            SpinLockRelease(&s->mutex);

That is not gonna do, of course.  And there is another pstrdup
inside another spinlock section a bit further down in the same
function.  Also, pg_get_replication_slots has a couple of
namecpy() calls inside a spinlock, which is maybe less dangerous
than palloc() but it's still willful disregard of the project coding
rule about "only straight-line code inside a spinlock".

I'm inclined to think that memcpy'ing the ReplicationSlot struct
into a local variable might be the best way, replacing all the
piecemeal copying these stanzas are doing right now.  memcpy() of
a fixed amount of data isn't quite straight-line code perhaps,
but it has a well-defined runtime and zero chance of throwing an
error, which are the two properties we should be most urgently
concerned about.
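A minimal sketch of that pattern (simplified stand-in types, not the real ReplicationSlot or slock_t): one fixed-size memcpy inside the lock, all allocation on the local copy afterwards.

```c
/* Sketch of copying the whole shared struct under the spinlock, then
 * doing any pstrdup/palloc work on the local copy outside the lock.
 * SlotSketch and sketch_lock are simplified stand-ins. */
#include <assert.h>
#include <string.h>

typedef struct { int locked; } sketch_lock;

typedef struct
{
    sketch_lock mutex;
    int         persistency;
    long        restart_lsn;
    char        plugin[64];
} SlotSketch;

static void sketch_acquire(sketch_lock *m) { m->locked = 1; }
static void sketch_release(sketch_lock *m) { m->locked = 0; }

/* Only straight-line code inside the lock: a fixed-size memcpy has a
 * well-defined runtime and cannot throw an error. */
static SlotSketch snapshot_slot(SlotSketch *shared)
{
    SlotSketch copy;

    sketch_acquire(&shared->mutex);
    memcpy(&copy, shared, sizeof(copy));
    sketch_release(&shared->mutex);

    /* pstrdup(copy.plugin) and friends would go here, outside the lock */
    return copy;
}
```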

            regards, tom lane



Re: elog(DEBUG2 in SpinLocked section.

From
Michael Paquier
Date:
On Wed, Jun 03, 2020 at 01:27:51AM -0400, Tom Lane wrote:
> I'm inclined to think that memcpy'ing the ReplicationSlot struct
> into a local variable might be the best way, replacing all the
> piecemeal copying these stanzas are doing right now.  memcpy() of
> a fixed amount of data isn't quite straight-line code perhaps,
> but it has a well-defined runtime and zero chance of throwing an
> error, which are the two properties we should be most urgently
> concerned about.

+1.  And I guess that you are already on that?  ;)
--
Michael

Attachments

Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Jun 03, 2020 at 01:27:51AM -0400, Tom Lane wrote:
>> I'm inclined to think that memcpy'ing the ReplicationSlot struct
>> into a local variable might be the best way, replacing all the
>> piecemeal copying these stanzas are doing right now.

> +1.  And I guess that you are already on that?  ;)

I'll work on it tomorrow ... it's getting late here.

            regards, tom lane



Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
... and InvalidateObsoleteReplicationSlots(), too.

I am detecting a pattern here.

            regards, tom lane



Re: elog(DEBUG2 in SpinLocked section.

From
Kyotaro Horiguchi
Date:
At Wed, 03 Jun 2020 02:00:53 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in 
> ... and InvalidateObsoleteReplicationSlots(), too.
> 
> I am detecting a pattern here.

I looked through 224 locations where SpinLockAcquire is called and found some.

LogicalIncreaseRestartDecodingForSlot was spotted by Michael.
pg_get_replication_slots has some namecpy calls, as Tom pointed out.
copy_replication_slot has pstrdup, as Tom pointed out.
InvalidateObsoleteReplicationSlots has pstrdup, as Tom pointed out.

I found no other instances of pstrdup, but did find some string-copy functions:

CreateInitDecodingContext has StrNCpy (up to NAMEDATALEN = 64 bytes).
RequestXLogStreaming has strlcpy (up to MAXCONNINFO = 1024 bytes).
SaveSlotToPath has memcpy on ReplicationSlotOnDisk (176 bytes).
WalReceiverMain has strlcpy(MAXCONNINFO + NAMEDATALEN) and memset of MAXCONNINFO.
pg_stat_get_wal_receiver has strlcpy (NAMEDATALEN + NI_MAXHOST(1025) + MAXCONNINFO).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> I looked through 224 locations where SpinLockAcquire is called and found some.

Yeah, I made a similar scan and arrived at about the same conclusions.
I think that the memcpy and strlcpy calls are fine; at least, we've got
to transport data somehow and it's not apparent why those aren't OK ways
to do it.  The one use of StrNCpy is annoying from a cosmetic standpoint
(mainly because it's Not Like Anywhere Else) but I'm not sure it's worth
changing.

The condition-variable code has a boatload of spinlocked calls of the
proclist functions in proclist.h.  All of those are straight-line code
so they're okay performance wise, but I wonder if we shouldn't add a
comment to that header pointing out that its functions must not throw
errors.

The only other thing I remain concerned about is some instances of atomic
operations inside spinlocks, which I started a separate thread about [1].

            regards, tom lane

[1] https://www.postgresql.org/message-id/1141819.1591208385%40sss.pgh.pa.us



Re: elog(DEBUG2 in SpinLocked section.

From
Michael Paquier
Date:
On Wed, Jun 03, 2020 at 12:36:34AM -0400, Tom Lane wrote:
> Should we think about adding automated detection of this type of
> mistake?  I don't like the attached as-is because of the #include
> footprint expansion, but maybe we can find a better way.

I think that this one first boils down to the FRONTEND dependency in
those headers.  Or in short, spin.h may get loaded by the frontend but
we have a backend-only API, no?

> It's actually worse in the back branches, because elog() did not have
> a good short-circuit path like ereport() does.  +1 for back-patch.

Thanks, got that fixed down to 9.5.
--
Michael

Attachments

Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Jun 03, 2020 at 12:36:34AM -0400, Tom Lane wrote:
>> Should we think about adding automated detection of this type of
>> mistake?  I don't like the attached as-is because of the #include
>> footprint expansion, but maybe we can find a better way.

> I think that this one first boils down to the FRONTEND dependency in
> those headers.  Or in short, spin.h may get loaded by the frontend but
> we have a backend-only API, no?

I think the #include bloat comes from wanting to declare the global
state variable as "slock_t *".  We could give up on that and write
something like this in a central place like c.h:

#if defined(USE_ASSERT_CHECKING) && !defined(FRONTEND)
extern void *held_spinlock;
#define NotHoldingSpinLock() Assert(held_spinlock == NULL)
#else
#define NotHoldingSpinLock() ((void) 0)
#endif

Then throwing NotHoldingSpinLock() into relevant places costs
nothing new include-wise.

            regards, tom lane



Re: elog(DEBUG2 in SpinLocked section.

From
Robert Haas
Date:
On Wed, Jun 3, 2020 at 12:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Should we think about adding automated detection of this type of
> mistake?  I don't like the attached as-is because of the #include
> footprint expansion, but maybe we can find a better way.

I think it would be an excellent idea.

Removing some of these spinlocks and replacing them with LWLocks might
also be worth considering.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Removing some of these spinlocks and replacing them with LWLocks might
> also be worth considering.

When I went through the existing spinlock stanzas, the only thing that
really made me acutely uncomfortable was the chunk in pg_stat_statement's
pgss_store(), lines 1386..1438 in HEAD.  In the first place, that's
pushing the notion of "short straight-line code" well beyond reasonable
bounds.  Other processes could waste a fair amount of time spinning while
the lock holder does all this arithmetic; not to mention the risk of
exhausting one's CPU time-slice partway through.  In the second place,
a chunk of code this large could well allow people to make modifications
without noticing that they're inside a spinlock, allowing future coding
violations to sneak in.

Not sure what we want to do about it though.  An LWLock per pgss entry
probably isn't gonna do.  Perhaps we could take a cue from your old
hack with multiplexed spinlocks, and map the pgss entries onto some
fixed-size pool of LWLocks, figuring that the odds of false conflicts
are small as long as the pool is bigger than MaxBackends.
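That mapping idea can be sketched in a few lines (pool size, types, and the hashing are hypothetical, not a real PostgreSQL API):

```c
/* Hypothetical sketch of mapping pg_stat_statements entries onto a
 * fixed-size pool of LWLocks by hashing each entry's key, instead of
 * keeping one spinlock per entry. */
#include <assert.h>
#include <stdint.h>

#define LOCK_POOL_SIZE 1024         /* kept larger than MaxBackends */

typedef struct { int dummy; } PoolLockSketch;

static PoolLockSketch lock_pool[LOCK_POOL_SIZE];

/* Two entries may map to the same lock (a false conflict), but with
 * the pool bigger than MaxBackends that should be rare. */
static PoolLockSketch *entry_lock(uint64_t entry_hash)
{
    return &lock_pool[entry_hash % LOCK_POOL_SIZE];
}
```

The same entry always maps to the same lock, so lock/unlock pairs stay consistent; the only cost of a collision is occasional extra contention, never incorrectness.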

            regards, tom lane



Re: elog(DEBUG2 in SpinLocked section.

From
Robert Haas
Date:
On Tue, Jun 9, 2020 at 1:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Removing some of these spinlocks and replacing them with LWLocks might
> > also be worth considering.
>
> When I went through the existing spinlock stanzas, the only thing that
> really made me acutely uncomfortable was the chunk in pg_stat_statement's
> pgss_store(), lines 1386..1438 in HEAD.  In the first place, that's
> pushing the notion of "short straight-line code" well beyond reasonable
> bounds.  Other processes could waste a fair amount of time spinning while
> the lock holder does all this arithmetic; not to mention the risk of
> exhausting one's CPU time-slice partway through.  In the second place,
> a chunk of code this large could well allow people to make modifications
> without noticing that they're inside a spinlock, allowing future coding
> violations to sneak in.
>
> Not sure what we want to do about it though.  An LWLock per pgss entry
> probably isn't gonna do.  Perhaps we could take a cue from your old
> hack with multiplexed spinlocks, and map the pgss entries onto some
> fixed-size pool of LWLocks, figuring that the odds of false conflicts
> are small as long as the pool is bigger than MaxBackends.

I mean, what would be wrong with having an LWLock per pgss entry? If
you're worried about efficiency, it's no longer the case that an
LWLock uses a spinlock internally, so there's not the old problem of
doubling (plus contention) the number of atomic operations by using an
LWLock. If you're worried about space, an LWLock is only 16 bytes, and
the slock_t that we'd be replacing is currently at the end of the
struct so presumably followed by some padding.

I suspect that these days many of the places we're using spinlocks are
buying little of any value on the efficiency side, but making any
high-contention scenarios way worse. Plus, unlike LWLocks, they're not
instrumented with wait events, so you can't even find out that you've
got contention there without breaking out 'perf', not exactly a great
thing to have to do in a production environment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jun 9, 2020 at 1:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> When I went through the existing spinlock stanzas, the only thing that
>> really made me acutely uncomfortable was the chunk in pg_stat_statement's
>> pgss_store(), lines 1386..1438 in HEAD.

> I mean, what would be wrong with having an LWLock per pgss entry?

Hmm, maybe nothing.  I'm accustomed to thinking of them as being
significantly more expensive than spinlocks, but maybe we've narrowed
the gap enough that that's not such a problem.

            regards, tom lane



Re: elog(DEBUG2 in SpinLocked section.

From
Andres Freund
Date:
Hi,

On 2020-06-09 19:24:15 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, Jun 9, 2020 at 1:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> When I went through the existing spinlock stanzas, the only thing that
> >> really made me acutely uncomfortable was the chunk in pg_stat_statement's
> >> pgss_store(), lines 1386..1438 in HEAD.
> 
> > I mean, what would be wrong with having an LWLock per pgss entry?

+1

> Hmm, maybe nothing.  I'm accustomed to thinking of them as being
> significantly more expensive than spinlocks, but maybe we've narrowed
> the gap enough that that's not such a problem.

They do add a few cycles (IIRC ~30 or so, last time I measured a
specific scenario) of latency to acquisition, but it's not a large
amount. The only case where acquisition is noticeably slower, in my
experiments, is when there's "just the right amount" of
contention. There, spinning instead of entering the kernel can be good.

I've mused about adding a small amount of spinning to lwlock acquisition
before. But so far working on reducing contention seemed the better
route.


Funnily enough lwlock *release*, even when there are no waiters, has a
somewhat noticeable performance difference on x86 (and other TSO
platforms) compared to spinlock release. For spinlock release we can
just use a plain write and a compiler barrier, whereas lwlock release
needs to use an atomic operation.

I think that's hard, but not impossible, to avoid for a userspace
reader-writer lock.
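The release-side asymmetry can be sketched like this (illustrative names, not the real s_lock.h or lwlock.c): on TSO hardware a spinlock release needs only a compiler barrier plus a plain store, while an LWLock-style state word must be updated with a real atomic read-modify-write because waiters may be changing other bits of it concurrently.

```c
/* Sketch of spinlock release vs. lwlock-style release on TSO (x86). */
#include <assert.h>
#include <stdatomic.h>

#define LW_EXCLUSIVE 0x1000000u

static volatile int spin_locked;
static atomic_uint  lw_state;

static void spin_release_sketch(void)
{
    __asm__ __volatile__("" ::: "memory");  /* compiler barrier only */
    spin_locked = 0;                        /* plain store is enough on TSO */
}

static unsigned lw_release_sketch(void)
{
    /* atomic RMW: returns the old state so the caller can check for
     * waiters encoded in the other bits */
    return atomic_fetch_sub(&lw_state, LW_EXCLUSIVE);
}
```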


It would be a nice experiment to make spinlocks a legacy wrapper around
rwlocks. I think if we added 2-3 optimizations (optimize for
exclusive-only locks, short amount of spinning, possibly inline
functions for "fast path" acquisitions/release) that'd be better for
nearly all situations. And in the situations where it's not, the loss
would be pretty darn small.

Greetings,

Andres Freund



Re: elog(DEBUG2 in SpinLocked section.

From
Andres Freund
Date:
Hi,

On 2020-06-09 15:20:08 -0400, Robert Haas wrote:
> If you're worried about space, an LWLock is only 16 bytes, and the
> slock_t that we'd be replacing is currently at the end of the struct
> so presumably followed by some padding.

I don't think the size is worth of concern in this case, and I'm not
sure there's any current case where it's really worth spending effort
reducing size. But if there is: It seems possible to reduce the size.

struct LWLock {
        uint16                     tranche;              /*     0     2 */

        /* XXX 2 bytes hole, try to pack */

        pg_atomic_uint32           state;                /*     4     4 */
        proclist_head              waiters;              /*     8     8 */

        /* size: 16, cachelines: 1, members: 3 */
        /* sum members: 14, holes: 1, sum holes: 2 */
        /* last cacheline: 16 bytes */
};

First, we could remove the tranche from the lwlock, and instead perform
more work when we need to know it. Which is only when we're going to
sleep, so it'd be ok if it's not that much work. Perhaps we could even
defer determining the tranche to the *read* side of the wait event
(presumably that'd require making the pgstat side a bit more
complicated).

Second, it seems like it should be doable to reduce the size of the
waiters list. We e.g. could have a separate 'array of wait lists' array
in shared memory, which gets assigned to an lwlock whenever a backend
wants to wait for an lwlock. The number of processes waiting for lwlocks
is clearly limited by MAX_BACKENDS (2^18-1), so one 4-byte integer
pointing to a wait list would obviously suffice.

But again, I'm not sure the current size is a real problem anywhere.

Greetings,

Andres Freund



Re: elog(DEBUG2 in SpinLocked section.

From
Robert Haas
Date:
On Tue, Jun 9, 2020 at 8:12 PM Andres Freund <andres@anarazel.de> wrote:
> I don't think the size is worth of concern in this case, and I'm not
> sure there's any current case where it's really worth spending effort
> reducing size. But if there is: It seems possible to reduce the size.

Yeah, I don't think it's very important.

> First, we could remove the tranche from the lwlock, and instead perform
> more work when we need to know it. Which is only when we're going to
> sleep, so it'd be ok if it's not that much work. Perhaps we could even
> defer determining the tranche to the *read* side of the wait event
> (presumably that'd require making the pgstat side a bit more
> complicated).
>
> Second, it seems like it should be doable to reduce the size of the
> waiters list. We e.g. could have a separate 'array of wait lists' array
> in shared memory, which gets assigned to an lwlock whenever a backend
> wants to wait for an lwlock. The number of processes waiting for lwlocks
> is clearly limited by MAX_BACKENDS (2^18-1), so one 4-byte integer
> pointing to a wait list would obviously suffice.
>
> But again, I'm not sure the current size is a real problem anywhere.

Honestly, both of these sound more painful than it's worth. We're not
likely to have enough LWLocks that using 16 bytes for each one rather
than 8 is a major problem. With regard to the first of these ideas,
bear in mind that the LWLock might be in a DSM segment that the reader
doesn't have mapped.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: elog(DEBUG2 in SpinLocked section.

From
Andres Freund
Date:
Hi,

On 2020-06-03 00:36:34 -0400, Tom Lane wrote:
> Should we think about adding automated detection of this type of
> mistake?  I don't like the attached as-is because of the #include
> footprint expansion, but maybe we can find a better way.

I experimented with making the compiler warn about some of these
kinds of mistakes without needing full test coverage:

I was able to get clang to warn about things like using palloc in signal
handlers, or using palloc while holding a spinlock. Which would be
great, except that it doesn't warn when there's an un-annotated
intermediary function. Even when that function is in the same TU.

Here's my attempt: https://godbolt.org/z/xfa6Es

It does detect things like
    spinlock_lock();
    example_alloc(17);
    spinlock_unlock();

<source>:49:2: warning: cannot call function 'example_alloc' while mutex
'holding_spinlock' is held [-Wthread-safety-analysis]
        example_alloc(17);
        ^

which isn't too bad.
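For reference, a compilable sketch of the annotation style behind that warning (names mirror the example above; the attributes are clang-only, so they degrade to no-ops under other compilers, and the analysis itself only runs with -Wthread-safety):

```c
/* Sketch of clang thread-safety annotations for a spinlock-style API.
 * The runtime behavior is trivial; the value is the compile-time check. */
#include <assert.h>

#if defined(__clang__)
#define CAPABILITY(x)   __attribute__((capability(x)))
#define ACQUIRE(x)      __attribute__((acquire_capability(x)))
#define RELEASE(x)      __attribute__((release_capability(x)))
#define EXCLUDES(x)     __attribute__((locks_excluded(x)))
#else
#define CAPABILITY(x)
#define ACQUIRE(x)
#define RELEASE(x)
#define EXCLUDES(x)
#endif

typedef struct CAPABILITY("spinlock") { int locked; } sketch_slock;

static sketch_slock holding_spinlock;

static void spinlock_lock(void) ACQUIRE(holding_spinlock);
static void spinlock_unlock(void) RELEASE(holding_spinlock);
/* Annotated "must not be called with the lock held": calling it inside
 * a lock/unlock pair draws the -Wthread-safety-analysis warning. */
static void example_alloc(int n) EXCLUDES(holding_spinlock);

static void spinlock_lock(void)   { holding_spinlock.locked = 1; }
static void spinlock_unlock(void) { holding_spinlock.locked = 0; }
static void example_alloc(int n)  { (void) n; }
```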

Does anybody think this would be useful even if it doesn't detect the
more complicated cases?

Greetings,

Andres Freund



Re: elog(DEBUG2 in SpinLocked section.

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I experimented with making the compiler warn about some of these
> kinds of mistakes without needing full test coverage:

> I was able to get clang to warn about things like using palloc in signal
> handlers, or using palloc while holding a spinlock. Which would be
> great, except that it doesn't warn when there's an un-annotated
> intermediary function. Even when that function is in the same TU.

Hm.  Couldn't we make "calling an un-annotated function" be a violation
in itself?  Certainly in the case of spinlocks, what we want is pretty
nearly a total ban on calling anything at all.  I wouldn't cry too hard
about having a similar policy for signal handlers.  (The postmaster's
handlers would have to be an exception for now.)

            regards, tom lane



Re: elog(DEBUG2 in SpinLocked section.

From
Andres Freund
Date:
Hi,

On 2020-06-16 19:46:29 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I experimented with making the compiler warn about some of these
> > kinds of mistakes without needing full test coverage:
>
> > I was able to get clang to warn about things like using palloc in signal
> > handlers, or using palloc while holding a spinlock. Which would be
> > great, except that it doesn't warn when there's an un-annotated
> > intermediary function. Even when that function is in the same TU.
>
> Hm.  Couldn't we make "calling an un-annotated function" be a violation
> in itself?

I don't see a way to do that with these annotations, unfortunately.

https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
https://clang.llvm.org/docs/AttributeReference.html#acquire-capability-acquire-shared-capability


> Certainly in the case of spinlocks, what we want is pretty
> nearly a total ban on calling anything at all.  I wouldn't cry too hard
> about having a similar policy for signal handlers.

It'd be interesting to try and see how invasive that'd be, if it were
possible to enforce. But...

Greetings,

Andres Freund