Thread: LogwrtResult contended spinlock


LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
Jaime Casanova recently reported a situation where pglogical replicating
from 64 POS sites to a single central (64-core) node, each with two
replication sets, causes XLog's info_lck to become highly contended
because of frequently reading LogwrtResult.  We tested the simple fix of
adding a new LWLock that protects LogwrtResult and LogwrtRqst; that
seems to solve the problem easily enough.

At first I wanted to make the new LWLock cover only LogwrtResult proper,
and leave LogwrtRqst alone.  However on doing it, it seemed that that
might change the locking protocol in a nontrivial way.  So I decided to
make it cover both and call it a day.  We did verify that the patch
solves the reported problem, at any rate.

-- 
Álvaro Herrera                PostgreSQL Expert, https://www.2ndQuadrant.com/

Attachments

Re: LogwrtResult contended spinlock

From:
Andres Freund
Date:
Hi,

On August 31, 2020 11:21:56 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>Jaime Casanova recently reported a situation where pglogical
>replicating
>from 64 POS sites to a single central (64-core) node, each with two
>replication sets, causes XLog's info_lck to become highly contended
>because of frequently reading LogwrtResult.  We tested the simple fix
>of
>adding a new LWLock that protects LogwrtResult and LogwrtRqst; that
>seems to solve the problem easily enough.
>
>At first I wanted to make the new LWLock cover only LogwrtResult
>proper,
>and leave LogwrtRqst alone.  However on doing it, it seemed that that
>might change the locking protocol in a nontrivial way.  So I decided to
>make it cover both and call it a day.  We did verify that the patch
>solves the reported problem, at any rate.

Wouldn't the better fix here be to allow reading of individual members without a lock? E.g. by wrapping each in a 64bit
atomic.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
On 2020-Aug-31, Andres Freund wrote:

> Hi, 
> 
> On August 31, 2020 11:21:56 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> >At first I wanted to make the new LWLock cover only LogwrtResult
> >proper,
> >and leave LogwrtRqst alone.  However on doing it, it seemed that that
> >might change the locking protocol in a nontrivial way.  So I decided to
> >make it cover both and call it a day.  We did verify that the patch
> >solves the reported problem, at any rate.
> 
> Wouldn't the better fix here be to allow reading of individual members
> without a lock? E.g. by wrapping each in a 64bit atomic.

Heh, Simon said the same.  It's not clear to me due to the lack of
general availability of 64-bit atomics.  If they are spinlock-protected
when emulated, I think that would make the problem worse.

IIRC Thomas wanted to start relying on atomic 64-bit vars in some patch,
but I don't remember what it was.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: LogwrtResult contended spinlock

From:
Andres Freund
Date:
Hi,

On August 31, 2020 11:34:45 AM PDT, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>On 2020-Aug-31, Andres Freund wrote:
>
>> Hi,
>>
>> On August 31, 2020 11:21:56 AM PDT, Alvaro Herrera
><alvherre@2ndquadrant.com> wrote:
>
>> >At first I wanted to make the new LWLock cover only LogwrtResult
>> >proper,
>> >and leave LogwrtRqst alone.  However on doing it, it seemed that
>that
>> >might change the locking protocol in a nontrivial way.  So I decided
>to
>> >make it cover both and call it a day.  We did verify that the patch
>> >solves the reported problem, at any rate.
>>
>> Wouldn't the better fix here be to allow reading of individual
>members
>> without a lock? E.g. by wrapping each in a 64bit atomic.
>
>Heh, Simon said the same.  It's not clear to me due to the lack of
>general availability of 64-bit atomics.  If they are spinlock-protected
>when emulated, I think that would make the problem worse.
>
>IIRC Thomas wanted to start relying on atomic 64-bit vars in some
>patch,
>but I don't remember what it was.

All relevant platforms have 64bit atomics. So I don't think there's much
point in worrying about the emulated performance. Correctness, sure.
Performance, not so much.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
Looking at patterns like this

    if (XLogCtl->LogwrtRqst.Write < EndPos)
        XLogCtl->LogwrtRqst.Write = EndPos;

It seems possible to implement with

    do {
        XLogRecPtr    currwrite;

        currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
        if (currwrite > EndPos)
            break;  // already done by somebody else
        if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
                                           currwrite, EndPos))
            break;  // successfully updated
    } while (true);

This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
seem good material for a general routine.

This *seems* correct to me, though this is muddy territory to me.  Also,
are there better ways to go about this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: LogwrtResult contended spinlock

From:
Andres Freund
Date:
Hi,

On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:
> Looking at patterns like this
> 
>     if (XLogCtl->LogwrtRqst.Write < EndPos)
>         XLogCtl->LogwrtRqst.Write = EndPos;
> 
> It seems possible to implement with
> 
>     do {
>         XLogRecPtr    currwrite;
> 
>         currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
>     if (currwrite > EndPos)
>             break;  // already done by somebody else
>         if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
>                                        currwrite, EndPos))
>             break;  // successfully updated
>     } while (true);
> 
> This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
> seem good material for a general routine.
> 
> This *seems* correct to me, though this is muddy territory to me.  Also,
> are there better ways to go about this?

Hm, I was thinking that we'd first go for reading it without a spinlock,
but continuing to write it as we currently do.

But yea, I can't see an issue with what you propose here. I personally
find do {} while () weird and avoid it when not explicitly useful, but
that's extremely minor, obviously.

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

From:
Andres Freund
Date:
Hi,

On 2020-09-04 10:05:45 -0700, Andres Freund wrote:
> On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:
> > Looking at patterns like this
> > 
> >     if (XLogCtl->LogwrtRqst.Write < EndPos)
> >         XLogCtl->LogwrtRqst.Write = EndPos;
> > 
> > It seems possible to implement with
> > 
> >     do {
> >         XLogRecPtr    currwrite;
> > 
> >         currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
> >     if (currwrite > EndPos)
> >             break;  // already done by somebody else
> >         if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
> >                                        currwrite, EndPos))
> >             break;  // successfully updated
> >     } while (true);
> > 
> > This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
> > seem good material for a general routine.
> > 
> > This *seems* correct to me, though this is muddy territory to me.  Also,
> > are there better ways to go about this?
> 
> Hm, I was thinking that we'd first go for reading it without a spinlock,
> but continuing to write it as we currently do.
> 
> But yea, I can't see an issue with what you propose here. I personally
> find do {} while () weird and avoid it when not explicitly useful, but
> that's extremely minor, obviously.

Re general routine: On second thought, it might actually be worth having
it. Even just for LSNs - there's plenty places where it's useful to
ensure a variable is at least a certain size.  I think I would be in
favor of a general helper function.

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

From:
Anastasia Lubennikova
Date:
On 04.09.2020 20:13, Andres Freund wrote:
> Hi,
>
> On 2020-09-04 10:05:45 -0700, Andres Freund wrote:
>> On 2020-09-03 14:34:52 -0400, Alvaro Herrera wrote:
>>> Looking at patterns like this
>>>
>>>     if (XLogCtl->LogwrtRqst.Write < EndPos)
>>>         XLogCtl->LogwrtRqst.Write = EndPos;
>>>
>>> It seems possible to implement with
>>>
>>>      do {
>>>          XLogRecPtr    currwrite;
>>>
>>>          currwrite = pg_atomic_read_u64(LogwrtRqst.Write);
>>>     if (currwrite > EndPos)
>>>              break;  // already done by somebody else
>>>          if (pg_atomic_compare_exchange_u64(LogwrtRqst.Write,
>>>                                        currwrite, EndPos))
>>>              break;  // successfully updated
>>>      } while (true);
>>>
>>> This assumes that LogwrtRqst.Write never goes backwards, so it doesn't
>>> seem good material for a general routine.
>>>
>>> This *seems* correct to me, though this is muddy territory to me.  Also,
>>> are there better ways to go about this?
>> Hm, I was thinking that we'd first go for reading it without a spinlock,
>> but continuing to write it as we currently do.
>>
>> But yea, I can't see an issue with what you propose here. I personally
>> find do {} while () weird and avoid it when not explicitly useful, but
>> that's extremely minor, obviously.
> Re general routine: On second thought, it might actually be worth having
> it. Even just for LSNs - there's plenty places where it's useful to
> ensure a variable is at least a certain size.  I think I would be in
> favor of a general helper function.
Do you mean by general helper function something like this?

void
swap_lsn(XLogRecPtr old_value, XLogRecPtr new_value, bool to_largest)
{
    while (true)
    {
        XLogRecPtr  currwrite;

        currwrite = pg_atomic_read_u64(old_value);

        if (to_largest)
        {
            if (currwrite > new_value)
                break;  /* already done by somebody else */
        }
        else
        {
            if (currwrite < new_value)
                break;  /* already done by somebody else */
        }

        if (pg_atomic_compare_exchange_u64(old_value,
                                           currwrite, new_value))
            break;  /* successfully updated */
    }
}


which will be called like
swap_lsn(XLogCtl->LogwrtRqst.Write, EndPos, true);

>
> Greetings,
>
> Andres Freund

This CF entry was inactive for a while. Alvaro, are you going to 
continue working on it?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
On 2020-Nov-24, Anastasia Lubennikova wrote:

> On 04.09.2020 20:13, Andres Freund wrote:

> > Re general routine: On second thought, it might actually be worth having
> > it. Even just for LSNs - there's plenty places where it's useful to
> > ensure a variable is at least a certain size.  I think I would be in
> > favor of a general helper function.
> Do you mean by general helper function something like this?
> 
> void
> swap_lsn(XLogRecPtr old_value, XLogRecPtr new_value, bool to_largest)

Something like that, yeah, though maybe name it "pg_atomic_increase_lsn"
or some similar name that makes it clear that 

1. it is supposed to use atomics
2. it can only be used to *advance* a value rather than a generic swap.

(I'm not 100% clear that that's the exact API we need.)

> This CF entry was inactive for a while. Alvaro, are you going to continue
> working on it?

Yes, please move it forward.  I'll post an update sometime before the
next CF.



Re: LogwrtResult contended spinlock

From:
Masahiko Sawada
Date:
Hi Alvaro,

On Wed, Nov 25, 2020 at 12:02 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2020-Nov-24, Anastasia Lubennikova wrote:
>
> > On 04.09.2020 20:13, Andres Freund wrote:
>
> > > Re general routine: On second thought, it might actually be worth having
> > > it. Even just for LSNs - there's plenty places where it's useful to
> > > ensure a variable is at least a certain size.  I think I would be in
> > > favor of a general helper function.
> > Do you mean by general helper function something like this?
> >
> > void
> > swap_lsn(XLogRecPtr old_value, XLogRecPtr new_value, bool to_largest)
>
> Something like that, yeah, though maybe name it "pg_atomic_increase_lsn"
> or some similar name that makes it clear that
>
> 1. it is supposed to use atomics
> 2. it can only be used to *advance* a value rather than a generic swap.
>
> (I'm not 100% clear that that's the exact API we need.)
>
> > This CF entry was inactive for a while. Alvaro, are you going to continue
> > working on it?
>
> Yes, please move it forward.  I'll post an update sometime before the
> next CF.

Any update on this?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
On 2021-Jan-22, Masahiko Sawada wrote:

> On Wed, Nov 25, 2020 at 12:02 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> > Yes, please move it forward.  I'll post an update sometime before the
> > next CF.
> 
> Anything update on this?

I'll update this one early next week.

Thanks!

-- 
Álvaro Herrera       Valdivia, Chile
"I would rather have GNU than GNOT."  (ccchips, lwn.net/Articles/37595/)



Re: LogwrtResult contended spinlock

From:
Masahiko Sawada
Date:
On Fri, Jan 22, 2021 at 10:39 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2021-Jan-22, Masahiko Sawada wrote:
>
> > On Wed, Nov 25, 2020 at 12:02 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > > Yes, please move it forward.  I'll post an update sometime before the
> > > next CF.
> >
> > Anything update on this?
>
> I'll update this one early next week.

Great, thanks! I'll look at that.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Re: LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
On 2020-Aug-31, Andres Freund wrote:

> Wouldn't the better fix here be to allow reading of individual members
> without a lock? E.g. by wrapping each in a 64bit atomic.

So I've been playing with this and I'm annoyed about having two
datatypes to represent Write/Flush positions:

typedef struct XLogwrtRqst
{
    XLogRecPtr    Write;            /* last byte + 1 to write out */
    XLogRecPtr    Flush;            /* last byte + 1 to flush */
} XLogwrtRqst;

typedef struct XLogwrtResult
{
    XLogRecPtr    Write;            /* last byte + 1 written out */
    XLogRecPtr    Flush;            /* last byte + 1 flushed */
} XLogwrtResult;

Don't they look, um, quite similar?  I am strongly tempted to remove
that distinction, since it seems quite pointless, and introduce a
different one:

typedef struct XLogwrtAtomic
{
    pg_atomic_uint64    Write;        /* last byte + 1 of write position */
    pg_atomic_uint64    Flush;        /* last byte + 1 of flush position */
} XLogwrtAtomic;

this one, with atomics, would be used for the XLogCtl struct members
LogwrtRqst and LogwrtResult, and would always be accessed using atomic ops.
On the other hand we would have

typedef struct XLogwrt
{
    XLogRecPtr    Write;        /* last byte + 1 of write position */
    XLogRecPtr    Flush;        /* last byte + 1 of flush position */
} XLogwrt;

to be used for the local-memory-only LogwrtResult, using normal
assignment.

Now, I do wonder if there's a point in keeping LogwrtResult as a local
variable at all; maybe since the ones in shared memory are going to use
unlocked access, we don't need it anymore?  I'd prefer to defer that
decision to after this patch is done, since ISTM that it'd merit more
careful benchmarking.

Thoughts?

-- 
Álvaro Herrera       Valdivia, Chile



Re: LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
So I tried this, but -- perhaps not surprisingly -- I can't get it to
work properly; the synchronization fails.  I suspect I need some
barriers, but I tried adding a few (probably some that are not really
necessary) and that didn't have the expected effect.  Strangely, all
tests work for me, but the pg_upgrade one in particular fails.

(The attached is of course POC quality at best.)

I'll have another look next week.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W

Attachments

Re: LogwrtResult contended spinlock

From:
Andres Freund
Date:
Hi,


On 2021-01-29 12:40:18 -0300, Alvaro Herrera wrote:
> On 2020-Aug-31, Andres Freund wrote:
> 
> > Wouldn't the better fix here be to allow reading of individual members
> > without a lock? E.g. by wrapping each in a 64bit atomic.
> 
> So I've been playing with this and I'm annoyed about having two
> datatypes to represent Write/Flush positions:
> 
> typedef struct XLogwrtRqst
> {
>     XLogRecPtr    Write;            /* last byte + 1 to write out */
>     XLogRecPtr    Flush;            /* last byte + 1 to flush */
> } XLogwrtRqst;
> 
> typedef struct XLogwrtResult
> {
>     XLogRecPtr    Write;            /* last byte + 1 written out */
>     XLogRecPtr    Flush;            /* last byte + 1 flushed */
> } XLogwrtResult;
> 
> Don't they look, um, quite similar?  I am strongly tempted to remove
> that distinction, since it seems quite pointless, and introduce a
> different one:

Their spelling drives me nuts. Like, one is *Rqst, the other *Result?
Come on.


> Now, I do wonder if there's a point in keeping LogwrtResult as a local
> variable at all; maybe since the ones in shared memory are going to use
> unlocked access, we don't need it anymore?  I'd prefer to defer that
> decision to after this patch is done, since ISTM that it'd merit more
> careful benchmarking.

I think doing that might be a bit harmful - we update LogwrtResult
fairly granularly in XLogWrite(). Doing that in those small steps in
shared memory will increase the likelihood of cache misses in other
backends.

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

From:
Andres Freund
Date:
Hi,

On 2021-01-29 22:47:56 -0300, Alvaro Herrera wrote:
> So I tried this, but -- perhaps not suprisingly -- I can't get it to
> work properly; the synchronization fails.

What do you mean by "synchronization fails"?


> Strangely, all tests work for me, but the pg_upgrade one in particular
> fails.

It's one of the few tests using fsync=on.


> +static inline void
> +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
> +{
> +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> +    AssertPointerAlignment(ptr, 8);
> +#endif
> +    /* FIXME is this algorithm correct if we have u64 simulation? */

I don't see a problem.


> +    while (true)
> +    {
> +        uint64        currval;
> +
> +        currval = pg_atomic_read_u64(ptr);
> +        if (currval > target_)
> +            break;    /* already done by somebody else */
> +        if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
> +            break;    /* successfully done */
> +    }
> +}

I suggest writing this as

    currval = pg_atomic_read_u64(ptr);
    while (currval < target_)
    {
        if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
            break;
    }

>  /*
>   * Inserting to WAL is protected by a small fixed number of WAL insertion
> @@ -596,8 +599,10 @@ typedef struct XLogCtlData
>  {
>      XLogCtlInsert Insert;
>  
> +    XLogwrtAtomic LogwrtRqst;
> +    XLogwrtAtomic LogwrtResult;
> +
>      /* Protected by info_lck: */
> -    XLogwrtRqst LogwrtRqst;

Not sure putting these into the same cacheline is a good idea.


> @@ -2166,12 +2163,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
>              if (opportunistic)
>                  break;
>  
> -            /* Before waiting, get info_lck and update LogwrtResult */
> -            SpinLockAcquire(&XLogCtl->info_lck);
> -            if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
> -                XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
> -            LogwrtResult = XLogCtl->LogwrtResult;
> -            SpinLockRelease(&XLogCtl->info_lck);
> +            /* Before waiting, update LogwrtResult */
> +            pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, OldPageRqstPtr);
> +
> +            LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> +            LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);

I don't think it's quite as easy as this. Write/Flush now aren't
guaranteed to be coherent with each other - previously they were. And
because it's in a global variable used everywhere, we can't just be more
careful about update protocols in one place...

We also shouldn't re-read a variable that we just did via the
pg_atomic_monotonic_advance_u64().

I think we should stop updating both the Write/Flush position at the
same time. That way we don't have an expectation of them being coherent
with each other. Most places really don't need both anyway.


>      {
> -        SpinLockAcquire(&XLogCtl->info_lck);
> -        XLogCtl->LogwrtResult = LogwrtResult;
> -        if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
> -            XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
> -        if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
> -            XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
> -        SpinLockRelease(&XLogCtl->info_lck);
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Write, LogwrtResult.Write);
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Flush, LogwrtResult.Flush);
> +
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, LogwrtResult.Write);
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Flush, LogwrtResult.Flush);

Hm. We went from one cheap atomic operation (SpinLockAcquire) to four
expensive ones in the happy path. That's not free...

I don't think we need to manipulate XLogCtl->LogwrtResult.* using atomic
ops - they can only be updated with WALWriteLock held, right?

XLogCtl->LogwrtResult was updated with plain assignment before, why did
you change it to pg_atomic_monotonic_advance_u64()?


> @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
>      {
>          /* advance global request to include new block(s) */
>          pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> +        pg_memory_barrier();
> +

That's not really useful - the path that actually updates already
implies a barrier. It'd probably be better to add a barrier to a "never
executed cmpxchg" fastpath.



> @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
>          WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
>          LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
>          LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> +        pg_read_barrier();

I'm not sure you really can get away with just a read barrier in these
places. We can't e.g. have later updates to other shared memory
variables be "moved" to before the barrier (which a read barrier
allows).

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
Hello,

So I addressed about half of your comments in this version merely by
fixing silly bugs.  The problem I had which I described as
"synchronization fails" was one of those silly bugs.

So in further thinking, it seems simpler to make only LogwrtResult
atomic, and leave LogwrtRqst as currently, using the spinlock.  This
should solve the contention problem we saw at the customer (but I've
asked Jaime very nicely to do a test run, if possible, to confirm).

For things like logical replication, which call GetFlushRecPtr() very
frequently (causing the spinlock issue we saw) it is good, because we're
no longer hitting the spinlock at all in that case.

I have another (pretty mechanical) patch that renames LogwrtResult.Write
to LogWriteResult and LogwrtResult.Flush to LogFlushResult.  That more
clearly shows that we're no longer updating them on unison.  Didn't want
to attach here because I didn't rebase on current one.  But it seems
logical: there's no longer any point in doing struct assignment, which
is the only thing that stuff was good for.


On 2021-Jan-29, Andres Freund wrote:

> > @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
> >      {
> >          /* advance global request to include new block(s)
> >          */
> >          pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> > +        pg_memory_barrier();
> 
> That's not really useful - the path that actually updates already
> implies a barrier. It'd probably be better to add a barrier to a "never
> executed cmpxchg" fastpath.

Got it.  Do you mean in pg_atomic_monotonic_advance_u64() itself?  I'm
not sure which is the nicer semantics.  (If it's got to be at the
caller, then we'll need to return a boolean from there, which sounds
worse.)

> > @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
> >          WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
> >          LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> >          LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> > +        pg_read_barrier();
> 
> I'm not sure you really can get away with just a read barrier in these
> places. We can't e.g. have later updates to other shared memory
> variables be "moved" to before the barrier (which a read barrier
> allows).

Ah, that makes sense.

I have not really studied the barrier locations terribly closely in this
version of the patch.  It probably misses some (eg. in GetFlushRecPtr
and GetXLogWriteRecPtr).  It is passing the tests for me, but alone
that's probably not enough.  I'm gonna try and study the generated
assembly and see if I can make sense of things ...

-- 
Álvaro Herrera                            39°49'30"S 73°17'W

Attachments

Re: LogwrtResult contended spinlock

From:
Andres Freund
Date:
Hi,

On 2021-02-02 20:19:19 -0300, Alvaro Herrera wrote:
> > > @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
> > >      {
> > >          /* advance global request to include new block(s)
> > >          */
> > >          pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> > > +        pg_memory_barrier();
> >
> > That's not really useful - the path that actually updates already
> > implies a barrier. It'd probably be better to add a barrier to a "never
> > executed cmpxchg" fastpath.
>
> Got it.  Do you mean in pg_atomic_monotonic_advance_u64() itself?

Yes.


> I'm not sure which is the nicer semantics.  (If it's got to be at the
> caller, then we'll need to return a boolean from there, which sounds
> worse.)

Nearly all other modifying atomic operations have full barrier
semantics, so I think it'd be better to have it inside the
pg_atomic_monotonic_advance_u64().


> +/*
> + * Monotonically advance the given variable using only atomic operations until
> + * it's at least the target value.
> + */
> +static inline void
> +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
> +{
> +    uint64        currval;
> +
> +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> +    AssertPointerAlignment(ptr, 8);
> +#endif
> +
> +    currval = pg_atomic_read_u64(ptr);
> +    while (currval < target_)
> +    {
> +        if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
> +            break;
> +    }
> +}

So I think it'd be

static inline void
pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
{
    uint64      currval;

#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
    AssertPointerAlignment(ptr, 8);
#endif

    currval = pg_atomic_read_u64(ptr);
    if (currval >= target_)
    {
        pg_memory_barrier();
        return;
    }

    while (currval < target_)
    {
        if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
            break;
    }
}


> @@ -1172,9 +1170,10 @@ XLogInsertRecord(XLogRecData *rdata,
>          /* advance global request to include new block(s) */
>          if (XLogCtl->LogwrtRqst.Write < EndPos)
>              XLogCtl->LogwrtRqst.Write = EndPos;
> -        /* update local result copy while I have the chance */
> -        LogwrtResult = XLogCtl->LogwrtResult;
>          SpinLockRelease(&XLogCtl->info_lck);
> +        /* update local result copy */
> +        LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> +        LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
>      }

As mentioned before - it's not clear to me why this is a valid thing to
do without verifying all LogwrtResult.* usages. You can get updates
completely out of order / independently.


> @@ -2675,8 +2673,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
>       * code in a couple of places.
>       */
>      {
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Write, LogwrtResult.Write);
> +        pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Flush, LogwrtResult.Flush);
> +        pg_memory_barrier();
>          SpinLockAcquire(&XLogCtl->info_lck);
> -        XLogCtl->LogwrtResult = LogwrtResult;

I still don't see why we need "locked" atomic operations here, rather
than just a pg_atomic_write_u64(). They can only be modified
with WALWriteLock held.  There's two reasons for using a spinlock in
this place:
1) it avoids torn reads of 64bit values -
   pg_atomic_write_u64()/pg_atomic_read_u64() avoid that already.
2) it ensures that Write/Flush are updated in unison - but that's not
   useful anymore, given that other places now read the variables
   separately.


> @@ -3064,8 +3063,10 @@ XLogBackgroundFlush(void)
>      WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
>
>      /* if we have already flushed that far, consider async commit records */
> +    LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
>      if (WriteRqst.Write <= LogwrtResult.Flush)
>      {
> +        pg_memory_barrier();
>          SpinLockAcquire(&XLogCtl->info_lck);
>          WriteRqst.Write = XLogCtl->asyncXactLSN;
>          SpinLockRelease(&XLogCtl->info_lck);

A SpinLockAcquire() is a full memory barrier on its own I think. This'd
probably better solved by just making asyncXactLSN atomic...


Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

From:
Jaime Casanova
Date:
On Tue, Feb 02, 2021 at 08:19:19PM -0300, Alvaro Herrera wrote:
> Hello,
> 
> So I addressed about half of your comments in this version merely by
> fixing silly bugs.  The problem I had which I described as
> "synchronization fails" was one of those silly bugs.
> 

Hi Álvaro,

Are we waiting for another version of the patch based on Andres'
comments? Or is this version good enough for testing?

BTW, current patch still applies cleanly.

regards,

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL



Re: LogwrtResult contended spinlock

From:
Daniel Gustafsson
Date:
> On 8 Sep 2021, at 17:14, Jaime Casanova <jcasanov@systemguards.com.ec> wrote:
>
> On Tue, Feb 02, 2021 at 08:19:19PM -0300, Alvaro Herrera wrote:
>> Hello,
>>
>> So I addressed about half of your comments in this version merely by
>> fixing silly bugs.  The problem I had which I described as
>> "synchronization fails" was one of those silly bugs.
>>
>
> Hi Álvaro,
>
> Are we waiting for another version of the patch based on Andres'
> comments? Or this version is good enough for testing?

Any update on this?  Should the patch be reset back to "Needs review" or will
there be a new version?

--
Daniel Gustafsson        https://vmware.com/




Re: LogwrtResult contended spinlock

From:
Alvaro Herrera
Date:
Here's a further attempt at this.  Sorry it took so long.

In this version, I replaced the coupled-in-a-struct representation of
Write&Flush with two separate global variables.  The reason to do this
is to cater to Andres' idea to keep them up-to-date separately.  Of
course, I could have kept them together, but it seems more sensible this way.

Andres also suggested not using monotonic-advance in XLogWrite, because
these updates are always done with WALWriteLock held in exclusive mode.
However, that doesn't work (and tests fail pretty quickly) because
XLogWrite can apparently be called out of order, so you still have to
check whether the current value is newer than what you are about to
write anyway; a plain atomic write doesn't work.

So this is v4.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/

Attachments

Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
There was an earlier comment by Andres that asyncXactLSN should also be
atomic, to avoid an ugly spinlock interaction with the new atomic-based
logwrtResult.  The 0002 here is an attempt at doing that; I found that
it also needed to change WalWriterSleeping to use atomics, to avoid
XLogSetAsyncXactLSN having to grab the spinlock for that.


-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
            https://twitter.com/thingskatedid/status/1456027786158776329

Attachments

Re: LogwrtResult contended spinlock

From
Andres Freund
Date:
Hi,

On 2021-11-22 18:56:43 -0300, Alvaro Herrera wrote:
> There was an earlier comment by Andres that asyncXactLSN should also be
> atomic, to avoid an ugly spinlock interaction with the new atomic-based
> logwrtResult.  The 0002 here is an attempt at doing that; I found that
> it also needed to change WalWriterSleeping to use atomics, to avoid
> XLogSetAsyncXactLSN having to grab the spinlock for that.

This patch currently doesn't apply: http://cfbot.cputube.org/patch_37_2716.log

Are you aiming this for v15? Otherwise I'd like to move the entry to the next
CF. Marked as waiting-on-author.

Greetings,

Andres Freund



Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
On 2022-Mar-21, Andres Freund wrote:

> This patch currently doesn't apply: http://cfbot.cputube.org/patch_37_2716.log

Updated.

> Are you aiming this for v15? Otherwise I'd like to move the entry to the next
> CF. Marked as waiting-on-author.

I'd like to get 0001 pushed to pg15, yes.  I'll let 0002 sit here for
discussion, but I haven't seen any evidence that we need it.  If others
vouch for it, I can push that one too, but I'd rather have it be a
separate thing.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)

Attachments

Re: LogwrtResult contended spinlock

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2022-Mar-21, Andres Freund wrote:
>> Are you aiming this for v15? Otherwise I'd like to move the entry to the next
>> CF. Marked as waiting-on-author.

> I'd like to get 0001 pushed to pg15, yes.  I'll let 0002 sit here for
> discussion, but I haven't seen any evidence that we need it.  If others
> vouch for it, I can push that one too, but I'd rather have it be a
> separate thing.

I looked briefly at 0001, and I've got to say that I disagree with
your decision to rearrange the representation of the local LogwrtResult
copy.  It clutters the patch tremendously and makes it hard to
understand what the actual functional change is.  Moreover, I'm
not entirely convinced that it's a notational improvement in the
first place.

Perhaps it'd help if you split 0001 into two steps, one to do the
mechanical change of the representation and then a second patch that
converts the shared variable to atomics.  Since you've moved around
the places that read the shared variable, that part is subtler than
one could wish and really needs to be studied on its own.

            regards, tom lane



Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
On 2022-Mar-22, Tom Lane wrote:

> I looked briefly at 0001, and I've got to say that I disagree with
> your decision to rearrange the representation of the local LogwrtResult
> copy.  It clutters the patch tremendously and makes it hard to
> understand what the actual functional change is.  Moreover, I'm
> not entirely convinced that it's a notational improvement in the
> first place.
> 
> Perhaps it'd help if you split 0001 into two steps, one to do the
> mechanical change of the representation and then a second patch that
> converts the shared variable to atomics.  Since you've moved around
> the places that read the shared variable, that part is subtler than
> one could wish and really needs to be studied on its own.

Hmm, I did it the other way around: first the change to use atomics,
then the mechanical change.  I think that makes the usefulness of the
change more visible, because before the atomics are in use, treating
the combined struct as a unit remains sensible.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")

Attachments

Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
So I've been wondering about this block at the bottom of XLogWrite:

    /*
     * Make sure that the shared 'request' values do not fall behind the
     * 'result' values.  This is not absolutely essential, but it saves some
     * code in a couple of places.
     */
    {
        SpinLockAcquire(&XLogCtl->info_lck);
        if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
            XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
        if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
            XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
        SpinLockRelease(&XLogCtl->info_lck);
    }

I just noticed that my 0001 makes the comment a lie: it is now quite
possible that 'result' is advanced beyond 'request'.  Before the patch
that never happened because they were both advanced in the region locked
by the spinlock.

I think we could still maintain this promise if we just moved this
entire block before the first pg_atomic_monotonic_advance_u64 call that
sets XLogCtl->LogwrtResult.Write.  Or we could split the block in two,
and put one acquire/test/set/release stanza before each monotonic
increase of the corresponding variable.


However, I wonder if this is still necessary.  This code was added in
4d14fe0048c (March 2001) and while everything else was quite different
back then, this hasn't changed at all.  I can't quite figure out what
are those "couple of places" that would need additional code if this
block is just removed.  I tried running the tests (including
wal_consistency_checking), and nothing breaks.  Reading the code
surrounding the other accesses of XLogCtl->LogwrtRqst, there's nothing
that looks to me like it depends on these values not lagging behind
LogwrtResult.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."                   (New York Times, about Microsoft PowerPoint)



Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
Here's a v8, where per my previous comment I removed some code that I
believe is no longer necessary.

I've omitted the patch that renames LogwrtResult subvariables into
LogWriteResult/LogWriteFlush; I still think the end result is better
after that one, but it's a pretty trivial change that can be dealt with
separately afterwards.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Saca el libro que tu religión considere como el indicado para encontrar la
oración que traiga paz a tu alma. Luego rebootea el computador
y ve si funciona" (Carlos Duclós)

Attachments

Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
Apologies -- I selected the wrong commit to extract the commit message
from.  Here it is again.  I also removed an obsolete /* XXX */ comment.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Attachments

Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
On 2022-Apr-05, Alvaro Herrera wrote:

> Apologies -- I selected the wrong commit to extract the commit message
> from.  Here it is again.  I also removed an obsolete /* XXX */ comment.

I spent a lot of time staring at this to understand the need for memory
barriers in the interactions.  In the end I decided not to get this out
for this cycle because I don't want to create subtle bugs in WAL.  I'll
come back with this for pg16.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")



Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
v10 is just a trivial rebase.  No changes.  Moved to next commitfest.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Attachments

Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
On 2022-Jul-28, Alvaro Herrera wrote:

> v10 is just a trivial rebase.  No changes.  Moved to next commitfest.

I realized that because of commit e369f3708636 this change is no longer
as critical as it used to be, so I'm withdrawing this patch from the
commitfest.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"                  (Jorge González)



Re: LogwrtResult contended spinlock

From
Jeff Davis
Date:
On Fri, 2022-09-23 at 10:49 +0200, Alvaro Herrera wrote:
> On 2022-Jul-28, Alvaro Herrera wrote:
>
> > v10 is just a trivial rebase.  No changes.  Moved to next
> > commitfest.
>
> I realized that because of commit e369f3708636 this change is no longer
> as critical as it used to be, so I'm withdrawing this patch from the
> commitfest.

It looks like there's some renewed interest in this patch:

https://www.postgresql.org/message-id/20240213001150.4uqzh7tinuhvoopl@awork3.anarazel.de

Even if there's no pressing performance concern now, it could clean
up the complexity of having a non-shared copy of LogwrtResult.

Regards,
    Jeff Davis




Re: LogwrtResult contended spinlock

From
Jeff Davis
Date:
On Mon, 2024-02-12 at 17:44 -0800, Jeff Davis wrote:
> It looks like there's some renewed interest in this patch:

After rebasing (attached as 0001), I'm seeing some test failures. It
looks like the local LogwrtResult is not being updated in as many
places, and that's hitting the Assert that I recently added. The fix is
easy (attached as 0002).

Though it looks like we can remove the non-shared LogwrtResult
entirely. Andres expressed some concern here:

https://www.postgresql.org/message-id/20210130020211.rtu5ir3dpjrbiats@alap3.anarazel.de

But then seemed to favor removing it here:

https://www.postgresql.org/message-id/20240213001150.4uqzh7tinuhvoopl@awork3.anarazel.de

I'm inclined to think we can get rid of the non-shared copy.

A few other comments:

 * Should GetFlushRecPtr()/GetXLogWriteRecPtr() use a read memory
barrier?
 * Why did you add pg_memory_barrier() right before a spinlock
acquisition?
 * Is it an invariant that Write >= Flush at all times? Are there
guaranteed to be write barriers in the right place to ensure that?

I would also like it if we could add a new "Copy" pointer indicating
how much WAL data has been copied to the WAL buffers. That would be set
by WaitXLogInsertionsToFinish() so that subsequent calls are cheap.
Attached a patch (0003) for illustration purposes. It adds to the size
of XLogCtlData, but it's fairly large already, so I'm not sure if
that's a problem. If we do add this, there would be an invariant that
Copy >= Write at all times.

Regards,
    Jeff Davis


Attachments

Re: LogwrtResult contended spinlock

From
Bharath Rupireddy
Date:
On Sat, Feb 17, 2024 at 2:24 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> Though it looks like we can remove the non-shared LogwrtResult
> entirely. Andres expressed some concern here:
>
> https://www.postgresql.org/message-id/20210130020211.rtu5ir3dpjrbiats@alap3.anarazel.de
>
> But then seemed to favor removing it here:
>
> https://www.postgresql.org/message-id/20240213001150.4uqzh7tinuhvoopl@awork3.anarazel.de
>
> I'm inclined to think we can get rid of the non-shared copy.

The local copy of LogwrtResult is used so frequently in the backends
that, if we were to replace it with atomic accesses, won't the atomic
reads be costly and start showing up in perf profiles? In any case, I
prefer discussing this separately once we reach a conclusion on
converting the shared-memory Write and Flush ptrs to atomics.

> A few other comments:
>
>  * Should GetFlushRecPtr()/GetXLogWriteRecPtr() use a read memory
> barrier?
>  * Why did you add pg_memory_barrier() right before a spinlock
> acquisition?
>  * Is it an invariant that Write >= Flush at all times? Are there
> guaranteed to be write barriers in the right place to ensure that?

I'll continue to think about these points.

> I would also like it if we could add a new "Copy" pointer indicating
> how much WAL data has been copied to the WAL buffers. That would be set
> by WaitXLogInsertionsToFinish() so that subsequent calls are cheap.
> Attached a patch (0003) for illustration purposes. It adds to the size
> of XLogCtlData, but it's fairly large already, so I'm not sure if
> that's a problem. If we do add this, there would be an invariant that
> Copy >= Write at all times.

Thanks. I have a few comments on v11j patches.

1. I guess we need to initialize the new atomics with
pg_atomic_init_u64 in XLOGShmemInit.

2. I guess we need to update both the Write and Flush local copies in
AdvanceXLInsertBuffer. Overall, I guess we need to update both the
Write and Flush local copies whenever we previously read LogwrtResult,
no? Failing to do that caused regression tests to fail, and since we
were able to catch it, we ended up with v11j-0002-fixup.patch. There
might be cases we can't catch where we fail to update both the Write
and Flush local copies. I'd suggest we just wrap both of these
under a macro:

     LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
     LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);

and just use it wherever we did LogwrtResult = XLogCtl->LogwrtResult;
previously but not under spinlock.

3.
@@ -6371,7 +6373,9 @@ GetFlushRecPtr(TimeLineID *insertTLI)
 {
     Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);

+    pg_read_barrier();
     LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
+    LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);

Do we need a read barrier here to avoid reordering when more than
one process is accessing the flush and write ptrs? If a read barrier
is warranted here at all, we can use an atomic read with full-barrier
semantics as proposed here (when that's in) -
https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13.

4.
+    /*
+     * Update local and shared status.  This is OK to do without any locks
+     * because no other process can be reading or writing WAL yet.
+     */
     LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;

+    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Write, EndOfLog);
+    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Flush, EndOfLog);
     XLogCtl->LogwrtRqst.Write = EndOfLog;

pg_atomic_write_u64 here seems fine to me, as no other process is
actively writing WAL yet; otherwise, we'd need a write with a full
barrier: something like pg_write_barrier + pg_atomic_write_u64, or
pg_atomic_exchange_u64 as in LWLockUpdateVar, or an atomic write with
full-barrier semantics as proposed here (when that's in) -
https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13.

5. I guess we'd better use pg_atomic_read_u64_impl and
pg_atomic_compare_exchange_u64_impl in pg_atomic_monotonic_advance_u64
to reduce one level of function indirection. Of course, we'd need the
pointer-alignment assertions that pg_atomic_compare_exchange_u64
performs in the new monotonic function.

6.
+ * Full barrier semantics (even when value is unchanged).

+    currval = pg_atomic_read_u64(ptr);
+    if (currval >= target_)
+    {
+        pg_memory_barrier();

Perhaps, we can use an atomic read with full-barrier semantics proposed
here - https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13

7.
+static inline void
+pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)

Just for the sake of completeness, do we need
pg_atomic_monotonic_advance_u32 as well?

8. I'd split the addition of these new monotonic functions into 0001,
then 0002 adding XLogwrtAtomic and 0003 adding Copy to XLogwrtAtomic.

9.
+    copyptr = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Copy);
+    if (startptr + count > copyptr)
+        ereport(WARNING,
+                (errmsg("request to read past end of generated WAL; request %X/%X, current position %X/%X",
+                        LSN_FORMAT_ARGS(startptr + count), LSN_FORMAT_ARGS(copyptr))));

Any specific reason for this to be a WARNING rather than an ERROR?

I've addressed some of my review comments (#1, #5, #7 and #8) above,
merged the v11j-0002-fixup.patch into 0002, ran pgindent. Please see
the attached v12 patch set. FWIW, CF bot is happy with these patches
and also tests with --enable-atomics=no are clean.

BTW, I couldn't find a CF entry for this, so I've registered one -
https://commitfest.postgresql.org/47/4846/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: LogwrtResult contended spinlock

From
Jeff Davis
Date:
On Wed, 2024-02-21 at 20:00 +0530, Bharath Rupireddy wrote:
> The local copy of LogwrtResult is so frequently used in the backends,
> if we were to replace it with atomic accesses, won't the atomic reads
> be costly and start showing up in perf profiles?

I don't see exactly where the extra cost would come from. What is your
concern?

In functions that access it several times, it may make sense to copy it
to a function-local variable, of course. But having a global non-shared
LogwrtResult doesn't seem particularly useful to me.

>  In any case, I prefer
> discussing this separately once we get to a conclusion on converting
> shared memory Write and Flush ptrs to atomics.

I agree that it shouldn't block the earlier patches, but it seems
on-topic for this thread.

>
> 1. I guess we need to initialize the new atomics with
> pg_atomic_init_u64 initially in XLOGShmemInit:

Thank you.

> 2. I guess we need to update both the Write and Flush local copies in
> AdvanceXLInsertBuffer.

I agree. Whenever we update the non-shared LogwrtResult, let's update
the whole thing. Otherwise it's confusing.

> 3.
> @@ -6371,7 +6373,9 @@ GetFlushRecPtr(TimeLineID *insertTLI)
>  {
>      Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
>
> +    pg_read_barrier();
>      LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> +    LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
>
> Do we need a read barrier here to not reorder things when more than
> one process is accessing the flush and write ptrs?

The above seems wrong: we don't want to reorder a load of Write before
a load of Flush, otherwise it could look like Write < Flush.
(Similarly, we never want to reorder a store of Flush before a store of
Write.) So I think we should have another read barrier between those
two atomic reads.

I also think we need the read barrier at the beginning because the
caller doesn't expect a stale value of the Flush pointer. It might not
be strictly required for correctness, because I don't think that stale
value can be inconsistent with anything else, but getting an up-to-date
value is better.
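The load ordering argued for here can be sketched as a writer/reader pair. This is a model only: C11 atomics and fences stand in for pg_atomic_read_u64/pg_write_barrier/pg_read_barrier, and the struct and function names are invented for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative model of the shared Write/Flush pointers. */
typedef struct
{
    _Atomic uint64_t Write;
    _Atomic uint64_t Flush;
} LogwrtShared;

/* Writer: store Write first, then Flush, with a write barrier between.
 * Any correctly ordered reader then sees Write >= Flush. */
static void
writer_update(LogwrtShared *s, uint64_t write, uint64_t flush)
{
    atomic_store_explicit(&s->Write, write, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);  /* ~ pg_write_barrier() */
    atomic_store_explicit(&s->Flush, flush, memory_order_relaxed);
}

/* Reader: load Flush first, then Write, with a read barrier between, so
 * the snapshot can never show a Flush ahead of the Write it pairs with. */
static void
reader_snapshot(const LogwrtShared *s, uint64_t *write, uint64_t *flush)
{
    *flush = atomic_load_explicit(&s->Flush, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);  /* ~ pg_read_barrier() */
    *write = atomic_load_explicit(&s->Write, memory_order_relaxed);
}
```

If the reader's two loads were allowed to reorder, it could load Write before the writer's store and Flush after it, observing Write < Flush; the paired fences rule that out.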

>  If at all, a read
> barrier is warranted here, we can use atomic read with full barrier

I don't think we need a full barrier but I'm fine with using
pg_atomic_read_membarrier_u64() if it's better for whatever reason.


> +    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Write, EndOfLog);
> +    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Flush, EndOfLog);
>      XLogCtl->LogwrtRqst.Write = EndOfLog;
>
> pg_atomic_write_u64 here seems fine to me as no other process is
> active writing WAL yet, otherwise, we need write with full barrier

I don't see any memory ordering hazard. In any case, there's an
LWLockAcquire a few lines later, which acts as a full barrier.

> something like pg_write_barrier + pg_atomic_write_u64 or
> pg_atomic_exchange_u64 like in LWLockUpdateVar or use atomic write
> with full-barrier semantics as proposed here (when that's in) -
> https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13
> .
>
> 5. I guess we'd better use pg_atomic_read_u64_impl and
> pg_atomic_compare_exchange_u64_impl in
> pg_atomic_monotonic_advance_u64
> to reduce one level of function indirections.

Aren't they inlined?

> 6.
> + * Full barrier semantics (even when value is unchanged).
>
> +    currval = pg_atomic_read_u64(ptr);
> +    if (currval >= target_)
> +    {
> +        pg_memory_barrier();
>
> Perhaps, we can use an atomic read with full-barrier semantics proposed
> here -
> https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13

I don't think they are exactly equivalent: in the current patch, the
first pg_atomic_read_u64() could be reordered with earlier reads,
whereas with pg_atomic_read_membarrier_u64() it could not be. I'm not
sure whether that could create a performance problem or not.

> Just for the sake of completeness, do we need
> pg_atomic_monotonic_advance_u32 as well?

+1.

> 8. I'd split the addition of these new monotonic functions into 0001,
> then 0002 adding XLogwrtAtomic and 0003 adding Copy to XLogwrtAtomic.

+1

> 9.
> +    copyptr = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Copy);
> +    if (startptr + count > copyptr)
> +        ereport(WARNING,
> +                (errmsg("request to read past end of generated WAL; request %X/%X, current position %X/%X",
> +                        LSN_FORMAT_ARGS(startptr + count), LSN_FORMAT_ARGS(copyptr))));
>
> Any specific reason for this to be a WARNING rather than an ERROR?

Good question. WaitXLogInsertionsToFinish() uses a LOG level message
for the same situation. They should probably be the same log level, and
I would think it would be either PANIC or WARNING. I have no idea why
LOG was chosen.

> I've addressed some of my review comments (#1, #5, #7 and #8) above,
> merged the v11j-0002-fixup.patch into 0002, ran pgindent. Please see
> the attached v12 patch set. FWIW, CF bot is happy with these patches
> and also tests with --enable-atomics=no are clean.
>
> BTW, I couldn't find a CF entry for this, so I've registered one -
> https://commitfest.postgresql.org/47/4846/.

Thank you.

0001:

* The comments on the two versions of the functions are redundant, and
the style in that header seems to be to omit the comment from the u64
version. 

* I'm not sure the AssertPointerAlignment is needed in the u32 version?

0002:

* All updates to the non-shared LogwrtResult should update both values.
It's confusing to update those local values independently, because it
violates the invariant that LogwrtResult.Flush <= LogwrtResult.Write.

* pg_memory_barrier() is not needed right before a spinlock

* As mentioned above, I think GetFlushRecPtr() needs two read barriers.
Also, I think the same for GetXLogWriteRecPtr().

* In general, for any place using both Write and Flush, I think Flush
should be loaded first, followed by a read barrier, followed by a load
of the Write pointer. And I think in most of those places there should
be a read barrier before the load of Flush, too, to avoid a stale value
in places that might matter.

0003:

* We need to maintain the invariant that Copy >= Write >= Flush. I
believe that's always satisfied, because WaitXLogInsertionsToFinish()
is always called before XLogWrite(). But we should add an assert or
runtime check of this invariant somewhere.


Alvaro, do you intend to review and/or commit this work eventually? If
you've set it down, I can take it. There are still a few details so I'm
not sure it's ready for commit quite yet, but it's getting closer.

Regards,
    Jeff Davis




Re: LogwrtResult contended spinlock

From
Robert Haas
Date:
On Thu, Feb 22, 2024 at 1:54 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2024-02-21 at 20:00 +0530, Bharath Rupireddy wrote:
> > The local copy of LogwrtResult is so frequently used in the backends,
> > if we were to replace it with atomic accesses, won't the atomic reads
> > be costly and start showing up in perf profiles?
>
> I don't see exactly where the extra cost would come from. What is your
> concern?
>
> In functions that access it several times, it may make sense to copy it
> to a function-local variable, of course. But having a global non-shared
> LogwrtResult doesn't seem particularly useful to me.

I would love to get rid of the global variable; all of our code relies
too much on global variables, but the xlog code is some of the worst.
The logic is complex and difficult to reason about.

But I do think that there's room for concern about performance, too. I
don't know if there is an actual problem, but there could be a
problem. My impression is that accessing shared memory isn't
intrinsically more costly than accessing backend-private memory, but
accessing heavily contended memory can definitely be more expensive
than accessing uncontended memory -- and not just slightly more
expensive, but WAY more expensive.

I think the problems tend to be worst when you have some bit of data
that's being frequently modified by multiple backends. Every backend
that wants to modify the value needs to steal the cache line, and
eventually you spend most of your CPU time stealing cache lines from
other sockets and not much of it doing any actual work. If you have a
value that's just being read by a lot of backends without
modification, I think the cache line can be shared in read only mode
by all the CPUs and it's not too bad.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: LogwrtResult contended spinlock

From
Jeff Davis
Date:
On Thu, 2024-02-22 at 10:17 +0530, Robert Haas wrote:
> I think the problems tend to be worst when you have some bit of data
> that's being frequently modified by multiple backends. Every backend
> that wants to modify the value needs to steal the cache line, and
> eventually you spend most of your CPU time stealing cache lines from
> other sockets and not much of it doing any actual work. If you have a
> value that's just being read by a lot of backends without
> modification, I think the cache line can be shared in read only mode
> by all the CPUs and it's not too bad.

That makes sense. I guess they'd be on the same cache line as well,
which means a write to either will invalidate both.

Some places (XLogWrite, XLogInsertRecord, XLogSetAsyncXactLSN,
GetFlushRecPtr, GetXLogWriteRecPtr) already update it from shared
memory anyway, so those are non-issues.

The potential problem areas (unless I missed something) are:

  * AdvanceXLInsertBuffer reads it as an early check to see if a buffer
is already written out.

  * XLogFlush / XLogNeedsFlush use it for an early return

  * WALReadFromBuffers reads it for an error check (currently an
Assert, but we might want to make that an elog).

  * We are discussing adding a Copy pointer, which would be advanced by
WaitXLogInsertionsToFinish(), and if we do replication-before-flush, we
may want to be more eager about advancing it. That could cause an
issue, as well.

I don't see the global non-shared variable as a huge problem, so if it
serves a purpose then I'm fine keeping it. Perhaps we could make it a
bit safer by using some wrapper functions. Something like:

  bool
  IsWriteRecPtrAtLeast(XLogRecPtr recptr)
  {
     XLogRecPtr writeptr;
     if (LogwrtResult.Write >= recptr)
         return true;
     writeptr = GetXLogWriteRecPtr();
     return (writeptr >= recptr);
  }

That would reduce the number of direct references to LogwrtResult,
callers would never see a stale value, and would avoid the cache miss
problem that you're concerned about. Not sure if they'd need to be
inline or not.

Regards,
    Jeff Davis




Re: LogwrtResult contended spinlock

From
Robert Haas
Date:
On Fri, Feb 23, 2024 at 1:18 AM Jeff Davis <pgsql@j-davis.com> wrote:
> I don't see the global non-shared variable as a huge problem, so if it
> serves a purpose then I'm fine keeping it. Perhaps we could make it a
> bit safer by using some wrapper functions.

I actually really hate these kinds of variables. I think they likely
mask various bugs (where the value might not be up to date where we
think it is) and anti-bugs (where we actually rely on the value being
out of date but we don't know that we do because the code is so
opaque). My vintage 2021 adventures in getting rid of the global
variable ThisTimeLineID and a few other global variables found some
actual but minor bugs, also found a bunch of confused code that didn't
really do what it thought it did, and took up a huge amount of my time
trying to analyze what was happening. I'm not prepared to recommend
what we should do in this particular case. I would like to believe
that the local copies can be eliminated somehow, but I haven't studied
the code or done benchmarking so I don't really know enough to guess
what sort of code changes would or would not be good enough.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: LogwrtResult contended spinlock

From
Bharath Rupireddy
Date:
Thanks for looking into this.

On Thu, Feb 22, 2024 at 1:54 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> > 3.
> > @@ -6371,7 +6373,9 @@ GetFlushRecPtr(TimeLineID *insertTLI)
> >  If at all, a read
> > barrier is warranted here, we can use atomic read with full barrier
>
> I don't think we need a full barrier but I'm fine with using
> pg_atomic_read_membarrier_u64() if it's better for whatever reason.

For the sake of clarity and correctness, I've used
pg_atomic_read_membarrier_u64 everywhere for reading
XLogCtl->LogwrtResult.Write and XLogCtl->LogwrtResult.Flush.

> > 5. I guess we'd better use pg_atomic_read_u64_impl and
> > pg_atomic_compare_exchange_u64_impl in
> > pg_atomic_monotonic_advance_u64
> > to reduce one level of function indirections.
>
> Aren't they inlined?

Yes, all of them are inlined. But it seems to be a convention that the
XXX_impl functions are used to implement the exposed functions.
Therefore, having pg_atomic_read_u64_impl and
pg_atomic_compare_exchange_u64_impl doesn't sound bad IMV.

> > 6.
> > + * Full barrier semantics (even when value is unchanged).
> >
> > +    currval = pg_atomic_read_u64(ptr);
> > +    if (currval >= target_)
> > +    {
> > +        pg_memory_barrier();
>
> I don't think they are exactly equivalent: in the current patch, the
> first pg_atomic_read_u64() could be reordered with earlier reads;
> whereas that wouldn't work if using pg_atomic_read_membarrier_u64() it
> could not be. I'm not sure whether that could create a performance
> problem or not.

I left it as-is.

> > 9.
> > +    copyptr = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Copy);
> > +    if (startptr + count > copyptr)
> > +        ereport(WARNING,
> > +                (errmsg("request to read past end of generated WAL; request %X/%X, current position %X/%X",
> > +                        LSN_FORMAT_ARGS(startptr + count), LSN_FORMAT_ARGS(copyptr))));
> >
> > Any specific reason for this to be a WARNING rather than an ERROR?
>
> Good question. WaitXLogInsertionsToFinish() uses a LOG level message
> for the same situation. They should probably be the same log level, and
> I would think it would be either PANIC or WARNING. I have no idea why
> LOG was chosen.

WaitXLogInsertionsToFinish clamps 'upto' after emitting the LOG, so
that it never waits past the current insert position even if a caller
asks for WAL that doesn't yet exist. And the comment there says "Here
we just assume that to mean that all WAL that has been reserved needs
to be finished."

In contrast, WALReadFromBuffers effectively requires callers to call
WaitXLogInsertionsToFinish first (IOW, it asks callers to request only
WAL that already exists on the server). Therefore, an ERROR seems a
reasonable choice to me, if a PANIC, which affects all the postgres
processes, sounds too strong.

FWIW, a PANIC instead of LOG when requested to flush past the end of
WAL in WaitXLogInsertionsToFinish seems good. CF bot
animals don't complain -
https://github.com/BRupireddy2/postgres/tree/be_harsh_when_request_to_flush_past_end_of_WAL_WIP.

> 0001:
>
> * The comments on the two versions of the functions are redundant, and
> the style in that header seems to be to omit the comment from the u64
> version.

Removed comments atop 64-bit version.

> * I'm not sure the AssertPointerAlignment is needed in the u32 version?

Borrowed them from pg_atomic_read_u32 and
pg_atomic_compare_exchange_u32, just as they assert before
calling the XXX_impl versions. I don't see any problem with them.

> 0002:
>
> * All updates to the non-shared LogwrtResult should update both values.
> It's confusing to update those local values independently, because it
> violates the invariant that LogwrtResult.Flush <= LogwrtResult.Write.
>
> > 2. I guess we need to update both the Write and Flush local copies in
> > AdvanceXLInsertBuffer.
>
> I agree. Whenever we update the non-shared LogwrtResult, let's update
> the whole thing. Otherwise it's confusing.

Yes, it's done that way now with a macro XLogUpdateLocalLogWrtResult
using pg_atomic_read_membarrier_u64 to read both Write and Flush ptrs.

> * pg_memory_barrier() is not needed right before a spinlock

Got rid of it as we read both Flush and Write local copies with
pg_atomic_read_membarrier_u64.

> * As mentioned above, I think GetFlushRecPtr() needs two read barriers.
> Also, I think the same for GetXLogWriteRecPtr().

Yes, it's done that way now with a macro XLogUpdateLocalLogWrtResult
using pg_atomic_read_membarrier_u64 to read both Write and Flush ptrs.

> * In general, for any place using both Write and Flush, I think Flush
> should be loaded first, followed by a read barrier, followed by a load
> of the Write pointer.

Why read Flush first rather than Write? I think it's enough to do
{read Write, read barrier, read Flush}. This works because Write is
monotonically advanced before Flush using full barriers, so we don't
get reordering issues between the readers and writers, no? Am I
missing anything here?

> And I think in most of those places there should
> be a read barrier before the load of Flush, too, to avoid a stale value
> in places that might matter.

Yes, using pg_atomic_read_membarrier_u64 for both Write and Flush makes it easy.

> 0003:
>
> * We need to maintain the invariant that Copy >= Write >= Flush. I
> believe that's always satisfied, because the
> XLogWaitInsertionsToFinish() is always called before XLogWrite(). But
> we should add an assert or runtime check of this invariant somewhere.

Yes, that invariant is already maintained by the server. Although I
don't fully agree, I added an assertion to WaitXLogInsertionsToFinish
after updating XLogCtl->LogwrtResult.Copy. The CF bot is happy with it -
https://github.com/BRupireddy2/postgres/tree/atomic_LogwrtResult_v13.

Please see the attached v13 patch set for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: LogwrtResult contended spinlock

From
Bharath Rupireddy
Date:
On Mon, Mar 4, 2024 at 9:15 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> > 0003:
> >
> > * We need to maintain the invariant that Copy >= Write >= Flush. I
> > believe that's always satisfied, because the
> > XLogWaitInsertionsToFinish() is always called before XLogWrite(). But
> > we should add an assert or runtime check of this invariant somewhere.
>
> Yes, that invariant is already maintained by the server. Although, I'm
> not fully agree, I added an assertion to WaitXLogInsertionsToFinish
> after updating XLogCtl->LogwrtResult.Copy. CF bot is happy with it -
> https://github.com/BRupireddy2/postgres/tree/atomic_LogwrtResult_v13.

I've now separated these invariants out into the 0004 patch.

With the assertions placed in WaitXLogInsertionsToFinish after
updating Copy ptr, I observed the assertion failing in one of the CF
bot machines - https://cirrus-ci.com/build/6202112288227328. I could
reproduce it locally with [1]. I guess the reason is that the Write
and Flush ptrs are now updated independently and atomically without
lock, they might drift and become out-of-order for a while if
concurrently they are accessed in WaitXLogInsertionsToFinish. So, I
guess the right place to verify the invariant Copy >= Write >= Flush
is in XLogWrite once Write and Flush ptrs in shared memory are updated
(note that only one process at a time can do this). Accordingly, I've
moved the assertions to XLogWrite in the attached v14-0004 patch.

> Please see the attached v13 patch set for further review.

Earlier versions of the patches removed a piece of code ensuring that
the shared WAL 'request' values do not fall behind the 'result' values.
There's a good reason for us to have it, so I restored it.

-       /*
-        * Update shared-memory status
-        *
-        * We make sure that the shared 'request' values do not fall behind the
-        * 'result' values.  This is not absolutely essential, but it saves some
-        * code in a couple of places.
-        */

Please see the attached v14 patch set.

[1] for i in {1..100}; do make check
PROVE_TESTS="t/027_stream_regress.pl"; if [ $? -ne 0 ]; then echo "The
command failed on iteration $i"; break; fi; done

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
Thanks for keeping this moving forward.  I gave your proposed patches a
look.   One thing I didn't like much is that we're adding a new member
(Copy) to XLogwrtAtomic -- but this struct is supposed to be a mirror of
XLogwrtResult for use with atomic access.  Since this new member is not
added to XLogwrtResult (because it's not needed there), the whole idea
of there being symmetry between those two structs crumbles down.
Because we later stop using struct-assign anyway, meaning we no longer
need the structs to match, we can instead spell out the members in
XLogCtl and call it a day.

So what I do in the attached 0001 is stop using the XLogwrtResult struct
in XLogCtl and replace it with separate Write and Flush values, and add
the macro XLogUpdateLocalLogwrtResult() that copies the values of Write
and Flush from the shared XLogCtl to the local variable given as macro
argument.  (I also added our idiomatic do {} while(0) to the macro
definition, for safety).  The new members are XLogCtl->logWriteResult
and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so
essentially identical semantics as the previous code.  No atomic access
yet!

0002 then adds pg_atomic_monotonic_advance_u64.  (I don't add the _u32
variant, because I don't think it's a great idea to add dead code.  If
later we see a need for it we can put it in.)  It also changes the two
new members to be atomics, changes the macro to use atomic read, and
XLogWrite now uses monotonic increment.  A couple of other places can
move the macro calls to occur outside the spinlock.  Also, XLogWrite
gains the invariant checking that involves Write and Flush.
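For reference, the semantics of such a monotonic-advance primitive can be sketched with C11 atomics in place of the pg_atomic API; the function name and the fact that it returns the resulting value are illustrative here, not the committed interface:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Advance *ptr to target_ unless it is already at or past it; return the
 * value that is current after the operation.  This models
 * pg_atomic_monotonic_advance_u64 with a compare-exchange loop.
 */
static uint64_t
monotonic_advance_u64(_Atomic uint64_t *ptr, uint64_t target_)
{
    uint64_t    currval = atomic_load(ptr);

    while (currval < target_)
    {
        /* On failure, currval is reloaded with the value seen in *ptr. */
        if (atomic_compare_exchange_weak(ptr, &currval, target_))
            return target_;
    }
    /* Someone else already advanced past target_; report what we saw. */
    return currval;
}
```

With this shape, a caller that loses the race still learns how far the value actually advanced, which can save a separate reread.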

Finally, 0003 adds the Copy pointer to XLogCtl alongside Write and
Flush, and updates WALReadFromBuffers to test that instead of the Write
pointer, and adds in XLogWrite the invariant checks that involve the
Copy pointer.

I haven't rerun Bharath test loop yet; will do so shortly.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."              (Andrew Sullivan)
https://postgr.es/m/20050809113420.GD2768@phlogiston.dyndns.org

Attachments

Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
On 2024-Apr-03, Alvaro Herrera wrote:

> So what I do in the attached 0001 is stop using the XLogwrtResult struct
> in XLogCtl and replace it with separate Write and Flush values, and add
> the macro XLogUpdateLocalLogwrtResult() that copies the values of Write
> and Flush from the shared XLogCtl to the local variable given as macro
> argument.  (I also added our idiomatic do {} while(0) to the macro
> definition, for safety).  The new members are XLogCtl->logWriteResult
> and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so
> essentially identical semantics as the previous code.  No atomic access
> yet!

BTW I forgot.  I didn't like the name XLogUpdateLocalLogwrtResult() name
much.  What do you think about RefreshXLogWriteResult() instead?

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it."         (ncm, http://lwn.net/Articles/174769/)



Re: LogwrtResult contended spinlock

From
Bharath Rupireddy
Date:
On Wed, Apr 3, 2024 at 4:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Thanks for keeping this moving forward.  I gave your proposed patches a
> look.   One thing I didn't like much is that we're adding a new member
> (Copy) to XLogwrtAtomic -- but this struct is supposed to be a mirror of
> XLogwrtResult for use with atomic access.  Since this new member is not
> added to XLogwrtResult (because it's not needed there), the whole idea
> of there being symmetry between those two structs crumbles down.
> Because we later stop using struct-assign anyway, meaning we no longer
> need the structs to match, we can instead spell out the members in
> XLogCtl and call it a day.

Hm, I have no objection to having separate variables in XLogCtl.

> So what I do in the attached 0001 is stop using the XLogwrtResult struct
> in XLogCtl and replace it with separate Write and Flush values, and add
> the macro XLogUpdateLocalLogwrtResult() that copies the values of Write
> and Flush from the shared XLogCtl to the local variable given as macro
> argument.  (I also added our idiomatic do {} while(0) to the macro
> definition, for safety).  The new members are XLogCtl->logWriteResult
> and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so
> essentially identical semantics as the previous code.  No atomic access
> yet!

+1.

> 0002 then adds pg_atomic_monotonic_advance_u64.  (I don't add the _u32
> variant, because I don't think it's a great idea to add dead code.  If
> later we see a need for it we can put it in.)  It also changes the two
> new members to be atomics, changes the macro to use atomic read, and
> XLogWrite now uses monotonic increment.  A couple of other places can
> move the macro calls to occur outside the spinlock.  Also, XLogWrite
> gains the invariant checking that involves Write and Flush.

I'm fine with not having the 32 bit variant of
pg_atomic_monotonic_advance. However, a recent commit bd5132db5 added
both 32 and 64 bit versions of pg_atomic_read_membarrier even though
32 bit isn't being used.

> Finally, 0003 adds the Copy pointer to XLogCtl alongside Write and
> Flush, and updates WALReadFromBuffers to test that instead of the Write
> pointer, and adds in XLogWrite the invariant checks that involve the
> Copy pointer.

+1.

The attached patches look good to me.

Also, I'm fine with renaming XLogUpdateLocalLogwrtResult() to
RefreshXLogWriteResult().

> I haven't rerun Bharath test loop yet; will do so shortly.

I ran it a 100 times [1] on top of all the 3 patches, it looks fine.

[1] for i in {1..100}; do make check
PROVE_TESTS="t/027_stream_regress.pl"; if [ $? -ne 0 ]; then echo "The
command failed on iteration $i"; break; fi; done

./configure --prefix=$PWD/pg17/ --enable-debug --enable-tap-tests
--enable-cassert CC=/usr/bin/clang-14 > install.log && make -j 8
install > install.log 2>&1 &

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
On 2024-Apr-03, Bharath Rupireddy wrote:

> On Wed, Apr 3, 2024 at 4:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> > So what I do in the attached 0001 is stop using the XLogwrtResult struct
> > in XLogCtl and replace it with separate Write and Flush values, and add
> > the macro XLogUpdateLocalLogwrtResult() that copies the values of Write
> > and Flush from the shared XLogCtl to the local variable given as macro
> > argument.  (I also added our idiomatic do {} while(0) to the macro
> > definition, for safety).  The new members are XLogCtl->logWriteResult
> > and XLogCtl->logFlushResult and as of 0001 are just XLogRecPtr, so
> > essentially identical semantics as the previous code.  No atomic access
> > yet!
> 
> +1.

> Also, I'm fine with renaming XLogUpdateLocalLogwrtResult() to
> RefreshXLogWriteResult().

Okay, I have pushed 0001 with the name change, will see about getting
the others in tomorrow.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: LogwrtResult contended spinlock

From
Jeff Davis
Date:
On Wed, 2024-04-03 at 13:19 +0200, Alvaro Herrera wrote:
> So what I do in the attached 0001 is stop using the XLogwrtResult
> struct
> in XLogCtl and replace it with separate Write and Flush values, and
> add
> the macro XLogUpdateLocalLogwrtResult() that copies the values of
> Write
> and Flush from the shared XLogCtl to the local variable given as
> macro
> argument.

+1

> 0002 then adds pg_atomic_monotonic_advance_u64.  (I don't add the
> _u32
> variant, because I don't think it's a great idea to add dead code. 
> If
> later we see a need for it we can put it in.)  It also changes the
> two
> new members to be atomics, changes the macro to use atomic read, and
> XLogWrite now uses monotonic increment.  A couple of other places can
> move the macro calls to occur outside the spinlock.  Also, XLogWrite
> gains the invariant checking that involves Write and Flush.

To maintain the invariant that Write >= Flush, I believe you need to
always store to Write first, then Flush; and load from Flush first,
then from Write. So RefreshXLogWriteResult should load Flush then load
Write, and the same for the Assert code. And in 0003, loading from Copy
should happen last.

Also, should pg_atomic_monotonic_advance_u64() return currval? I don't
think it's important for the current patches, but I could imagine a
caller wanting the most up-to-date value possible, even if it's beyond
what the caller requested. Returning void seems slightly wasteful.

Other than that, it looks good to me.

> Finally, 0003 adds the Copy pointer to XLogCtl alongside Write and
> Flush, and updates WALReadFromBuffers to test that instead of the
> Write
> pointer, and adds in XLogWrite the invariant checks that involve the
> Copy pointer.

Looks good to me.

Regards,
    Jeff Davis




Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
I've noticed a few things here, v16 attached with some rather largish
changes.

1. Using pg_atomic_write_membarrier_u64 is useless and it imposes more
barriers than we actually need.  So I switched back to
pg_atomic_write_u64 and add one barrier between the two writes.  Same
for reads.

2. Using monotonic_advance for Write and Flush is useless.  We can use a
simple atomic_write with a write barrier in between.  The reason is
that, as Andres said[1], we only write those with WALWriteLock held, so
it's not possible for them to move forward while we aren't looking.  All
callers of XLogWrite do RefreshXLogWriteResult() with the WALWriteLock
held.  Therefore we can just use pg_atomic_write_u64.  Consequently I
moved the addition of the monotonic advance function to the patch that
adds Copy.

3. Testing the invariant that the Copy pointer cannot be 0 is useless,
because we initialize that pointer to EndOfLog during StartupXLOG.
So, removed.

4. If we're not modifying any callers of WALReadFromBuffers(), then
AFAICS the added check that the request is not past the Copy pointer is
useless.  In a quick look at that code, I think we only try to read data
that's been flushed, not written, so the stricter check that we don't
read data that hasn't been Copied does nothing.  (Honestly I'm not sure
that we want XLogSendPhysical to be reading data that has not been
written, or do we?)  Please argue why we need this patch.

5. The existing weird useless-looking block at the end of XLogWrite is
there because we used to have it to declare a volatile pointer to
XLogCtl (cf.  commit 6ba4ecbf477e).  That's no longer needed, so we
could remove it.  Or we could leave it alone (just because it's ancient
and it doesn't hurt anything), but there's no reason to have the new
invariant-testing block inside the same block.  So I added another weird
useless-looking block, except that this one does have two variable
declarations at its top.

6. In a few places, we read both Write and Flush to only use one of
them.  This is wasteful and we could dial this back to reading only the
one we need.  Andres suggested as much in [1].  I didn't touch this in
the current patch, and I don't necessarily think we need to address it
right now.  Addressing this should probably done similar to what I
posted in [2]'s 0002.

[1] https://postgr.es/m/20210130023011.n545o54j65t4kgxn@alap3.anarazel.de
[2] https://postgr.es/m/202203221611.hqbjdinzsbu2@alvherre.pgsql

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/

Attachments

Re: LogwrtResult contended spinlock

From
Jeff Davis
Date:
On Thu, 2024-04-04 at 19:45 +0200, Alvaro Herrera wrote:
> 1. Using pg_atomic_write_membarrier_u64 is useless and it imposes
> mora
> barriers than we actually need.  So I switched back to
> pg_atomic_write_u64 and add one barrier between the two writes.  Same
> for reads.

+1.

This looks correct to me. Just before the writes there's a spinlock,
which acts as a full barrier; and just afterwards, the function returns
and the WALWriteLock is released, again acting as a full barrier. The
write barrier in between enforces the Write >= Flush invariant.
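The write-side ordering described here can be sketched with C11 atomics standing in for pg_atomic_write_u64 and pg_write_barrier (the names below are illustrative, not the actual PostgreSQL code):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared positions, modeling XLogCtl->logWriteResult / logFlushResult. */
static _Atomic uint64_t log_write_result;
static _Atomic uint64_t log_flush_result;

/*
 * Publish new Write and Flush positions, Write first.  The release fence
 * between the stores (pg_write_barrier in PostgreSQL) keeps them ordered,
 * so a reader that loads Flush before Write can never take home a pair
 * with Flush > Write.
 */
static void
publish_logwrt_result(uint64_t write_pos, uint64_t flush_pos)
{
    atomic_store_explicit(&log_write_result, write_pos, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&log_flush_result, flush_pos, memory_order_relaxed);
}
```

In XLogWrite the spinlock taken just before and the WALWriteLock released just after supply the surrounding full barriers Jeff mentions.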

> 2. Using monotonic_advance for Write and Flush is useless.

+1.

> 3. Testing the invariant that the Copy pointer cannot be 0 is
> useless,
> because we initialize that pointer to EndOfLog during StartupXLOG.
> So, removed.

+1.

> 4. If we're not modifying any callers of WALReadFromBuffers(), then
> AFAICS the added check that the request is not past the Copy pointer
> is
> useless.  In a quick look at that code, I think we only try to read
> data
> that's been flushed, not written, so the stricter check that we don't
> read data that hasn't been Copied does nothing.

Bharath has indicated that he may call WALReadFromBuffers() in an
extension, so I believe some error checking is appropriate there.

>   (Honestly I'm not sure
> that we want XLogSendPhysical to be reading data that has not been
> written, or do we?)

Not yet, but there has been some discussion[1][2] about future work to
allow replicating data before it's been flushed locally.

>   Please argue why we need this patch.

I'm not sure what you mean by "this patch"?

> 5. The existing weird useless-looking block at the end of XLogWrite
> is
> there because we used to have it to declare a volatile pointer to
> XLogCtl (cf.  commit 6ba4ecbf477e).  That's no longer needed, so we
> could remove it.  Or we could leave it alone (just because it's
> ancient
> and it doesn't hurt anything), but there's no reason to have the new
> invariant-testing block inside the same block.  So I added another
> weird
> useless-looking block, except that this one does have two variable
> declaration at its top.

That didn't bother me, but it could be cleaned up a bit in a later
patch.

> 6. In a few places, we read both Write and Flush to only use one of
> them.  This is wasteful and we could dial this back to reading only
> the
> one we need.  Andres suggested as much in [1].  I didn't touch this
> in
> the current patch, and I don't necessarily think we need to address
> it
> right now.  Addressing this should probably done similar to what I
> posted in [2]'s 0002.

I agree that it should be a separate patch. I haven't thought about the
consequences of making them fully independent -- I think that means we
give up the invariant that Copy >= Write >= Flush?


Regarding the patches themselves, 0001 looks good to me.

For 0002, did you consider having pg_atomic_monotonic_advance_u64()
return the currval?

Regards,
    Jeff Davis

[1]
https://www.postgresql.org/message-id/CALj2ACV6rS%2B7iZx5%2BoAvyXJaN4AG-djAQeM1mrM%3DYSDkVrUs7g%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/20230125211540.zylu74dj2uuh3k7w%40awork3.anarazel.de
[3]
https://www.postgresql.org/message-id/CALj2ACW65mqn6Ukv57SqDTMzAJgd1N_AdQtDgy%2BgMDqu6v618Q%40mail.gmail.com




Re: LogwrtResult contended spinlock

From
Bharath Rupireddy
Date:
On Fri, Apr 5, 2024 at 4:21 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> > 4. If we're not modifying any callers of WALReadFromBuffers(), then
> > AFAICS the added check that the request is not past the Copy pointer
> > is
> > useless.  In a quick look at that code, I think we only try to read
> > data
> > that's been flushed, not written, so the stricter check that we don't
> > read data that hasn't been Copied does nothing.
>
> Bharath has indicated that he may call WALReadFromBuffers() in an
> extension, so I believe some error checking is appropriate there.
>
> >   (Honestly I'm not sure
> > that we want XLogSendPhysical to be reading data that has not been
> > written, or do we?)
>
> Not yet, but there has been some discussion[1][2] about future work to
> allow replicating data before it's been flushed locally.

Right. Although callers of WALReadFromBuffers() in core postgres don't
need it now (in the future they will), it allows one to build something
externally - for example, see the 0004 patch in
https://www.postgresql.org/message-id/CALj2ACWmW+1ZdYE0BD5-4KVLzYGn3=pudomwPHuHmi4YQuOFSg@mail.gmail.com
where I implemented an xlogreader page_read callback that just keeps
reading the WAL that's fully copied to WAL buffers.

> Regarding the patches themselves, 0001 looks good to me.

A few comments on 0001:

1.
 /*
  * Update local copy of shared XLogCtl->log{Write,Flush}Result
+ *
+ * It's critical that Flush always trails Write, so the order of the reads is
+ * important, as is the barrier.
  */
 #define RefreshXLogWriteResult(_target) \
     do { \
-        _target.Write = XLogCtl->logWriteResult; \
-        _target.Flush = XLogCtl->logFlushResult; \
+        _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
+        pg_read_barrier(); \
+        _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
     } while (0)

Is it "Flush always trails Write" or "Flush always leads Write"? I
guess the latter, no?

2.
+
+        pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
+        pg_write_barrier();
+        pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
     }

Maybe add the reason as to why we had to write logWriteResult first
and then logFlushResult, similar to the comment atop
RefreshXLogWriteResult?

> For 0002, did you consider having pg_atomic_monotonic_advance_u64()
> return the currval?

+1 for returning currval here.

Except for the above comments, the patches look good to me. I've also
run the test loop for any assertion failure - for i in {1..100}; do
make check PROVE_TESTS="t/027_stream_regress.pl"; if [ $? -ne 0 ];
then echo "The command failed on iteration $i"; break; fi; done.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
On 2024-Apr-05, Bharath Rupireddy wrote:

> 1.
>  /*
>   * Update local copy of shared XLogCtl->log{Write,Flush}Result
> + *
> + * It's critical that Flush always trails Write, so the order of the reads is
> + * important, as is the barrier.
>   */
>  #define RefreshXLogWriteResult(_target) \
>      do { \
> -        _target.Write = XLogCtl->logWriteResult; \
> -        _target.Flush = XLogCtl->logFlushResult; \
> +        _target.Flush = pg_atomic_read_u64(&XLogCtl->logFlushResult); \
> +        pg_read_barrier(); \
> +        _target.Write = pg_atomic_read_u64(&XLogCtl->logWriteResult); \
>      } while (0)
> 
> Is it "Flush always trails Write" or "Flush always leads Write"? I
> guess the latter, no?

Well, Flush cannot lead Write.  We cannot Flush what hasn't been
Written!  To me, "Flush trails Write" means that flush is behind.  That
seems supported by Merriam-Webster[1], which says in defining it as a
transitive verb

3 a : to follow upon the scent or trace of : TRACK
  b : to follow in the footsteps of : PURSUE
  c : to follow along behind
  d : to lag behind (someone, such as a competitor)

Maybe it's not super clear as a term.  We could turn it around and say "Write
always leads Flush".

[1] https://www.merriam-webster.com/dictionary/trail

The reason we have the barrier, and the reason we write and read them in
this order, is that we must never read a Flush value that is newer than
the Write value we read.  That is: the values are written in pairs
(w1,f1) first, (w2,f2) next, and so on.  It's okay if we obtain (w2,f1)
(new write, old flush), but it's not okay if we obtain (w1,f2) (old
write, new flush).  If we did, we might end up with a Flush value that's
ahead of Write, violating the invariant.  
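This read-side pairing rule, load Flush first, then a read barrier, then Write, can be modeled the same way with C11 atomics (illustrative names; in PostgreSQL this is the RefreshXLogWriteResult macro):

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t log_write_result;   /* shared Write position */
static _Atomic uint64_t log_flush_result;   /* shared Flush position */

/*
 * Read (Write, Flush) in the order opposite to how they are written:
 * Flush first, an acquire fence (pg_read_barrier), then Write.  Against
 * a writer that stores Write before Flush, this can yield a
 * (new write, old flush) pair but never an (old write, new flush) pair,
 * so the values we take home always satisfy Write >= Flush.
 */
static void
refresh_logwrt_result(uint64_t *write_pos, uint64_t *flush_pos)
{
    *flush_pos = atomic_load_explicit(&log_flush_result, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    *write_pos = atomic_load_explicit(&log_write_result, memory_order_relaxed);
}
```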

> 2.
> +
> +        pg_atomic_write_u64(&XLogCtl->logWriteResult, LogwrtResult.Write);
> +        pg_write_barrier();
> +        pg_atomic_write_u64(&XLogCtl->logFlushResult, LogwrtResult.Flush);
>      }
> 
> Maybe add the reason as to why we had to write logWriteResult first
> and then logFlushResult, similar to the comment atop
> RefreshXLogWriteResult?

Hmm, I had one there and removed it.  I'll put something back.

> > For 0002, did you consider having pg_atomic_monotonic_advance_u64()
> > return the currval?
> 
> +1 for returning currval here.

Oh yeah, I had that when the monotonic stuff was used by 0001 but lost
it when I moved to 0002.  I'll push 0001 now and send an updated 0002
with the return value added.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)



Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
Couldn't push: I tested with --disable-atomics --disable-spinlocks and
the tests fail because the semaphore for the atomic variables is not
always initialized.  This is weird -- it's like a client process is
running at a time when StartupXLOG has not initialized the variable ...
so the initialization in the other place was not completely wrong.
I'll investigate after lunch.  Here's v16.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"XML!" Exclaimed C++.  "What are you doing here? You're not a programming
language."
"Tell that to the people who use me," said XML.
https://burningbird.net/the-parable-of-the-languages/

Attachments

Re: LogwrtResult contended spinlock

From
Alvaro Herrera
Date:
Pushed 0001.  Here's the patch that adds the Copy position one more
time, with the monotonic_advance function returning the value.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

Attachments

Re: LogwrtResult contended spinlock

From
Jeff Davis
Date:
On Fri, 2024-04-05 at 13:54 +0200, Alvaro Herrera wrote:
> Couldn't push: I tested with --disable-atomics --disable-spinlocks
> and
> the tests fail because the semaphore for the atomic variables is not
> always initialized.  This is weird -- it's like a client process is
> running at a time when StartupXLOG has not initialized the variable
> ...
> so the initialization in the other place was not completely wrong.

It looks like you solved the problem and pushed the first patch. Looks
good to me.

Patch 0002 also looks good to me, after a mostly-trivial rebase (just
remember to initialize logCopyResult).

Minor comments:

* You could consider using a read barrier before reading the Copy ptr,
or using the pg_atomic_read_membarrier_u64() variant. I don't see a
correctness problem, but it might be slightly more clear and I don't
see a lot of downside.

* Also, I assume that the Max() call in
pg_atomic_monotonic_advance_u64() is just for clarity? Surely the
currval cannot be less than _target when it returns. I'd probably just
do Assert(currval >= _target) and then return currval.

Regards,
    Jeff Davis




Re: LogwrtResult contended spinlock

From
Thomas Munro
Date:
On Sat, Apr 6, 2024 at 6:55 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Pushed 0001.

Could that be related to the 3 failures on parula that look like this?

TRAP: failed Assert("node->next == 0 && node->prev == 0"), File:
"../../../../src/include/storage/proclist.h", Line: 63, PID: 29119
2024-04-05 16:16:26.812 UTC [29114:15] pg_regress/drop_operator LOG:
statement: DROP OPERATOR <|(bigint, bigint);
postgres: postgres regression [local] CREATE
ROLE(ExceptionalCondition+0x4c)[0x9b3fdc]
postgres: postgres regression [local] CREATE ROLE[0x8529e4]
postgres: postgres regression [local] CREATE
ROLE(LWLockWaitForVar+0xec)[0x8538fc]
postgres: postgres regression [local] CREATE ROLE[0x54c7d4]
postgres: postgres regression [local] CREATE ROLE(XLogFlush+0xf0)[0x552600]
postgres: postgres regression [local] CREATE ROLE[0x54a9b0]
postgres: postgres regression [local] CREATE ROLE[0x54bbdc]

Hmm, the comments for LWLockWaitForVar say:

 * Be aware that LWLockConflictsWithVar() does not include a memory barrier,
 * hence the caller of this function may want to rely on an explicit barrier or
 * an implied barrier via spinlock or LWLock to avoid memory ordering issues.

But that seems to be more likely to make LWLockWaitForVar suffer data
races (ie hang), not break assertions about LWLock sanity, so I don't
know what's going on there.  I happened to have a shell on a Graviton
box, but I couldn't reproduce it after a while...



Re: LogwrtResult contended spinlock

From
Bharath Rupireddy
Date:
On Sat, Apr 6, 2024 at 9:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Sat, Apr 6, 2024 at 6:55 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Pushed 0001.
>
> Could that be related to the 3 failures on parula that look like this?
>
> TRAP: failed Assert("node->next == 0 && node->prev == 0"), File:
> "../../../../src/include/storage/proclist.h", Line: 63, PID: 29119
> 2024-04-05 16:16:26.812 UTC [29114:15] pg_regress/drop_operator LOG:
> statement: DROP OPERATOR <|(bigint, bigint);
> postgres: postgres regression [local] CREATE
> ROLE(ExceptionalCondition+0x4c)[0x9b3fdc]
> postgres: postgres regression [local] CREATE ROLE[0x8529e4]
> postgres: postgres regression [local] CREATE
> ROLE(LWLockWaitForVar+0xec)[0x8538fc]
> postgres: postgres regression [local] CREATE ROLE[0x54c7d4]
> postgres: postgres regression [local] CREATE ROLE(XLogFlush+0xf0)[0x552600]
> postgres: postgres regression [local] CREATE ROLE[0x54a9b0]
> postgres: postgres regression [local] CREATE ROLE[0x54bbdc]
>
> Hmm, the comments for LWLockWaitForVar say:
>
>  * Be aware that LWLockConflictsWithVar() does not include a memory barrier,
>  * hence the caller of this function may want to rely on an explicit barrier or
>  * an implied barrier via spinlock or LWLock to avoid memory ordering issues.
>
> But that seems to be more likely to make LWLockWaitForVar suffer data
> races (ie hang), not break assertions about LWLock sanity, so I don't
> know what's going on there.  I happened to have a shell on a Graviton
> box, but I couldn't reproduce it after a while...

Thanks for reporting. I'll try to spin up a similar instance like
parula and reproduce. Meanwhile, I'm wondering if it is somehow
related to what's discussed in "Why is parula failing?"
https://www.postgresql.org/message-id/4009739.1710878318%40sss.pgh.pa.us.
It seems like parula is behaving unexpectedly because of the compiler
and other stuff.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: LogwrtResult contended spinlock

От
Alvaro Herrera
Дата:
On 2024-Apr-05, Jeff Davis wrote:

> Minor comments:

> * Also, I assume that the Max() call in
> pg_atomic_monotonic_advance_u64() is just for clarity? Surely the
> currval cannot be less than _target when it returns. I'd probably just
> do Assert(currval >= _target) and then return currval.

Uhh ... my understanding of pg_atomic_compare_exchange_u64 is that
*expected is set to the value that's stored in *ptr prior to the
exchange.  Its comment

 * Atomically compare the current value of ptr with *expected and store newval
 * iff ptr and *expected have the same value. The current value of *ptr will
 * always be stored in *expected.

is actually not very clear, because what does "current" mean in this
context?  Is it the original "current" value, or the "current" value
after the exchange?  Anyway, looking at the spinlock-emulated code for
pg_atomic_compare_exchange_u32_impl in atomics.c,

    /* perform compare/exchange logic */
    ret = ptr->value == *expected;
    *expected = ptr->value;
    if (ret)
        ptr->value = newval;

it's clear that *expected receives the original value, not the new one.

Because of this behavior, not doing the Max() would return the obsolete
value rather than the one we just installed.  (It would only work
correctly without Max() when our cmpxchg operation "fails", that is,
somebody else incremented the value past the one we want to install.)


Another reason is that when I used pg_atomic_monotonic_advance_u64 with
the Write and Flush pointers and did not have the Max(), the assertion
failed pretty quickly :-)

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
In the beginning there was UNIX, and UNIX spoke and said: "Hello world\n".
It did not say "Hello New Jersey\n", nor "Hello USA\n".



Re: LogwrtResult contended spinlock

От
Jeff Davis
Дата:
On Sat, 2024-04-06 at 18:13 +0200, Alvaro Herrera wrote:
> my understanding of pg_atomic_compare_exchange_u64 is that
> *expected is set to the value that's stored in *ptr prior to the
> exchange.

Sorry, my mistake. Your version looks good.

Regards,
    Jeff Davis




Re: LogwrtResult contended spinlock

От
Alvaro Herrera
Дата:
I pushed the "copy" pointer now, except that I renamed it to "insert",
which is what we call the operation being tracked.  I also added some
comments.

One perhaps significant change from what Bharath submitted is that we
now use the return value from monotonic advance to return an updated
value of finishedUpto from WaitXLogInsertionsToFinish.  I think this is
a useful optimization: the only caller of WaitXLogInsertionsToFinish
which cares about its return value is XLogFlush, which AFAICS benefits
from having a more up-to-date value.

Also, I did adopt your (Jeff's) idea of using a barrier when reading
logInsertResult in WaitXLogInsertionsToFinish.  If I understand
correctly, this is actually useful: if some other CPU core was also
recently busy waiting for xlog insertions and wrote an updated version
of logInsertResult that's past what we last saw, we benefit from an
updated value because we can skip the spinlock dance and WAL-insert-var
stuff altogether.  With no barrier, we would potentially see an outdated
value of logInsertResult and waste a lot of effort to get it updated.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Those who use electric razors are infidels destined to burn in hell while
we drink from rivers of beer, download free vids and mingle with naked
well shaved babes." (http://slashdot.org/comments.pl?sid=44793&cid=4647152)



Re: LogwrtResult contended spinlock

От
Jeff Davis
Дата:
On Sun, 2024-04-07 at 14:19 +0200, Alvaro Herrera wrote:
> I pushed the "copy" pointer now, except that I renamed it to
> "insert",
> which is what we call the operation being tracked.  I also added some
> comments.

The "copy" pointer, as I called it, is not the same as the "insert"
pointer (as returned by GetXLogInsertRecPtr()).

LSNs before the "insert" pointer are reserved for WAL inserters (and
may or may not have made it to WAL buffers yet). LSNs before the "copy"
pointer are written to WAL buffers with CopyXLogRecordToWAL() (and may
or may not have been evicted to the OS file yet).

Other than that, looks good, thank you.

Regards,
    Jeff Davis




Re: LogwrtResult contended spinlock

От
Alvaro Herrera
Дата:
On 2024-Apr-07, Jeff Davis wrote:

> On Sun, 2024-04-07 at 14:19 +0200, Alvaro Herrera wrote:
> > I pushed the "copy" pointer now, except that I renamed it to
> > "insert",
> > which is what we call the operation being tracked.  I also added some
> > comments.
> 
> The "copy" pointer, as I called it, is not the same as the "insert"
> pointer (as returned by GetXLogInsertRecPtr()).
> 
> LSNs before the "insert" pointer are reserved for WAL inserters (and
> may or may not have made it to WAL buffers yet). LSNs before the "copy"
> pointer are written to WAL buffers with CopyXLogRecordToWAL() (and may
> or may not have been evicted to the OS file yet).

It seems a matter of interpretation.  WAL insertion starts with
reserving the space (in ReserveXLogInsertLocation) and moving the
CurrBytePos point forward; the same WAL insertion completes when the
actual data has been copied to the buffers.  It is this process as a
whole that we call an insertion.  CurrBytePos (what GetXLogInsertRecPtr
returns) is the point where the next insertions are going to occur; the
new logInsertResult is the point where previous insertions are known to
have completed.

We could think about insertions as something that's happening to a range
of bytes.  CurrBytePos is the high end of that range, logInsertResult is
its low end.  (Although in reality we don't know the true low end,
because the process writing the oldest WAL doesn't advertise as soon as
it finishes its insertion, because it doesn't know that it is writing
the oldest.  We only know that X is this "oldest known" after we have
waited for all those that were inserting earlier than X to have
finished.)

My trouble with the "copy" term is that we don't use that term anywhere
in relation to WAL.  It's a term we don't need.  This "copy" is in
reality just the insertion, after it's finished.  The "Result" suffix
is intended to convey that it's the point where the operation has
successfully finished.

Maybe we can add some commentary to make this clearer.

Now, if you're set on renaming the variable back to Copy, I won't
object.

Lastly, I just noticed that I forgot credits and discussion link in that
commit message.  I would have attributed authorship to Bharath though,
because I had not realized that this patch had actually been written by
you initially[1].  My apologies.

[1] https://postgr.es/m/727449f3151c6b9ab76ba706fa4d30bf7b03ad4f.camel@j-davis.com

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: LogwrtResult contended spinlock

От
Jeff Davis
Дата:
On Mon, 2024-04-08 at 10:24 +0200, Alvaro Herrera wrote:
> My trouble with the "copy" term is that we don't use that term
> anywhere
> in relation to WAL.

I got the term from CopyXLogRecordToWAL().

> This "copy" is in
> reality just the insertion, after it's finished.  The "Result" suffix
> is intended to convey that it's the point where the operation has
> successfully finished.

That's a convincing point.

> Maybe we can add some commentary to make this clearer.

For now, I'd just suggest a comment on GetXLogInsertRecPtr() explaining
what it's returning and how that's different from logInsertResult.

In the future, it would be nice to clarify where variable names like
reservedUpto and CurrBytePos fit in. Also the external documentation
for pg_current_wal_insert_lsn() is a bit ambiguous.

> Lastly, I just noticed that I forgot credits and discussion link in
> that
> commit message.  I would have attributed authorship to Bharath
> though,
> because I had not realized that this patch had actually been written
> by
> you initially[1].  My apologies.

No worries. Thank you for reviewing and committing it.

Regards,
    Jeff Davis