Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)

Поиск
Список
Период
Сортировка
От Hsu, John
Тема Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Дата
Msg-id b7561922-c241-2e3c-0da9-cd01bf4e6ebf@amazon.com
обсуждение исходный текст
Ответ на Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)  (Nathan Bossart <nathandbossart@gmail.com>)
Ответы Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
Список pgsql-hackers
Hello,

On 2/25/22 11:38 AM, Nathan Bossart wrote:

> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you
canconfirm the sender and know the content is safe.
 
>
>
>
> On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote:
>> Thanks Satya and others for the inputs. Here's the v1 patch that
>> basically allows async wal senders to wait until the sync standbys
>> report their flush lsn back to the primary. Please let me know your
>> thoughts.
> I haven't had a chance to look too closely yet, but IIUC this adds a new
> function that waits for synchronous replication.  This new function
> essentially spins until the synchronous LSN has advanced.
>
> I don't think it's a good idea to block sending any WAL like this.  AFAICT
> it is possible that there will be a lot of synchronously replicated WAL
> that we can send, and it might just be the last several bytes that cannot
> yet be replicated to the asynchronous standbys.  І believe this patch will
> cause the server to avoid sending _any_ WAL until the synchronous LSN
> advances.
>
> Perhaps we should instead just choose the SendRqstPtr based on the current
> synchronous LSN.  Presumably there are other things we'd need to consider,
> but in general, I think we ought to send as much WAL as possible for a
> given call to XLogSendPhysical().

I think you're right that we'll avoid sending any WAL until sync_lsn 
advances. We could setup a contrived situation where the async-walsender 
never advances because it terminates before the flush_lsn of the 
synchronous_node catches up. And when the async-walsender restarts, 
it'll start with the latest flushed on the primary and we could go into 
a perpetual loop.

I took a look at the patch and tested basic streaming with async 
replicas ahead of the synchronous standby and with logical clients as 
well and it works as expected.

 >
 > ereport(LOG,
 >            (errmsg("async standby WAL sender with request LSN %X/%X 
is waiting as sync standbys are ahead with flush LSN %X/%X",
 >                    LSN_FORMAT_ARGS(flushLSN), 
LSN_FORMAT_ARGS(sendRqstPtr)),
 >             errhidestmt(true)));

I think this log formatting is incorrect.
s/sync standbys are ahead/sync standbys are behind/ and I think you need 
to swap flushLsn and sendRqstPtr

When a walsender is waiting for the lsn on the synchronous replica to 
advance and a database stop is issued to the writer, the pg_ctl stop 
isn't able to proceed and the database seems to never shutdown.

 > Assert(priority >= 0);

What's the point of the assert here?

Also the comments/code refer to AsyncStandbys, however it's also used 
for logical clients, which may or may not be standbys. Don't feel too 
strongly about the naming here but something to note.


 > if (!ShouldWaitForSyncRepl())
 >        return;
 > ...
 > for (;;)
 > {
 >    // rest of work
 > }

If we had a walsender already waiting for an ack, and the conditions of 
ShouldWaitForSyncRepl() change, such as disabling 
async_standbys_wait_for_sync_replication or synchronous replication 
it'll still wait since we never re-check the condition.

postgres=# select wait_event from pg_stat_activity where wait_event like 
'AsyncWal%';
               wait_event
--------------------------------------
  AsyncWalSenderWaitForSyncReplication
  AsyncWalSenderWaitForSyncReplication
  AsyncWalSenderWaitForSyncReplication
(3 rows)

postgres=# show synchronous_standby_names;
  synchronous_standby_names
---------------------------

(1 row)

postgres=# show async_standbys_wait_for_sync_replication;
  async_standbys_wait_for_sync_replication
------------------------------------------
  off
(1 row)

 >    LWLockAcquire(SyncRepLock, LW_SHARED);
 >    flushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
 >    LWLockRelease(SyncRepLock);

Should we configure this similar to the user's setting of 
synchronous_commit instead of just flush? (SYNC_REP_WAIT_WRITE, 
SYNC_REP_WAIT_APPLY)

Thanks,

John H



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Melanie Plageman
Дата:
Сообщение: Re: why do hash index builds use smgrextend() for new splitpoint pages
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations