Обсуждение: Assertion failure in pg_copy_logical_replication_slot()
Hi, If restart_lsn of logical replication slot gets behind more than max_slot_wal_keep_size from the current LSN, the logical replication slot would be invalidated and its restart_lsn is reset to an invalid LSN. If this logical replication slot with an invalid restart_lsn is specified as the source slot in pg_copy_logical_replication_slot(), the function causes the following assertion failure. TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727) This assertion failure is caused by /* Copying non-reserved slot doesn't make sense */ if (XLogRecPtrIsInvalid(src_restart_lsn)) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot copy a replication slot that doesn't reserve WAL"))); I *guess* this assertion check was added because restart_lsn should not be invalid before. But in v13, it can be invalid thanks to max_slot_wal_keep_size. I think that this assertion check seems useless and should be removed in v13. Patch attached. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
At Tue, 23 Jun 2020 00:17:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > Hi, > > If restart_lsn of logical replication slot gets behind more than > max_slot_wal_keep_size from the current LSN, the logical replication > slot would be invalidated and its restart_lsn is reset to an invalid > LSN. > If this logical replication slot with an invalid restart_lsn is > specified > as the source slot in pg_copy_logical_replication_slot(), the function > causes the following assertion failure. Good catch! > TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727) > > This assertion failure is caused by > > /* Copying non-reserved slot doesn't make sense */ > if (XLogRecPtrIsInvalid(src_restart_lsn)) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("cannot copy a replication slot that doesn't reserve > WAL"))); > > I *guess* this assertion check was added because restart_lsn should > not be invalid before. But in v13, it can be invalid thanks to > max_slot_wal_keep_size. > I think that this assertion check seems useless and should be removed > in v13. > Patch attached. Thought? Your diagnosis looks correct to me. The assertion failure means that copy_replication_slot was not exercised at least for a non-reserving logical slots. Greping "pg_copy_logical_replication_slot" on src/test showed nothing so I doubt we are exercising the function. Don't we need some? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2020/06/23 18:42, Kyotaro Horiguchi wrote: > At Tue, 23 Jun 2020 00:17:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> Hi, >> >> If restart_lsn of logical replication slot gets behind more than >> max_slot_wal_keep_size from the current LSN, the logical replication >> slot would be invalidated and its restart_lsn is reset to an invalid >> LSN. >> If this logical replication slot with an invalid restart_lsn is >> specified >> as the source slot in pg_copy_logical_replication_slot(), the function >> causes the following assertion failure. > > Good catch! > >> TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727) >> >> This assertion failure is caused by >> >> /* Copying non-reserved slot doesn't make sense */ >> if (XLogRecPtrIsInvalid(src_restart_lsn)) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("cannot copy a replication slot that doesn't reserve >> WAL"))); >> >> I *guess* this assertion check was added because restart_lsn should >> not be invalid before. But in v13, it can be invalid thanks to >> max_slot_wal_keep_size. >> I think that this assertion check seems useless and should be removed >> in v13. >> Patch attached. Thought? > > Your diagnosis looks correct to me. Thanks for the check! I will commit the patch later. > The assertion failure means that > copy_replication_slot was not exercised at least for a non-reserving > logical slots. Greping "pg_copy_logical_replication_slot" on src/test > showed nothing so I doubt we are exercising the function. > > Don't we need some? Yes, increasing the test coverage sounds helpful! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2020-Jun-23, Fujii Masao wrote: > If restart_lsn of logical replication slot gets behind more than > max_slot_wal_keep_size from the current LSN, the logical replication > slot would be invalidated and its restart_lsn is reset to an invalid LSN. > If this logical replication slot with an invalid restart_lsn is specified > as the source slot in pg_copy_logical_replication_slot(), the function > causes the following assertion failure. > > TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727) Oops. > This assertion failure is caused by > > /* Copying non-reserved slot doesn't make sense */ > if (XLogRecPtrIsInvalid(src_restart_lsn)) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("cannot copy a replication slot that doesn't reserve WAL"))); Heh, you pasted the code after your patch rather than the original. I think the errcode is a bit bogus considering the new case. IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate. One could argue that the error message could also be different for the case of a logical slot (or even a physical slot that has the upcoming "invalidated_at" LSN set), maybe "cannot copy a replication slot that has been invalidated" but maybe that's a pointless distinction. I don't object to the patch as presented. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/06/24 9:38, Alvaro Herrera wrote: > On 2020-Jun-23, Fujii Masao wrote: > >> If restart_lsn of logical replication slot gets behind more than >> max_slot_wal_keep_size from the current LSN, the logical replication >> slot would be invalidated and its restart_lsn is reset to an invalid LSN. >> If this logical replication slot with an invalid restart_lsn is specified >> as the source slot in pg_copy_logical_replication_slot(), the function >> causes the following assertion failure. >> >> TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727) > > Oops. > >> This assertion failure is caused by >> >> /* Copying non-reserved slot doesn't make sense */ >> if (XLogRecPtrIsInvalid(src_restart_lsn)) >> ereport(ERROR, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("cannot copy a replication slot that doesn't reserve WAL"))); > > Heh, you pasted the code after your patch rather than the original. oh.... sorry. > I think the errcode is a bit bogus considering the new case. > IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate. Agreed. So I updated the patch so this errcode is used instead. Patch attached. > One could argue that the error message could also be different for the > case of a logical slot (or even a physical slot that has the upcoming > "invalidated_at" LSN set), maybe "cannot copy a replication slot that > has been invalidated" but maybe that's a pointless distinction. > I don't object to the patch as presented. I have no strong opinion about this, but for now I kept the message as it is. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
On 2020-Jun-24, Fujii Masao wrote: > > I think the errcode is a bit bogus considering the new case. > > IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate. > > Agreed. So I updated the patch so this errcode is used instead. > Patch attached. LGTM. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020/06/24 23:58, Alvaro Herrera wrote: > On 2020-Jun-24, Fujii Masao wrote: > >>> I think the errcode is a bit bogus considering the new case. >>> IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate. >> >> Agreed. So I updated the patch so this errcode is used instead. >> Patch attached. > > LGTM. Thanks! Pushed. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION