Обсуждение: Assertion failure in pg_copy_logical_replication_slot()

Поиск
Список
Период
Сортировка

Assertion failure in pg_copy_logical_replication_slot()

От
Fujii Masao
Дата:
Hi,

If restart_lsn of logical replication slot gets behind more than
max_slot_wal_keep_size from the current LSN, the logical replication
slot would be invalidated and its restart_lsn is reset to an invalid LSN.
If this logical replication slot with an invalid restart_lsn is specified
as the source slot in pg_copy_logical_replication_slot(), the function
causes the following assertion failure.

     TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)

This assertion failure is caused by

    /* Copying non-reserved slot doesn't make sense */
    if (XLogRecPtrIsInvalid(src_restart_lsn))
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot copy a replication slot that doesn't reserve WAL")));

I *guess* this assertion check was added because restart_lsn should
not be invalid before. But in v13, it can be invalid thanks to max_slot_wal_keep_size.
I think that this assertion check seems useless and should be removed in v13.
Patch attached. Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Вложения

Re: Assertion failure in pg_copy_logical_replication_slot()

От
Kyotaro Horiguchi
Дата:
At Tue, 23 Jun 2020 00:17:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> Hi,
> 
> If restart_lsn of logical replication slot gets behind more than
> max_slot_wal_keep_size from the current LSN, the logical replication
> slot would be invalidated and its restart_lsn is reset to an invalid
> LSN.
> If this logical replication slot with an invalid restart_lsn is
> specified
> as the source slot in pg_copy_logical_replication_slot(), the function
> causes the following assertion failure.

Good catch!

>     TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)
> 
> This assertion failure is caused by
> 
>     /* Copying non-reserved slot doesn't make sense */
>     if (XLogRecPtrIsInvalid(src_restart_lsn))
>         ereport(ERROR,
>                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                  errmsg("cannot copy a replication slot that doesn't reserve
>                  WAL")));
> 
> I *guess* this assertion check was added because restart_lsn should
> not be invalid before. But in v13, it can be invalid thanks to
> max_slot_wal_keep_size.
> I think that this assertion check seems useless and should be removed
> in v13.
> Patch attached. Thought?

Your diagnosis looks correct to me. The assertion failure means that
copy_replication_slot was not exercised at least for a non-reserving
logical slots. Greping "pg_copy_logical_replication_slot" on src/test
showed nothing so I doubt we are exercising the function.

Don't we need some?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Assertion failure in pg_copy_logical_replication_slot()

От
Fujii Masao
Дата:

On 2020/06/23 18:42, Kyotaro Horiguchi wrote:
> At Tue, 23 Jun 2020 00:17:47 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> Hi,
>>
>> If restart_lsn of logical replication slot gets behind more than
>> max_slot_wal_keep_size from the current LSN, the logical replication
>> slot would be invalidated and its restart_lsn is reset to an invalid
>> LSN.
>> If this logical replication slot with an invalid restart_lsn is
>> specified
>> as the source slot in pg_copy_logical_replication_slot(), the function
>> causes the following assertion failure.
> 
> Good catch!
> 
>>      TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)
>>
>> This assertion failure is caused by
>>
>>     /* Copying non-reserved slot doesn't make sense */
>>     if (XLogRecPtrIsInvalid(src_restart_lsn))
>>         ereport(ERROR,
>>                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>                  errmsg("cannot copy a replication slot that doesn't reserve
>>                  WAL")));
>>
>> I *guess* this assertion check was added because restart_lsn should
>> not be invalid before. But in v13, it can be invalid thanks to
>> max_slot_wal_keep_size.
>> I think that this assertion check seems useless and should be removed
>> in v13.
>> Patch attached. Thought?
> 
> Your diagnosis looks correct to me.

Thanks for the check! I will commit the patch later.

> The assertion failure means that
> copy_replication_slot was not exercised at least for a non-reserving
> logical slots. Greping "pg_copy_logical_replication_slot" on src/test
> showed nothing so I doubt we are exercising the function.
> 
> Don't we need some?

Yes, increasing the test coverage sounds helpful!

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Assertion failure in pg_copy_logical_replication_slot()

От
Alvaro Herrera
Дата:
On 2020-Jun-23, Fujii Masao wrote:

> If restart_lsn of logical replication slot gets behind more than
> max_slot_wal_keep_size from the current LSN, the logical replication
> slot would be invalidated and its restart_lsn is reset to an invalid LSN.
> If this logical replication slot with an invalid restart_lsn is specified
> as the source slot in pg_copy_logical_replication_slot(), the function
> causes the following assertion failure.
> 
>     TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)

Oops.

> This assertion failure is caused by
> 
>     /* Copying non-reserved slot doesn't make sense */
>     if (XLogRecPtrIsInvalid(src_restart_lsn))
>         ereport(ERROR,
>                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                  errmsg("cannot copy a replication slot that doesn't reserve WAL")));

Heh, you pasted the code after your patch rather than the original.

I think the errcode is a bit bogus considering the new case.
IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.

One could argue that the error message could also be different for the
case of a logical slot (or even a physical slot that has the upcoming
"invalidated_at" LSN set), maybe "cannot copy a replication slot that
has been invalidated" but maybe that's a pointless distinction.
I don't object to the patch as presented.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Assertion failure in pg_copy_logical_replication_slot()

От
Fujii Masao
Дата:

On 2020/06/24 9:38, Alvaro Herrera wrote:
> On 2020-Jun-23, Fujii Masao wrote:
> 
>> If restart_lsn of logical replication slot gets behind more than
>> max_slot_wal_keep_size from the current LSN, the logical replication
>> slot would be invalidated and its restart_lsn is reset to an invalid LSN.
>> If this logical replication slot with an invalid restart_lsn is specified
>> as the source slot in pg_copy_logical_replication_slot(), the function
>> causes the following assertion failure.
>>
>>      TRAP: FailedAssertion("!logical_slot", File: "slotfuncs.c", Line: 727)
> 
> Oops.
> 
>> This assertion failure is caused by
>>
>>     /* Copying non-reserved slot doesn't make sense */
>>     if (XLogRecPtrIsInvalid(src_restart_lsn))
>>         ereport(ERROR,
>>                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>                  errmsg("cannot copy a replication slot that doesn't reserve WAL")));
> 
> Heh, you pasted the code after your patch rather than the original.

oh.... sorry.


> I think the errcode is a bit bogus considering the new case.
> IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.

Agreed. So I updated the patch so this errcode is used instead.
Patch attached.


> One could argue that the error message could also be different for the
> case of a logical slot (or even a physical slot that has the upcoming
> "invalidated_at" LSN set), maybe "cannot copy a replication slot that
> has been invalidated" but maybe that's a pointless distinction.
> I don't object to the patch as presented.

I have no strong opinion about this, but for now I kept the message as it is.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Вложения

Re: Assertion failure in pg_copy_logical_replication_slot()

От
Alvaro Herrera
Дата:
On 2020-Jun-24, Fujii Masao wrote:

> > I think the errcode is a bit bogus considering the new case.
> > IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.
> 
> Agreed. So I updated the patch so this errcode is used instead.
> Patch attached.

LGTM.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Assertion failure in pg_copy_logical_replication_slot()

От
Fujii Masao
Дата:

On 2020/06/24 23:58, Alvaro Herrera wrote:
> On 2020-Jun-24, Fujii Masao wrote:
> 
>>> I think the errcode is a bit bogus considering the new case.
>>> IMO ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE is more appropriate.
>>
>> Agreed. So I updated the patch so this errcode is used instead.
>> Patch attached.
> 
> LGTM.

Thanks! Pushed.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION