Re: Assertion being hit during WAL replay

From Andres Freund
Subject Re: Assertion being hit during WAL replay
Msg-id 20230411220302.vzikb3tustz7lvqw@awork3.anarazel.de
In response to Re: Assertion being hit during WAL replay  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Assertion being hit during WAL replay  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2023-04-11 16:54:53 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2023-04-11 14:48:44 -0400, Tom Lane wrote:
> >> I have seen this failure a couple of times recently while
> >> testing code that caused crashes and restarts:
> 
> > Do you have a quick repro recipe?
> 
> Here's something related to what I hit that time:
> 
> diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
> index 052263aea6..d43a7c7bcb 100644
> --- a/src/backend/optimizer/plan/subselect.c
> +++ b/src/backend/optimizer/plan/subselect.c
> @@ -2188,6 +2188,7 @@ SS_charge_for_initplans(PlannerInfo *root, RelOptInfo *final_rel)
>  void
>  SS_attach_initplans(PlannerInfo *root, Plan *plan)
>  {
> +   Assert(root->init_plans == NIL);
>     plan->initPlan = root->init_plans;
>  }
>  
> You won't get through initdb with this, but if you install this change
> into a successfully init'd database and then "make installcheck-parallel",
> it will crash and then fail to recover, at least a lot of the time.

Ah, that allowed me to reproduce. Thanks.


It took me a bit to understand how we actually get into this situation: during
recovery we replay a PRUNE record for a relation+block that doesn't exist. That
doesn't commonly happen outside of PITR or such, because we obviously need a
block with content to generate the PRUNE. The way it happens here is that the
relation is vacuumed and then truncated, and then we crash. Thus we end up with
a PRUNE record for a block that doesn't exist on disk.
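
To make the sequence concrete, here is a toy model of it in C. This is only a
sketch under stated assumptions, not PostgreSQL's recovery code: WalRecord,
replay() and MAX_MISSING are hypothetical names made up for illustration, and
the "relation" is reduced to a block count. It shows a PRUNE record that
targets a block past the end of the file being skipped and remembered, and a
later truncation record accounting for it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
#define MAX_MISSING 64

typedef enum { REC_PRUNE, REC_TRUNCATE } RecKind;

typedef struct
{
    RecKind  kind;
    unsigned block;   /* PRUNE: block to prune; TRUNCATE: new length in blocks */
} WalRecord;

/*
 * Replay records against a relation that has nblocks blocks on disk.
 * Returns how many references to missing blocks are still unexplained
 * once all records have been replayed.
 */
int
replay(const WalRecord *recs, size_t nrecs, unsigned nblocks)
{
    unsigned missing[MAX_MISSING];
    int      nmissing = 0;

    for (size_t i = 0; i < nrecs; i++)
    {
        if (recs[i].kind == REC_PRUNE)
        {
            if (recs[i].block >= nblocks)
            {
                /* Block is past the end of the file: remember it, skip the record. */
                assert(nmissing < MAX_MISSING);
                missing[nmissing++] = recs[i].block;
            }
            /* else: the block exists and would be pruned here */
        }
        else /* REC_TRUNCATE */
        {
            int kept = 0;

            if (recs[i].block < nblocks)
                nblocks = recs[i].block;

            /* A truncation explains every missing block at or beyond the new end. */
            for (int j = 0; j < nmissing; j++)
                if (missing[j] < recs[i].block)
                    missing[kept++] = missing[j];
            nmissing = kept;
        }
    }
    return nmissing;
}
```

In this model, replaying { PRUNE block 3, TRUNCATE to 0 } against a file that
is already 0 blocks long leaves nothing unexplained, while the same PRUNE
without the truncation record leaves one dangling reference. If the file still
has the block when replay starts, the "missing block" path is never taken.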

Which is also why the test is quite timing sensitive.

Seems like it'd be good to have a test that covers this scenario. There's
plenty of code around it that doesn't currently get exercised.

None of the existing tests seem like a great fit. I guess it could be added to
013_crash_restart, but that really focuses on something else.

So I guess I'll write a 036_notsureyet.pl...

Greetings,

Andres Freund


