Re: prevent immature WAL streaming

From Kyotaro Horiguchi
Subject Re: prevent immature WAL streaming
Date
Msg-id 20210903.160904.904317102157226316.horikyota.ntt@gmail.com
In reply to prevent immature WAL streaming  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
At Thu, 2 Sep 2021 18:43:33 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in 
> On 2021-Sep-02, Kyotaro Horiguchi wrote:
> 
> > So, this is a crude PoC of that.
> 
> I had ended up with something very similar, except I was trying to cram
> the flag via the checkpoint record instead of hacking
> AdvanceXLInsertBuffer().  I removed that stuff and merged both, here's
> the result.
> 
> > 1. This patch is written on the current master, but it doesn't
> >   interfare with the seg-boundary-memorize patch since it removes the
> >   calls to RegisterSegmentBoundary.
> 
> I rebased on top of the revert patch.

Thanks!

> > 2. Since xlogreader cannot emit a log-message immediately, we don't
> >   have a means to leave a log message to inform recovery met an
> >   aborted partial continuation record. (In this PoC, it is done by
> >   fprintf:p)
> 
> Shrug.  We can just use an #ifndef FRONTEND / elog(LOG).  (I didn't keep
> this part, sorry.)

No problem, it was merely a development-time message for observing
the behavior.

> > 3. Myebe we need to pg_waldump to show partial continuation records,
> >   but I'm not sure how to realize that.
> 
> Ah yes, we'll need to fix that.

I just believe 0001 does the right thing.

0002:
> +    XLogRecPtr    abortedContrecordPtr; /* LSN of incomplete record at end of
> +                                       * WAL */

The name sounds like it is the start LSN. Wouldn't contrecordAbort(ed)Ptr work?

>              if (!(pageHeader->xlp_info & XLP_FIRST_IS_CONTRECORD))
>              {
>                  report_invalid_record(state,
>                                        "there is no contrecord flag at %X/%X",
>                                        LSN_FORMAT_ARGS(RecPtr));
> -                goto err;
> +                goto aborted_contrecord;

This loses the exclusion check between XLP_FIRST_IS_CONTRECORD and
_IS_ABORTED_PARTIAL.  Is it okay?  (I don't object to removing the check.)

I didn't think of this as an aborted contrecord, but on second
thought, when we see a record broken in any way, we stop recovery at
that point. I agree with this change and all the similar changes.

+                    /* XXX should we goto aborted_contrecord here? */

I think it should be aborted_contrecord.

When that happens, the loaded bytes actually looked like the first
fragment of a continuation record to xlogreader, even if the cause
was a broken total_len.  So if we abort the record there, the next
time xlogreader will meet XLP_FIRST_IS_ABORTED_PARTIAL on the same
page and correctly find a new record there.

On the other hand, if we just errored out there, we would step back to
the beginning of the broken record in the previous page or segment and
start writing a new record there, but that is exactly what we want
to avoid now.
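
The two alternatives discussed above can be contrasted with a toy model.
This is only a sketch, not xlogreader code: the types ToyReadState and
ToyOutcome, the function toy_read_continuation(), and the numeric flag
values are hypothetical, and page-header sizes are ignored. Only the
flag names are taken from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names come from the thread; the numeric values are assumptions. */
#define XLP_FIRST_IS_CONTRECORD       0x0001
#define XLP_FIRST_IS_ABORTED_PARTIAL  0x0008

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr */

typedef enum
{
    READ_OK,                    /* continuation found, keep assembling */
    READ_ERR,                   /* plain "goto err" */
    READ_ABORTED_CONTRECORD     /* "goto aborted_contrecord" */
} ReadResult;

typedef struct
{
    Lsn      record_start;      /* start of the record being assembled */
    Lsn      next_page_start;   /* page expected to hold the continuation */
    uint16_t next_page_info;    /* xlp_info read from that page */
} ToyReadState;

typedef struct
{
    ReadResult result;
    Lsn        resume_write_at; /* where new WAL would be written; 0 if n/a */
    uint16_t   page_info_to_set;/* flag to set on the next page; 0 if none */
} ToyOutcome;

/*
 * Decide what happens when the reader crosses a page boundary while
 * assembling a record.  treat_as_aborted selects between the two
 * behaviors compared in the message.
 */
ToyOutcome
toy_read_continuation(const ToyReadState *st, int treat_as_aborted)
{
    ToyOutcome out = { READ_OK, 0, 0 };

    assert(st != NULL);

    /* The continuation is present: nothing special to do. */
    if (st->next_page_info & XLP_FIRST_IS_CONTRECORD)
        return out;

    if (treat_as_aborted)
    {
        /* Leave the broken first fragment alone and continue on the next
         * page, which gets marked so a later reader finds a new record. */
        out.result = READ_ABORTED_CONTRECORD;
        out.resume_write_at = st->next_page_start;
        out.page_info_to_set = XLP_FIRST_IS_ABORTED_PARTIAL;
    }
    else
    {
        /* Step back to the start of the broken record and overwrite it,
         * which is the behavior the thread wants to avoid. */
        out.result = READ_ERR;
        out.resume_write_at = st->record_start;
    }
    return out;
}
```

In this model the only difference between the two paths is where writing
resumes: at the start of the following page (with the page marked) or at
the start of the broken record.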

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


