Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

Поиск
Список
Период
Сортировка
От Matthias van de Meent
Тема Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Дата
Msg-id CAEze2Wg8SYN0DAL6HE7WifP+O9ukPttyjwbqd=e4L2mTTrGv=Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Wed, 13 Jul 2022 at 07:54, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Jul 11, 2022 at 02:26:46PM +0200, Matthias van de Meent wrote:
> > Thanks for reviewing.
>
> I think that v6 is over-engineered because there should be no need to
> add a check in xlogreader.c as long as the origin of the problem is
> blocked, no?  And the origin here is when the record is assembled.  At
> least this is the cleanest solution for HEAD, but not in the
> back-branches if we'd care about doing something with records already
> generated, and I am not sure that we need to care about other things
> than HEAD, TBH.

I would prefer it if we would fix the "cannot catch up to primary
because of oversized WAL record" issue in backbranches too. Rather
than failing to recover after failure or breaking replication streams,
I'd rather be unable to write the singular offending WAL record and
break up to one transaction.

> So it seems to me that there is no need to create a
> XLogRecMaxLength which is close to a duplicate of
> DecodeXLogRecordRequiredSpace().
>
> @@ -519,7 +549,7 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
>                    XLogRecPtr *fpw_lsn, int *num_fpi, bool *topxid_included)
>
>  {
>     XLogRecData *rdt;
> -   uint32      total_len = 0;
> +   uint64      total_len = 0;
> This has no need to change.
>
> My suggestion from upthread was close to what you proposed, but I had
> in mind something simpler, as of:
>
> +   /*
> +    * Ensure that xlogreader.c can read the record.
> +    */
> +   if (unlikely(!AllocSizeIsValid(DecodeXLogRecordRequiredSpace(total_len))))
> +       elog(ERROR, "too much WAL data");

Huh, yeah, I hadn't thought of that, but that's much simpler indeed.

> This would be the amount of data allocated by the WAL reader when it
> is possible to allocate an oversized record, related to the business
> of the circular buffer depending on if the read is blocking or not.

Yes, I see your point.

> Among the two problems to solve at hand, the parts where the APIs are
> changed and made more robust with unsigned types and where block data
> is not overflowed with its 16-byte limit are committable, so I'd like
> to do that first (still need to check its performance with some micro
> benchmark on XLogRegisterBufData()).

> The second part to block the
> creation of the assembled record is simpler, now
> DecodeXLogRecordRequiredSpace() would make the path a bit hotter,
> though we could inline it in the worst case?

I think that would be better for performance, yes.
DecodeXLogRecordRequiredSpace will already be optimized to just a
single addition by any of `-O[123]`, so keeping this indirection is
quite expensive (relative to the operation being performed).

As for your patch patch:

> +XLogRegisterData(char *data, uint32 len)
> {
>     XLogRecData *rdata;
>
>     Assert(begininsert_called);
>
> -    if (num_rdatas >= max_rdatas)
> +    if (unlikely(num_rdatas >= max_rdatas))
>         elog(ERROR, "too much WAL data");
>     rdata = &rdatas[num_rdatas++];

XLogRegisterData is designed to be called multiple times for each
record, and this allows the user of the API to overflow the internal
mainrdata_len field if we don't check that the field does not exceed
the maximum record length (or overflow the 32-bit field). As such, I'd
still want a len-check in that function.

I'll send an updated patch by tomorrow.

- Matthias



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Следующее
От: Junwang Zhao
Дата:
Сообщение: [PATCH v1] strengthen backup history filename check