Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

From: Matthias van de Meent
Subject: Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Message-ID: CAEze2Wjyvkip6CiqLJoEL-_BDdcJUHB6QCEReX=RHgQUCfUSuQ@mail.gmail.com
In reply to: Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths  (David Zhang <david.zhang@highgo.ca>)
Responses: Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths  (David Zhang <david.zhang@highgo.ca>)
           Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers

On Sat, 11 Jun 2022 at 01:32, David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> > > MaxAllocSize is pretty easy:
> > > SELECT pg_logical_emit_message(false, long, long) FROM repeat(repeat(' ', 1024), 1024*1023) as l(long);
> > >
> > > on a standby:
> > >
> > > 2022-03-11 16:41:59.336 PST [3639744][startup][1/0:0] LOG:  record length 2145386550 at 0/3000060 too long
> >
> > Thanks for the reference. I was already playing around with 2PC log
> > records (which can theoretically contain >4GB of data); but your
> > example is much easier and takes significantly less time.
>
> A little confused here, does this patch V3 intend to solve this problem
> "record length 2145386550 at 0/3000060 too long"?

No, not once the record exists. But it does remove Postgres' ability
to create such records, thereby solving the problem for all systems
that generate WAL through Postgres' WAL writing APIs.

> I set up a simple Primary and Standby stream replication environment, and
> use the above query to run the test for before and after patch v3. The error
> message still exist, but with different message.
>
> Before patch v3, the error is showing below,
>
> 2022-06-10 15:32:25.307 PDT [4253] LOG:  record length 2145386550 at 0/3000060 too long
> 2022-06-10 15:32:47.763 PDT [4257] FATAL:  terminating walreceiver process due to administrator command
> 2022-06-10 15:32:47.763 PDT [4253] LOG:  record length 2145386550 at 0/3000060 too long
>
> After patch v3, the error displays differently
>
> 2022-06-10 15:53:53.397 PDT [12848] LOG:  record length 2145386550 at 0/3000060 too long
> 2022-06-10 15:54:07.249 PDT [12852] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000045 has already been removed
> 2022-06-10 15:54:07.275 PDT [12848] LOG:  record length 2145386550 at 0/3000060 too long
>
> And once the error happens, then the Standby can't continue the replication.

Did you initiate a new cluster or otherwise skip the invalid record
you generated when running the instance based on master? It seems to
me you're trying to replay the invalid record (len > MaxAllocSize),
and this patch does not try to fix that issue. This patch just tries
to forbid emitting records larger than MaxAllocSize, as per the check
in XLogRecordAssemble, so that we won't emit unreadable records into
the WAL anymore.

Reading unreadable records still won't be possible, but that's also
not something I'm trying to fix.

> Is a particular reason to say "more datas" at line 52 in patch v3?
>
> + * more datas than are being accounted for by the XLog infrastructure.

Yes. This error is thrown when you try to register a 34th block, or an
Nth rdata where the caller previously only reserved N - 1 data slots.
Hence 'datas', matching the num_rdatas and max_rdatas variables.

Thanks for looking at the patch.

- Matthias


