Обсуждение: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

Поиск
Список
Период
Сортировка

WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

От
Matthias van de Meent
Дата:
Hi,

My collegue Konstantin Knizhnik pointed out that we fail to mark pages
with a non-standard page layout with page_std=false in
RelationCopyStorageUsingBuffer when we WAL-log them. This causes us to
interpret the registered buffer as a standard buffer, and omit the
hole in the page, which for FSM/VM pages covers the whole page.

The immediate effect of this bug is that replicas and primaries in a
physical replication system won't have the same data in their VM- and
FSM-forks until the first VACUUM on the new database has WAL-logged
these pages again. Whilst not actively harmful for the VM/FSM
subsystems, it's definitely suboptimal.
Secondary unwanted effects are that AMs that use the buffercache- but
which don't use or update the pageheader- also won't see the main data
logged in WAL, thus potentially losing user data in the physical
replication stream or with a system crash. I've not looked for any
such AMs and am unaware of any that would have this issue, but it's
better to fix this.


PFA a patch that fixes this issue, by assuming that all pages in the
source database utilize a non-standard page layout.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Вложения
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> PFA a patch that fixes this issue, by assuming that all pages in the
> source database utilize a non-standard page layout.

Surely that cure is worse than the disease?

            regards, tom lane



Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

От
Matthias van de Meent
Дата:
On Mon, 13 May 2024 at 16:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> > PFA a patch that fixes this issue, by assuming that all pages in the
> > source database utilize a non-standard page layout.
>
> Surely that cure is worse than the disease?

I don't know where we would get the information whether the selected
relation fork's pages are standard-compliant. We could base it off of
the fork number (that info is available locally) but that doesn't
guarantee much.
For VM and FSM-pages we know they're essentially never
standard-compliant (hence this thread), but for the main fork it is
anyone's guess once the user has installed an additional AM - which we
don't detect nor pass through to the offending
RelationCopyStorageUsingBuffer.

As for "worse", the default template database is still much smaller
than the working set of most databases. This will indeed regress the
workload a bit, but only by the fraction of holes in the page + all
FSM/VM data.
I think the additional WAL volume during CREATE DATABASE is worth it
when the alternative is losing that data with physical
replication/secondary instances. Note that this does not disable page
compression, it just stops the logging of holes in pages; holes which
generally are only a fraction of the whole database.

It's not inconceivable that this will significantly increase WAL
volume, but I think we should go for correctness rather than fastest
copy. If we went with fastest copy, we'd better just skip logging the
FSM and VM forks because we're already ignoring the data of the pages,
so why not ignore the pages themselves, too? I don't think that holds
water when we want to be crash-proof in CREATE DATABASE, with a full
data copy of the template database.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: WAL_LOG CREATE DATABASE strategy broken for non-standard page layouts

От
Robert Haas
Дата:
On Mon, May 13, 2024 at 10:53 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> It's not inconceivable that this will significantly increase WAL
> volume, but I think we should go for correctness rather than fastest
> copy.

I don't think we can afford to just do this blindly for the sake of a
hypothetical non-core AM that uses nonstandard pages. There must be
lots of cases where the holes are large, and where the WAL volume
would be a multiple of what it is currently. That's a *big*
regression.

> If we went with fastest copy, we'd better just skip logging the
> FSM and VM forks because we're already ignoring the data of the pages,
> so why not ignore the pages themselves, too? I don't think that holds
> water when we want to be crash-proof in CREATE DATABASE, with a full
> data copy of the template database.

This seems like a red herring. Either assuming standard pages is a
good idea or it isn't, and either logging the FSM and VM forks is a
good idea or it isn't, but those are two separate questions.

--
Robert Haas
EDB: http://www.enterprisedb.com