Обсуждение: pgsql: Replace existing durable_rename_excl() calls with durable_rename

Поиск
Список
Период
Сортировка

pgsql: Replace existing durable_rename_excl() calls with durable_rename

От
Michael Paquier
Дата:
Replace existing durable_rename_excl() calls with durable_rename()

durable_rename_excl() attempts to avoid overwriting any existing files
by using link() and unlink(), falling back to rename() on some platforms
(e.g., Windows where link() followed by unlink() is not concurrent-safe,
see 909b449).  Most callers of durable_rename_excl() use it just in case
there is an existing file, but it happens that for all of them we never
expect a target file to exist (WAL segment recycling, creation of
timeline history file and basic_archive).

basic_archive used durable_rename_excl() to avoid overwriting an archive
concurrently created by another server.  Now, there is a stat() call to
avoid overwriting an existing archive a couple of lines above, so note
that this change opens a small TOCTOU window in this module between the
stat() call and durable_rename().

Furthermore, as mentioned in the top comment of durable_rename_excl(),
this routine can result in multiple hard links to the same file and data
corruption, with two or more links to the same file in pg_wal/ if a
crash happens before the unlink() call during WAL recycling.
Specifically, this would produce links to the same file for the current
WAL file and the next one because the half-recycled WAL file was
re-recycled during crash recovery of a follow-up cluster restart.

This change replaces all calls to durable_rename_excl() with
durable_rename().  This removes the protection against accidentally
overwriting an existing file, but some platforms are already living
without it, and all those code paths never expect an existing file (a
couple of assertions are added to check after that, in case).

This is a bug fix, but knowing the unlikeliness of the problem involving
one of more crashes at an exceptionally bad moment, no backpatch is
done.  This could be revisited in the future.

Author: Nathan Bossart
Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ccfbd9287d70038518bdd3e85d7f5fd3dd1bb880

Modified Files
--------------
contrib/basic_archive/basic_archive.c |  5 +++--
src/backend/access/transam/timeline.c | 18 ++++++------------
src/backend/access/transam/xlog.c     | 10 ++++------
3 files changed, 13 insertions(+), 20 deletions(-)


Re: pgsql: Replace existing durable_rename_excl() calls with durable_rename

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> This change replaces all calls to durable_rename_excl() with
> durable_rename().  This removes the protection against accidentally
> overwriting an existing file, but some platforms are already living
> without it, and all those code paths never expect an existing file (a
> couple of assertions are added to check after that, in case).

Assert'ing that an external (filesystem) condition holds seems like a bad
idea.  See buildfarm.

            regards, tom lane



Re: pgsql: Replace existing durable_rename_excl() calls with durable_rename

От
Michael Paquier
Дата:
On Wed, Apr 27, 2022 at 11:28:18PM -0400, Tom Lane wrote:
> Assert'ing that an external (filesystem) condition holds seems like a bad
> idea.  See buildfarm.

Yes, I saw the failures:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-04-28%2002%3A13%3A27
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2022-04-28%2002%3A14%3A08
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=calliphoridae&dt=2022-04-28%2002%3A59%3A26

All of them are on 025_stuck_on_old_timeline.pl when the WAL receiver
writes a TLI history file when fetching it from a primary, meaning
that we need to think harder about this case as we would overwrite an
existing TLI file once we use durable_rename() over
durable_rename_excl().

I don't want to keep the buildfarm unstable, so I'll revert for now.
--
Michael

Вложения