Re: Tighten error control for OpenTransientFile/CloseTransientFile

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Tighten error control for OpenTransientFile/CloseTransientFile
Дата
Msg-id 20191004203938.7a34jv4lnrkb56qu@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Tighten error control for OpenTransientFile/CloseTransientFile  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Tighten error control for OpenTransientFile/CloseTransientFile  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
Hi,

On 2019-03-07 10:56:25 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> index f5cf9ffc9c..bce4274362 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
>                   errmsg("could not fsync file \"%s\": %m", path)));
>      pgstat_report_wait_end();
>
> -    CloseTransientFile(fd);
> +    if (CloseTransientFile(fd))
> +        ereport(ERROR,
> +                (errcode_for_file_access(),
> +                 errmsg("could not close file \"%s\": %m", path)));
>  }
...
> diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
> index 64679dd2de..21986e48fe 100644
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
>      }
>
>      pgstat_report_wait_end();
> -    CloseTransientFile(fd);
> +
> +    if (CloseTransientFile(fd))
> +        ereport(ERROR,
> +                (errcode_for_file_access(),
> +                 errmsg("could not close file \"%s\": %m", path)));
>
>      hdr = (TwoPhaseFileHeader *) buf;
>      if (hdr->magic != TWOPHASE_MAGIC)
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 0fdd82a287..c7047738b6 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -3469,7 +3469,10 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
>                  (errcode_for_file_access(),
>                   errmsg("could not close file \"%s\": %m", tmppath)));
>
> -    CloseTransientFile(srcfd);
> +    if (CloseTransientFile(srcfd))
> +        ereport(ERROR,
> +                (errcode_for_file_access(),
> +                 errmsg("could not close file \"%s\": %m", path)));
>
>      /*
>       * Now move the segment into place with its final name.
...

This seems like an odd set of changes to me. What is this supposed to
buy us?  The commit message says:
    2) When opening transient files, it is up to the caller to close the
    file descriptors opened.  In error code paths, CloseTransientFile() gets
    called to clean up things before issuing an error.  However in normal
    exit paths, a lot of callers of CloseTransientFile() never actually
    reported errors, which could leave a file descriptor open without
    knowing about it.  This is an issue I complained about a couple of
    times, but never had the courage to write and submit a patch, so here we
    go.

but that reasoning seems bogus to me. For one, on just about any
platform close always closes the fd, even when returning an error
(unless you pass in a bad fd, in which case it obviously doesn't). So
the reasoning that this fixes unnoticed fd leaks doesn't really seem to
make sense. For another, even if it did, throwing an ERROR seems to
achieve very little: We continue with a leaked fd *AND* we cause the
operation to error out.

I can see reasoning for:
- LOG, so it can be noticed, but operations continue to work
- FATAL, to fix the leak
- PANIC, so we recover from the problem, in case of the close indicating
  a durability issue

  commit 9ccdd7f66e3324d2b6d3dec282cfa9ff084083f1
  Author: Thomas Munro <tmunro@postgresql.org>
  Date:   2018-11-19 13:31:10 +1300

      PANIC on fsync() failure.

but ERROR seems to have very little going for it.

The durability argument doesn't seem to apply for the cases where we
previously fsynced the file, a significant fraction of the locations you
touched.

And if your goal was just to achieve consistency, I also don't
understand, because you left plenty close()'s unchecked? E.g. you added
an error check in get_controlfile(), but not one in
ReadControlFile(). alterSystemSetConfigFile() writes, but you didn't add
one.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Proposal: Make use of C99 designated initialisers fornulls/values arrays
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Proposal: Make use of C99 designated initialisers fornulls/values arrays