Обсуждение: Tighten error control for OpenTransientFile/CloseTransientFile

Поиск
Список
Период
Сортировка

Tighten error control for OpenTransientFile/CloseTransientFile

От
Michael Paquier
Дата:
Hi all,

Joe's message here has reminded me that we have lacked a lot of error
handling around CloseTransientFile():
https://www.postgresql.org/message-id/c49b69ec-e2f7-ff33-4f17-0eaa4f2cef27@joeconway.com

This has been mentioned by Alvaro a couple of months ago (cannot find
the thread about that at quick glance), and I just forgot about it at
that time.  Anyway, attached is a patch to do some cleanup for all
that:
- Switch OpenTransientFile to read-only where sufficient.
- Add more error handling for CloseTransientFile
A major take of this patch is to make sure that the new error messages
generated have an elevel consistent with their neighbors.

Just on time for this last CF.  Thoughts?
--
Michael

Вложения

Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Joe Conway
Дата:
On 2/28/19 9:33 PM, Michael Paquier wrote:
> Hi all,
>
> Joe's message here has reminded me that we have lacked a lot of error
> handling around CloseTransientFile():
> https://www.postgresql.org/message-id/c49b69ec-e2f7-ff33-4f17-0eaa4f2cef27@joeconway.com
>
> This has been mentioned by Alvaro a couple of months ago (cannot find
> the thread about that at quick glance), and I just forgot about it at
> that time.  Anyway, attached is a patch to do some cleanup for all
> that:
> - Switch OpenTransientFile to read-only where sufficient.
> - Add more error handling for CloseTransientFile
> A major take of this patch is to make sure that the new error messages
> generated have an elevel consistent with their neighbors.
>
> Just on time for this last CF.  Thoughts?

Seems like it would be better to modify the arguments to
CloseTransientFile() to include the filename being closed, errorlevel,
and fail_on_error or something similar. Then all the repeated ereport
stanzas could be eliminated.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Вложения

Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Michael Paquier
Дата:
On Fri, Mar 01, 2019 at 05:05:54PM -0500, Joe Conway wrote:
> Seems like it would be better to modify the arguments to
> CloseTransientFile() to include the filename being closed, errorlevel,
> and fail_on_error or something similar. Then all the repeated ereport
> stanzas could be eliminated.

Sure.  Now some code paths close file descriptors without having at
hand the file name, which would mean that we'd need to pass NULL as
argument in this case.  That's not really elegant in my opinion.  And
having a consistent mapping with the system's close() is not really
bad to me either..
--
Michael

Вложения

Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Georgios Kokolatos
Дата:
Overall the patch looks good and according to the previous discussion fulfils its purpose.

It might be worthwhile to also check for errors on close in SaveSlotToPath().

        pgstat_report_wait_end();

        CloseTransientFile(fd);

        /* rename to permanent file, fsync file and directory */
        if (rename(tmppath, path) != 0)

Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Robert Haas
Дата:
On Fri, Mar 1, 2019 at 5:06 PM Joe Conway <mail@joeconway.com> wrote:
> Seems like it would be better to modify the arguments to
> CloseTransientFile() to include the filename being closed, errorlevel,
> and fail_on_error or something similar. Then all the repeated ereport
> stanzas could be eliminated.

Hmm.  I'm not sure that really saves much in terms of notation, and
it's less flexible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Michael Paquier
Дата:
On Wed, Mar 06, 2019 at 02:54:52PM +0000, Georgios Kokolatos wrote:
> Overall the patch looks good and according to the previous
> discussion fulfils its purpose.
>
> It might be worthwhile to also check for errors on close in
> SaveSlotToPath().

Thanks for the feedback, added.  I have spent some time
double-checking this stuff, and noticed that the new errors in
StartupReplicationOrigin() and CheckPointReplicationOrigin() should be
switched from ERROR to PANIC to be consistent.  One message in
dsm_impl_mmap() was not consistent either.

Are there any objections if I commit this patch?
--
Michael

Вложения

Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Georgios Kokolatos
Дата:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

The second version of this patch seems to be in order and ready for committer.

Thank you for taking the time to code!

The new status of this patch is: Ready for Committer

Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Alvaro Herrera
Дата:
On 2019-Mar-07, Michael Paquier wrote:

>  #else
> -    close(fd);
> +    if (close(fd))
> +    {
> +        fprintf(stderr, _("%s: could not close file \"%s\": %s"),
> +                progname, ControlFilePath, strerror(errno));
> +        exit(EXIT_FAILURE);
> +    }
>  #endif

I think this one needs a terminating \n.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Michael Paquier
Дата:
On Thu, Mar 07, 2019 at 10:00:05PM -0300, Alvaro Herrera wrote:
> I think this one needs a terminating \n.

Argh...  Thanks for the lookup, Alvaro.
--
Michael

Вложения

Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Michael Paquier
Дата:
On Fri, Mar 08, 2019 at 10:23:24AM +0900, Michael Paquier wrote:
> Argh...  Thanks for the lookup, Alvaro.

And committed, after an extra pass to beautify the whole experience.
--
Michael

Вложения

Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Andres Freund
Дата:
Hi,

On 2019-03-07 10:56:25 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> index f5cf9ffc9c..bce4274362 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
>                   errmsg("could not fsync file \"%s\": %m", path)));
>      pgstat_report_wait_end();
>
> -    CloseTransientFile(fd);
> +    if (CloseTransientFile(fd))
> +        ereport(ERROR,
> +                (errcode_for_file_access(),
> +                 errmsg("could not close file \"%s\": %m", path)));
>  }
...
> diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
> index 64679dd2de..21986e48fe 100644
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
>      }
>
>      pgstat_report_wait_end();
> -    CloseTransientFile(fd);
> +
> +    if (CloseTransientFile(fd))
> +        ereport(ERROR,
> +                (errcode_for_file_access(),
> +                 errmsg("could not close file \"%s\": %m", path)));
>
>      hdr = (TwoPhaseFileHeader *) buf;
>      if (hdr->magic != TWOPHASE_MAGIC)
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index 0fdd82a287..c7047738b6 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -3469,7 +3469,10 @@ XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
>                  (errcode_for_file_access(),
>                   errmsg("could not close file \"%s\": %m", tmppath)));
>
> -    CloseTransientFile(srcfd);
> +    if (CloseTransientFile(srcfd))
> +        ereport(ERROR,
> +                (errcode_for_file_access(),
> +                 errmsg("could not close file \"%s\": %m", path)));
>
>      /*
>       * Now move the segment into place with its final name.
...

This seems like an odd set of changes to me. What is this supposed to
buy us?  The commit message says:
    2) When opening transient files, it is up to the caller to close the
    file descriptors opened.  In error code paths, CloseTransientFile() gets
    called to clean up things before issuing an error.  However in normal
    exit paths, a lot of callers of CloseTransientFile() never actually
    reported errors, which could leave a file descriptor open without
    knowing about it.  This is an issue I complained about a couple of
    times, but never had the courage to write and submit a patch, so here we
    go.

but that reasoning seems bogus to me. For one, on just about any
platform close always closes the fd, even when returning an error
(unless you pass in a bad fd, in which case it obviously doesn't). So
the reasoning that this fixes unnoticed fd leaks doesn't really seem to
make sense. For another, even if it did, throwing an ERROR seems to
achieve very little: We continue with a leaked fd *AND* we cause the
operation to error out.

I can see reasoning for:
- LOG, so it can be noticed, but operations continue to work
- FATAL, to fix the leak
- PANIC, so we recover from the problem, in case of the close indicating
  a durability issue

  commit 9ccdd7f66e3324d2b6d3dec282cfa9ff084083f1
  Author: Thomas Munro <tmunro@postgresql.org>
  Date:   2018-11-19 13:31:10 +1300

      PANIC on fsync() failure.

but ERROR seems to have very little going for it.

The durability argument doesn't seem to apply for the cases where we
previously fsynced the file, a significant fraction of the locations you
touched.

And if your goal was just to achieve consistency, I also don't
understand, because you left plenty close()'s unchecked? E.g. you added
an error check in get_controlfile(), but not one in
ReadControlFile(). alterSystemSetConfigFile() writes, but you didn't add
one.

Greetings,

Andres Freund



Re: Tighten error control for OpenTransientFile/CloseTransientFile

От
Michael Paquier
Дата:
On Fri, Oct 04, 2019 at 01:39:38PM -0700, Andres Freund wrote:
> but that reasoning seems bogus to me. For one, on just about any
> platform close always closes the fd, even when returning an error
> (unless you pass in a bad fd, in which case it obviously doesn't). So
> the reasoning that this fixes unnoticed fd leaks doesn't really seem to
> make sense. For another, even if it did, throwing an ERROR seems to
> achieve very little: We continue with a leaked fd *AND* we cause the
> operation to error out.

I have read again a couple of times the commit log, and this mentions
to let users know that a fd is leaking, not that it fixes things.
Still we get to know about it, while previously it was not possible.
In some cases we may see errors in close() after a previous write(2).
Of course this does not apply to all the code paths patched here, but
it seems to me that's a good habit to spread, no?

> I can see reasoning for:
> - LOG, so it can be noticed, but operations continue to work
> - FATAL, to fix the leak
> - PANIC, so we recover from the problem, in case of the close indicating
>   a durability issue

LOG or WARNING would not be visible enough and would likely be skipped
by users.  Not sure that this justifies a FATAL either, and PANIC
would cause more harm than necessary, so for most of them ERROR
sounded like a good compromise, still the elevel choice is not
innocent depending on the code paths patched, because the elevel used
is consistent with the error handling of the surroundings.

> And if your goal was just to achieve consistency, I also don't
> understand, because you left plenty close()'s unchecked? E.g. you added
> an error check in get_controlfile(), but not one in
> ReadControlFile(). alterSystemSetConfigFile() writes, but you didn't add
> one.

Because I have not considered these when looking at transient files.
That may be worth an extra lookup.
--
Michael

Вложения