Обсуждение: Re: basic_archive lost archive_directory

Поиск
Список
Период
Сортировка

Re: basic_archive lost archive_directory

От
Álvaro Herrera
Дата:
On 2026-Jan-29, Олег Самойлов wrote:

> 2026-01-23 18:46:49.970 MSK,,,12085,,697397e9.2f35,1,,2026-01-23 18:46:49
> MSK,,0,WARNING,22023,"invalid value for parameter
> ""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/
> WAL""","Specified archive directory does not exist.",,,,,,,,"","archiver",,0
> 2026-01-23 18:57:23.589 MSK,,,12085,,697397e9.2f35,2,,2026-01-23 18:46:49
> MSK,,0,WARNING,01000,"""archive_mode"" enabled, yet archiving is not
> configured","basic_archive.archive_directory is not
> set.",,,,,,,,"","archiver",,0

Most likely, the NFS mount was lost for some reason.  Did you check the
kernel logs?  Did you ask your colleagues if they unmounted the
filesystem for laughs?  Did you search for any funny logs on the NFS
server side?

> And directory existed and it has old WAL files (before error raised).
> The basic_archive is unstable, but I don't know how to repeat this.

It's difficult to believe that it was basic_archive's fault; it's just
giving you whatever the kernel reported to a stat() call for the
directory.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"¿Qué importan los años?  Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo"  (Mafalda)



Re: basic_archive lost archive_directory

От
Олег Самойлов
Дата:


31.01.2026 19:08, Álvaro Herrera пишет:
On 2026-Jan-29, Олег Самойлов wrote:

2026-01-23 18:46:49.970 MSK,,,12085,,697397e9.2f35,1,,2026-01-23 18:46:49
MSK,,0,WARNING,22023,"invalid value for parameter
""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/
WAL""","Specified archive directory does not exist.",,,,,,,,"","archiver",,0
Most likely, the NFS mount was lost for some reason.  Did you check the
kernel logs?  Did you ask your colleagues if they unmounted the
filesystem for laughs?  Did you search for any funny logs on the NFS
server side?

May be this is the reason for the first message and this is not a bug. This can be with NFS.

2026-01-23 18:57:23.589 MSK,,,12085,,697397e9.2f35,2,,2026-01-23 18:46:49
MSK,,0,WARNING,01000,"""archive_mode"" enabled, yet archiving is not
configured","basic_archive.archive_directory is not
set.",,,,,,,,"","archiver",,0
And directory existed and it has old WAL files (before error raised).
The basic_archive is unstable, but I don't know how to repeat this.
It's difficult to believe that it was basic_archive's fault; it's just
giving you whatever the kernel reported to a stat() call for the
directory.
The bug is here. basic_archive lost the value and was empty. But show all; show that the value is.

Re: basic_archive lost archive_directory

От
Sergei Kornilov
Дата:
Hello

How to reproduce:

1) configure

archive_mode = on
archive_library = 'basic_archive'
basic_archive.archive_directory = '/some/path/'

2) start postgres and verify archive works
3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/
/some_moved/ is enough.
 
4) basic_archive will complain ERROR:  could not create file ... No such file or directory for new WAL archive
attempts
5) restart archiver process with any reason: kill it or restart postgres
6) make archive_directory accessible again: archiver process will not check the directory's existence again and
continueto complain about unconfigured archive_directory
 

Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to
basic_archive_configured?(attached)
 

regards, Sergei
Вложения

Re: basic_archive lost archive_directory

От
Олег Самойлов
Дата:
06.02.2026 17:25, Sergei Kornilov пишет
> Hello
>
> How to reproduce:
>
> 1) configure
>
> archive_mode = on
> archive_library = 'basic_archive'
> basic_archive.archive_directory = '/some/path/'
>
> 2) start postgres and verify archive works
> 3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/
/some_moved/ is enough.
 
> 4) basic_archive will complain ERROR:  could not create file ... No such file or directory for new WAL archive
attempts
> 5) restart archiver process with any reason: kill it or restart postgres
> 6) make archive_directory accessible again: archiver process will not check the directory's existence again and
continueto complain about unconfigured archive_directory
 
>
> Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to
basic_archive_configured?(attached)
 
>
> regards, Sergei

You described, may be, different bug or depended. I'll try again to explain.

First error was:

"invalid value for parameter
""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/
WAL""","Specified archive directory does not exist."

And this is, may be, correct. But second, just after the first:
"""archive_mode"" enabled, yet archiving is not
configured","basic_archive.archive_directory is not
set."

The value of basic_archive.archive_directory
was erased after the first error. But it was erased only inside archive_library 'basic_archive', postgresql itself it
wasstill
 
basic_archive.archive_directory=/mnt/ocean/postgres/stars/




Re: basic_archive lost archive_directory

От
Fujii Masao
Дата:
On Fri, Feb 6, 2026 at 11:25 PM Sergei Kornilov <sk@zsrv.org> wrote:
>
> Hello
>
> How to reproduce:
>
> 1) configure
>
> archive_mode = on
> archive_library = 'basic_archive'
> basic_archive.archive_directory = '/some/path/'
>
> 2) start postgres and verify archive works
> 3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/
/some_moved/ is enough. 
> 4) basic_archive will complain ERROR:  could not create file ... No such file or directory for new WAL archive
attempts
> 5) restart archiver process with any reason: kill it or restart postgres
> 6) make archive_directory accessible again: archiver process will not check the directory's existence again and
continueto complain about unconfigured archive_directory 

Yes, this issue can last until, for example, the configuration is reloaded and
archive_directory is set again. I agree this needs to be addressed.


> Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to
basic_archive_configured?(attached) 

Basically I like the idea of moving the checks for archive_directory from
check_archive_directory() to basic_archive_configured(). This would not only
address this issue, but also other problems caused by performing these checks
in the GUC check hook.

basic_archive is usually loaded only by the archiver via archive_library.
In that case, errors reported by check_archive_directory() are not logged
by default, since GUC check hook errors are normally emitted only by
the postmaster. As a result, misconfigurations (e.g., a non-existent
archive_directory) may go unnoticed, which is problematic for users.

Moving the checks currently done in check_archive_directory() to
basic_archive_configured() would resolve this, which is one reason
I like your proposal.

In your patch, only the existence check is moved to basic_archive_configured().
Would it also make sense to move the filename length check there? If so,
we could potentially remove the check_archive_directory GUC check hook entirely.

Regards,

--
Fujii Masao



Re: basic_archive lost archive_directory

От
Sergei Kornilov
Дата:
Hello!

> In your patch, only the existence check is moved to basic_archive_configured().
> Would it also make sense to move the filename length check there? If so,
> we could potentially remove the check_archive_directory GUC check hook entirely.

Indeed.

I'm not sure, do I need to keep the check_archive_directory declaration for backport branches? I will split the patch
intotwo parts.
 

regards, Sergei
Вложения

Re: basic_archive lost archive_directory

От
Nathan Bossart
Дата:
On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote:
> Basically I like the idea of moving the checks for archive_directory from
> check_archive_directory() to basic_archive_configured(). This would not only
> address this issue, but also other problems caused by performing these checks
> in the GUC check hook.

Note that the check_configured_cb is called for every segment to archive.
That means we'd be calling stat() much more, which seems like unnecessary
overhead to me.  And we still need to be prepared for the archive directory
to disappear at any time.  I'm wondering if it would be better to simply
remove this archive directory existence check from the check_configured_cb.

-- 
nathan



Re: basic_archive lost archive_directory

От
Nathan Bossart
Дата:
On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote:
> basic_archive is usually loaded only by the archiver via archive_library.
> In that case, errors reported by check_archive_directory() are not logged
> by default, since GUC check hook errors are normally emitted only by
> the postmaster. As a result, misconfigurations (e.g., a non-existent
> archive_directory) may go unnoticed, which is problematic for users.

I don't think this is true.  With default parameters, I see the following
in my logs with a misconfigured archive directory setting:

2026-02-09 15:53:10.372 CST [12803] WARNING:  invalid value for parameter "basic_archive.archive_directory":
"/does/not/exist"
2026-02-09 15:53:10.372 CST [12803] DETAIL:  Specified archive directory does not exist.

-- 
nathan



Re: basic_archive lost archive_directory

От
Nathan Bossart
Дата:
On Mon, Feb 09, 2026 at 03:48:05PM -0600, Nathan Bossart wrote:
> I'm wondering if it would be better to simply remove this archive
> directory existence check from the check_configured_cb.

s/check_configured_cb/check_archive_directory

-- 
nathan



Re: basic_archive lost archive_directory

От
Fujii Masao
Дата:
On Tue, Feb 10, 2026 at 6:58 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote:
> > basic_archive is usually loaded only by the archiver via archive_library.
> > In that case, errors reported by check_archive_directory() are not logged
> > by default, since GUC check hook errors are normally emitted only by
> > the postmaster. As a result, misconfigurations (e.g., a non-existent
> > archive_directory) may go unnoticed, which is problematic for users.
>
> I don't think this is true.  With default parameters, I see the following
> in my logs with a misconfigured archive directory setting:
>
> 2026-02-09 15:53:10.372 CST [12803] WARNING:  invalid value for parameter "basic_archive.archive_directory":
"/does/not/exist"
> 2026-02-09 15:53:10.372 CST [12803] DETAIL:  Specified archive directory does not exist.

You're right if an invalid value for basic_archive.archive_directory is detected
at server startup. However, when the setting is changed and the configuration
is reloaded, the default behavior does not emit an error log.
Please see the steps below.

-----------------------------------
initdb -D data
mkdir arch
cat <<EOF >> data/postgresql.conf
archive_mode = on
archive_library = 'basic_archive'
basic_archive.archive_directory = '../arch'
EOF
pg_ctl -D data start
echo "basic_archive.archive_directory = 'not_exists'" >> data/postgresql.conf
pg_ctl -D data reload
-----------------------------------

With these steps, the only log messages I see are:

-----------------------------------
LOG:  received SIGHUP, reloading configuration files
LOG:  parameter "basic_archive.archive_directory" changed to "not_exists"
-----------------------------------

BTW, if basic_archive is specified in shared_preload_libraries, the same steps
produce:

-----------------------------------
LOG:  invalid value for parameter "basic_archive.archive_directory":
"not_exists"
DETAIL:  Specified archive directory does not exist.
LOG:  configuration file "/hoge/data/postgresql.conf" contains errors;
unaffected changes were applied
-----------------------------------

Similarly, lowering the archiver log level to DEBUG3 (for example,
via log_min_messages = 'warning,archiver:debug3') also results in:

-----------------------------------
DEBUG:  invalid value for parameter "basic_archive.archive_directory":
"not_exists"
DETAIL:  Specified archive directory does not exist.
DEBUG:  configuration file
"/System/Volumes/Data/dav/head-pgsql/data/postgresql.conf" contains
errors; unaffected changes were applied
-----------------------------------

This illustrates that, with default settings, the error can go unnoticed
on reload.

Regards,

--
Fujii Masao



Re: basic_archive lost archive_directory

От
Nathan Bossart
Дата:
On Tue, Feb 10, 2026 at 10:23:02AM +0900, Fujii Masao wrote:
> You're right if an invalid value for basic_archive.archive_directory is detected
> at server startup. However, when the setting is changed and the configuration
> is reloaded, the default behavior does not emit an error log.

Oh, I see.  Even so, I don't think this problem is best fixed by moving
code from the GUC check hook to the archive module check callback.  For
starters, since the archive module check callback is called for each file
to archive, this adds unnecessary overhead to repeatedly verify things that
really shouldn't change as long as the GUC's value stays the same.  In
practice, the check callback is mostly meant to ensure the user hasn't
temporarily halted archiving, as per the following note in basic_archive's
documentation:

    The default is an empty string, which effectively halts WAL archiving,
    but if archive_mode is enabled, the server will accumulate WAL segment
    files in the expectation that a value will soon be provided.

Furthermore, fixing this problem in basic_archive doesn't fix it for other
archive modules that use GUC check hooks.  I think it should be fixed more
generally, perhaps by bumping up the log level in the archiver when the
GUC's prefix matches the archive_library.

I also want to push back a bit on the idea that basic_archive not working
when the directory is missing at startup is a bug.  The documentation for
basic_archive clearly states that the directory must already exist.  That
being said, I have no objection to making basic_archive more lenient or
even to back-patching such a change.  As I mentioned upthread, IMHO we
should simply remove the existence check from the GUC check hook.
basic_archive must already be written to handle the archive directory
disappearing at any moment, so we should be able to rely on it without the
extra stat().

-- 
nathan



Re: basic_archive lost archive_directory

От
Nathan Bossart
Дата:
On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote:
> As I mentioned upthread, IMHO we should simply remove the existence check
> from the GUC check hook.  basic_archive must already be written to handle
> the archive directory disappearing at any moment, so we should be able to
> rely on it without the extra stat().

Concretely, like the attached.

-- 
nathan

Вложения

Re: basic_archive lost archive_directory

От
Nathan Bossart
Дата:
On Wed, Feb 11, 2026 at 03:14:15PM -0600, Nathan Bossart wrote:
> On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote:
>> As I mentioned upthread, IMHO we should simply remove the existence check
>> from the GUC check hook.  basic_archive must already be written to handle
>> the archive directory disappearing at any moment, so we should be able to
>> rely on it without the extra stat().
> 
> Concretely, like the attached.

Any thoughts?  I'll wait another week or so, but then will proceed with
committing/back-patching this.

-- 
nathan



Re: basic_archive lost archive_directory

От
Tom Lane
Дата:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Feb 11, 2026 at 03:14:15PM -0600, Nathan Bossart wrote:
>> On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote:
>>> As I mentioned upthread, IMHO we should simply remove the existence check
>>> from the GUC check hook.  basic_archive must already be written to handle
>>> the archive directory disappearing at any moment, so we should be able to
>>> rely on it without the extra stat().

> Any thoughts?  I'll wait another week or so, but then will proceed with
> committing/back-patching this.

Agreed --- this seems like a pretty inappropriate check to be making
in a check_hook.

            regards, tom lane



Re: basic_archive lost archive_directory

От
Fujii Masao
Дата:
On Wed, Feb 11, 2026 at 1:06 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 10, 2026 at 10:23:02AM +0900, Fujii Masao wrote:
> > You're right if an invalid value for basic_archive.archive_directory is detected
> > at server startup. However, when the setting is changed and the configuration
> > is reloaded, the default behavior does not emit an error log.
>
> Oh, I see.  Even so, I don't think this problem is best fixed by moving
> code from the GUC check hook to the archive module check callback.  For
> starters, since the archive module check callback is called for each file
> to archive, this adds unnecessary overhead to repeatedly verify things that
> really shouldn't change as long as the GUC's value stays the same.

Yes, so if we had an archive module callback invoked just after
ProcessConfigFile(), we could use it to check the archive directory only
when needed. However, introducing a new callback isn't suitable for
back branches.


> I also want to push back a bit on the idea that basic_archive not working
> when the directory is missing at startup is a bug.  The documentation for
> basic_archive clearly states that the directory must already exist.  That
> being said, I have no objection to making basic_archive more lenient or
> even to back-patching such a change.  As I mentioned upthread, IMHO we
> should simply remove the existence check from the GUC check hook.

I agree with removing the check from the GUC check hook. My only concern is
that simply removing it (rather than relocating it) changes the error message
users see when archive_directory is misconfigured, which may make
troubleshooting WAL archiving failures slightly harder.

[Current]
WARNING:  invalid value for parameter
"basic_archive.archive_directory": "not_exists"
DETAIL:  Specified archive directory does not exist.

[With the patch]
ERROR:  could not create file
"not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No
such file or directory

One option would be to perform the check in check_configured_cb(),
but as you noted, that would add an extra stat() call for each WAL archiving.
If that overhead is unacceptable, another approach would be to wrap
copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error,
we could stat() the archive directory and report a clearer reason if it does
not exist.

That said, if users are generally fine with the "could not create file" error,
I'm ok with the proposed patch (i.e., just removing the check).

Regards,

--
Fujii Masao



Re: basic_archive lost archive_directory

От
Sergei Kornilov
Дата:
Hello

Honestly, I don't think that only 8 stat calls per each 1Gbps of WAL writes is too much. I doubt basic_archive will be
usedin systems where there's a lot of writing. Although there is little benefit from this stat call, I agree.
 

> That said, if users are generally fine with the "could not create file" error,
> I'm ok with the proposed patch (i.e., just removing the check).

I'm ok with this error message.

regards, Sergei



Re: basic_archive lost archive_directory

От
Nathan Bossart
Дата:
On Tue, Feb 24, 2026 at 01:28:50PM +0900, Fujii Masao wrote:
> I agree with removing the check from the GUC check hook. My only concern is
> that simply removing it (rather than relocating it) changes the error message
> users see when archive_directory is misconfigured, which may make
> troubleshooting WAL archiving failures slightly harder.
> 
> [Current]
> WARNING:  invalid value for parameter
> "basic_archive.archive_directory": "not_exists"
> DETAIL:  Specified archive directory does not exist.
> 
> [With the patch]
> ERROR:  could not create file
> "not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No
> such file or directory
> 
> One option would be to perform the check in check_configured_cb(),
> but as you noted, that would add an extra stat() call for each WAL archiving.
> If that overhead is unacceptable, another approach would be to wrap
> copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error,
> we could stat() the archive directory and report a clearer reason if it does
> not exist.
> 
> That said, if users are generally fine with the "could not create file" error,
> I'm ok with the proposed patch (i.e., just removing the check).

I think it's fine as-is.  If something is wrong with the error message,
IMHO we should fix the error message.  Adding extra stat() calls to try to
give a nicer message might work most of the time, but there are still race
conditions where users will see the original one.  But in any case, I
believe the current message style is used in many places, so I don't see a
strong reason to do anything different here.

-- 
nathan



Re: basic_archive lost archive_directory

От
Fujii Masao
Дата:
On Wed, Feb 25, 2026 at 2:34 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 24, 2026 at 01:28:50PM +0900, Fujii Masao wrote:
> > I agree with removing the check from the GUC check hook. My only concern is
> > that simply removing it (rather than relocating it) changes the error message
> > users see when archive_directory is misconfigured, which may make
> > troubleshooting WAL archiving failures slightly harder.
> >
> > [Current]
> > WARNING:  invalid value for parameter
> > "basic_archive.archive_directory": "not_exists"
> > DETAIL:  Specified archive directory does not exist.
> >
> > [With the patch]
> > ERROR:  could not create file
> > "not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No
> > such file or directory
> >
> > One option would be to perform the check in check_configured_cb(),
> > but as you noted, that would add an extra stat() call for each WAL archiving.
> > If that overhead is unacceptable, another approach would be to wrap
> > copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error,
> > we could stat() the archive directory and report a clearer reason if it does
> > not exist.
> >
> > That said, if users are generally fine with the "could not create file" error,
> > I'm ok with the proposed patch (i.e., just removing the check).
>
> I think it's fine as-is.  If something is wrong with the error message,
> IMHO we should fix the error message.  Adding extra stat() calls to try to
> give a nicer message might work most of the time, but there are still race
> conditions where users will see the original one.  But in any case, I
> believe the current message style is used in many places, so I don't see a
> strong reason to do anything different here.

Ok, so since I seem to be the only one with some reservations about
the message, I'm fine with proceeding with the proposed change.

Regards,

--
Fujii Masao



Re: basic_archive lost archive_directory

От
Nathan Bossart
Дата:
Committed.

-- 
nathan