Обсуждение: Re: basic_archive lost archive_directory
On 2026-Jan-29, Олег Самойлов wrote: > 2026-01-23 18:46:49.970 MSK,,,12085,,697397e9.2f35,1,,2026-01-23 18:46:49 > MSK,,0,WARNING,22023,"invalid value for parameter > ""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/ > WAL""","Specified archive directory does not exist.",,,,,,,,"","archiver",,0 > 2026-01-23 18:57:23.589 MSK,,,12085,,697397e9.2f35,2,,2026-01-23 18:46:49 > MSK,,0,WARNING,01000,"""archive_mode"" enabled, yet archiving is not > configured","basic_archive.archive_directory is not > set.",,,,,,,,"","archiver",,0 Most likely, the NFS mount was lost for some reason. Did you check the kernel logs? Did you ask your colleagues if they unmounted the filesystem for laughs? Did you search for any funny logs on the NFS server side? > And directory existed and it has old WAL files (before error raised). > The basic_archive is unstable, but I don't know how to repeat this. It's difficult to believe that it was basic_archive's fault; it's just giving you whatever the kernel reported to a stat() call for the directory. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "¿Qué importan los años? Lo que realmente importa es comprobar que a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)
31.01.2026 19:08, Álvaro Herrera пишет:
On 2026-Jan-29, Олег Самойлов wrote:2026-01-23 18:46:49.970 MSK,,,12085,,697397e9.2f35,1,,2026-01-23 18:46:49 MSK,,0,WARNING,22023,"invalid value for parameter ""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/ WAL""","Specified archive directory does not exist.",,,,,,,,"","archiver",,0
Most likely, the NFS mount was lost for some reason. Did you check the kernel logs? Did you ask your colleagues if they unmounted the filesystem for laughs? Did you search for any funny logs on the NFS server side?
May be this is the reason for the first message and this is not a bug. This can be with NFS.
2026-01-23 18:57:23.589 MSK,,,12085,,697397e9.2f35,2,,2026-01-23 18:46:49 MSK,,0,WARNING,01000,"""archive_mode"" enabled, yet archiving is not configured","basic_archive.archive_directory is not set.",,,,,,,,"","archiver",,0
The bug is here. basic_archive lost the value and was empty. But show all; show that the value is.And directory existed and it has old WAL files (before error raised). The basic_archive is unstable, but I don't know how to repeat this.It's difficult to believe that it was basic_archive's fault; it's just giving you whatever the kernel reported to a stat() call for the directory.
Hello How to reproduce: 1) configure archive_mode = on archive_library = 'basic_archive' basic_archive.archive_directory = '/some/path/' 2) start postgres and verify archive works 3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/ /some_moved/ is enough. 4) basic_archive will complain ERROR: could not create file ... No such file or directory for new WAL archive attempts 5) restart archiver process with any reason: kill it or restart postgres 6) make archive_directory accessible again: archiver process will not check the directory's existence again and continueto complain about unconfigured archive_directory Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to basic_archive_configured?(attached) regards, Sergei
Вложения
06.02.2026 17:25, Sergei Kornilov пишет > Hello > > How to reproduce: > > 1) configure > > archive_mode = on > archive_library = 'basic_archive' > basic_archive.archive_directory = '/some/path/' > > 2) start postgres and verify archive works > 3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/ /some_moved/ is enough. > 4) basic_archive will complain ERROR: could not create file ... No such file or directory for new WAL archive attempts > 5) restart archiver process with any reason: kill it or restart postgres > 6) make archive_directory accessible again: archiver process will not check the directory's existence again and continueto complain about unconfigured archive_directory > > Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to basic_archive_configured?(attached) > > regards, Sergei You described, may be, different bug or depended. I'll try again to explain. First error was: "invalid value for parameter ""basic_archive.archive_directory"": ""/mnt/ocean/postgres/stars/ WAL""","Specified archive directory does not exist." And this is, may be, correct. But second, just after the first: """archive_mode"" enabled, yet archiving is not configured","basic_archive.archive_directory is not set." The value of basic_archive.archive_directory was erased after the first error. But it was erased only inside archive_library 'basic_archive', postgresql itself it wasstill basic_archive.archive_directory=/mnt/ocean/postgres/stars/
On Fri, Feb 6, 2026 at 11:25 PM Sergei Kornilov <sk@zsrv.org> wrote: > > Hello > > How to reproduce: > > 1) configure > > archive_mode = on > archive_library = 'basic_archive' > basic_archive.archive_directory = '/some/path/' > > 2) start postgres and verify archive works > 3) make this directory temporary inaccessible. NFS will give you many ways to achieve this, here just mv /some/ /some_moved/ is enough. > 4) basic_archive will complain ERROR: could not create file ... No such file or directory for new WAL archive attempts > 5) restart archiver process with any reason: kill it or restart postgres > 6) make archive_directory accessible again: archiver process will not check the directory's existence again and continueto complain about unconfigured archive_directory Yes, this issue can last until, for example, the configuration is reloaded and archive_directory is set again. I agree this needs to be addressed. > Maybe it makes sense to move the directory existence check from check_archive_directory (guc check callback) to basic_archive_configured?(attached) Basically I like the idea of moving the checks for archive_directory from check_archive_directory() to basic_archive_configured(). This would not only address this issue, but also other problems caused by performing these checks in the GUC check hook. basic_archive is usually loaded only by the archiver via archive_library. In that case, errors reported by check_archive_directory() are not logged by default, since GUC check hook errors are normally emitted only by the postmaster. As a result, misconfigurations (e.g., a non-existent archive_directory) may go unnoticed, which is problematic for users. Moving the checks currently done in check_archive_directory() to basic_archive_configured() would resolve this, which is one reason I like your proposal. In your patch, only the existence check is moved to basic_archive_configured(). Would it also make sense to move the filename length check there? If so, we could potentially remove the check_archive_directory GUC check hook entirely. Regards, -- Fujii Masao
Hello! > In your patch, only the existence check is moved to basic_archive_configured(). > Would it also make sense to move the filename length check there? If so, > we could potentially remove the check_archive_directory GUC check hook entirely. Indeed. I'm not sure, do I need to keep the check_archive_directory declaration for backport branches? I will split the patch intotwo parts. regards, Sergei
Вложения
On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote: > Basically I like the idea of moving the checks for archive_directory from > check_archive_directory() to basic_archive_configured(). This would not only > address this issue, but also other problems caused by performing these checks > in the GUC check hook. Note that the check_configured_cb is called for every segment to archive. That means we'd be calling stat() much more, which seems like unnecessary overhead to me. And we still need to be prepared for the archive directory to disappear at any time. I'm wondering if it would be better to simply remove this archive directory existence check from the check_configured_cb. -- nathan
On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote: > basic_archive is usually loaded only by the archiver via archive_library. > In that case, errors reported by check_archive_directory() are not logged > by default, since GUC check hook errors are normally emitted only by > the postmaster. As a result, misconfigurations (e.g., a non-existent > archive_directory) may go unnoticed, which is problematic for users. I don't think this is true. With default parameters, I see the following in my logs with a misconfigured archive directory setting: 2026-02-09 15:53:10.372 CST [12803] WARNING: invalid value for parameter "basic_archive.archive_directory": "/does/not/exist" 2026-02-09 15:53:10.372 CST [12803] DETAIL: Specified archive directory does not exist. -- nathan
On Mon, Feb 09, 2026 at 03:48:05PM -0600, Nathan Bossart wrote: > I'm wondering if it would be better to simply remove this archive > directory existence check from the check_configured_cb. s/check_configured_cb/check_archive_directory -- nathan
On Tue, Feb 10, 2026 at 6:58 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Feb 10, 2026 at 02:46:39AM +0900, Fujii Masao wrote: > > basic_archive is usually loaded only by the archiver via archive_library. > > In that case, errors reported by check_archive_directory() are not logged > > by default, since GUC check hook errors are normally emitted only by > > the postmaster. As a result, misconfigurations (e.g., a non-existent > > archive_directory) may go unnoticed, which is problematic for users. > > I don't think this is true. With default parameters, I see the following > in my logs with a misconfigured archive directory setting: > > 2026-02-09 15:53:10.372 CST [12803] WARNING: invalid value for parameter "basic_archive.archive_directory": "/does/not/exist" > 2026-02-09 15:53:10.372 CST [12803] DETAIL: Specified archive directory does not exist. You're right if an invalid value for basic_archive.archive_directory is detected at server startup. However, when the setting is changed and the configuration is reloaded, the default behavior does not emit an error log. Please see the steps below. ----------------------------------- initdb -D data mkdir arch cat <<EOF >> data/postgresql.conf archive_mode = on archive_library = 'basic_archive' basic_archive.archive_directory = '../arch' EOF pg_ctl -D data start echo "basic_archive.archive_directory = 'not_exists'" >> data/postgresql.conf pg_ctl -D data reload ----------------------------------- With these steps, the only log messages I see are: ----------------------------------- LOG: received SIGHUP, reloading configuration files LOG: parameter "basic_archive.archive_directory" changed to "not_exists" ----------------------------------- BTW, if basic_archive is specified in shared_preload_libraries, the same steps produce: ----------------------------------- LOG: invalid value for parameter "basic_archive.archive_directory": "not_exists" DETAIL: Specified archive directory does not exist. LOG: configuration file "/hoge/data/postgresql.conf" contains errors; unaffected changes were applied ----------------------------------- Similarly, lowering the archiver log level to DEBUG3 (for example, via log_min_messages = 'warning,archiver:debug3') also results in: ----------------------------------- DEBUG: invalid value for parameter "basic_archive.archive_directory": "not_exists" DETAIL: Specified archive directory does not exist. DEBUG: configuration file "/System/Volumes/Data/dav/head-pgsql/data/postgresql.conf" contains errors; unaffected changes were applied ----------------------------------- This illustrates that, with default settings, the error can go unnoticed on reload. Regards, -- Fujii Masao
On Tue, Feb 10, 2026 at 10:23:02AM +0900, Fujii Masao wrote:
> You're right if an invalid value for basic_archive.archive_directory is detected
> at server startup. However, when the setting is changed and the configuration
> is reloaded, the default behavior does not emit an error log.
Oh, I see. Even so, I don't think this problem is best fixed by moving
code from the GUC check hook to the archive module check callback. For
starters, since the archive module check callback is called for each file
to archive, this adds unnecessary overhead to repeatedly verify things that
really shouldn't change as long as the GUC's value stays the same. In
practice, the check callback is mostly meant to ensure the user hasn't
temporarily halted archiving, as per the following note in basic_archive's
documentation:
The default is an empty string, which effectively halts WAL archiving,
but if archive_mode is enabled, the server will accumulate WAL segment
files in the expectation that a value will soon be provided.
Furthermore, fixing this problem in basic_archive doesn't fix it for other
archive modules that use GUC check hooks. I think it should be fixed more
generally, perhaps by bumping up the log level in the archiver when the
GUC's prefix matches the archive_library.
I also want to push back a bit on the idea that basic_archive not working
when the directory is missing at startup is a bug. The documentation for
basic_archive clearly states that the directory must already exist. That
being said, I have no objection to making basic_archive more lenient or
even to back-patching such a change. As I mentioned upthread, IMHO we
should simply remove the existence check from the GUC check hook.
basic_archive must already be written to handle the archive directory
disappearing at any moment, so we should be able to rely on it without the
extra stat().
--
nathan
On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote: > As I mentioned upthread, IMHO we should simply remove the existence check > from the GUC check hook. basic_archive must already be written to handle > the archive directory disappearing at any moment, so we should be able to > rely on it without the extra stat(). Concretely, like the attached. -- nathan
Вложения
On Wed, Feb 11, 2026 at 03:14:15PM -0600, Nathan Bossart wrote: > On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote: >> As I mentioned upthread, IMHO we should simply remove the existence check >> from the GUC check hook. basic_archive must already be written to handle >> the archive directory disappearing at any moment, so we should be able to >> rely on it without the extra stat(). > > Concretely, like the attached. Any thoughts? I'll wait another week or so, but then will proceed with committing/back-patching this. -- nathan
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Wed, Feb 11, 2026 at 03:14:15PM -0600, Nathan Bossart wrote:
>> On Tue, Feb 10, 2026 at 10:06:25AM -0600, Nathan Bossart wrote:
>>> As I mentioned upthread, IMHO we should simply remove the existence check
>>> from the GUC check hook. basic_archive must already be written to handle
>>> the archive directory disappearing at any moment, so we should be able to
>>> rely on it without the extra stat().
> Any thoughts? I'll wait another week or so, but then will proceed with
> committing/back-patching this.
Agreed --- this seems like a pretty inappropriate check to be making
in a check_hook.
regards, tom lane
On Wed, Feb 11, 2026 at 1:06 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Feb 10, 2026 at 10:23:02AM +0900, Fujii Masao wrote: > > You're right if an invalid value for basic_archive.archive_directory is detected > > at server startup. However, when the setting is changed and the configuration > > is reloaded, the default behavior does not emit an error log. > > Oh, I see. Even so, I don't think this problem is best fixed by moving > code from the GUC check hook to the archive module check callback. For > starters, since the archive module check callback is called for each file > to archive, this adds unnecessary overhead to repeatedly verify things that > really shouldn't change as long as the GUC's value stays the same. Yes, so if we had an archive module callback invoked just after ProcessConfigFile(), we could use it to check the archive directory only when needed. However, introducing a new callback isn't suitable for back branches. > I also want to push back a bit on the idea that basic_archive not working > when the directory is missing at startup is a bug. The documentation for > basic_archive clearly states that the directory must already exist. That > being said, I have no objection to making basic_archive more lenient or > even to back-patching such a change. As I mentioned upthread, IMHO we > should simply remove the existence check from the GUC check hook. I agree with removing the check from the GUC check hook. My only concern is that simply removing it (rather than relocating it) changes the error message users see when archive_directory is misconfigured, which may make troubleshooting WAL archiving failures slightly harder. [Current] WARNING: invalid value for parameter "basic_archive.archive_directory": "not_exists" DETAIL: Specified archive directory does not exist. [With the patch] ERROR: could not create file "not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No such file or directory One option would be to perform the check in check_configured_cb(), but as you noted, that would add an extra stat() call for each WAL archiving. If that overhead is unacceptable, another approach would be to wrap copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error, we could stat() the archive directory and report a clearer reason if it does not exist. That said, if users are generally fine with the "could not create file" error, I'm ok with the proposed patch (i.e., just removing the check). Regards, -- Fujii Masao
Hello Honestly, I don't think that only 8 stat calls per each 1Gbps of WAL writes is too much. I doubt basic_archive will be usedin systems where there's a lot of writing. Although there is little benefit from this stat call, I agree. > That said, if users are generally fine with the "could not create file" error, > I'm ok with the proposed patch (i.e., just removing the check). I'm ok with this error message. regards, Sergei
On Tue, Feb 24, 2026 at 01:28:50PM +0900, Fujii Masao wrote: > I agree with removing the check from the GUC check hook. My only concern is > that simply removing it (rather than relocating it) changes the error message > users see when archive_directory is misconfigured, which may make > troubleshooting WAL archiving failures slightly harder. > > [Current] > WARNING: invalid value for parameter > "basic_archive.archive_directory": "not_exists" > DETAIL: Specified archive directory does not exist. > > [With the patch] > ERROR: could not create file > "not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No > such file or directory > > One option would be to perform the check in check_configured_cb(), > but as you noted, that would add an extra stat() call for each WAL archiving. > If that overhead is unacceptable, another approach would be to wrap > copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error, > we could stat() the archive directory and report a clearer reason if it does > not exist. > > That said, if users are generally fine with the "could not create file" error, > I'm ok with the proposed patch (i.e., just removing the check). I think it's fine as-is. If something is wrong with the error message, IMHO we should fix the error message. Adding extra stat() calls to try to give a nicer message might work most of the time, but there are still race conditions where users will see the original one. But in any case, I believe the current message style is used in many places, so I don't see a strong reason to do anything different here. -- nathan
On Wed, Feb 25, 2026 at 2:34 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Feb 24, 2026 at 01:28:50PM +0900, Fujii Masao wrote: > > I agree with removing the check from the GUC check hook. My only concern is > > that simply removing it (rather than relocating it) changes the error message > > users see when archive_directory is misconfigured, which may make > > troubleshooting WAL archiving failures slightly harder. > > > > [Current] > > WARNING: invalid value for parameter > > "basic_archive.archive_directory": "not_exists" > > DETAIL: Specified archive directory does not exist. > > > > [With the patch] > > ERROR: could not create file > > "not_exists/archtemp.00000001000000000000000E.80107.1771905339058": No > > such file or directory > > > > One option would be to perform the check in check_configured_cb(), > > but as you noted, that would add an extra stat() call for each WAL archiving. > > If that overhead is unacceptable, another approach would be to wrap > > copy_file() in basic_archive_file() with PG_TRY() / PG_CATCH(). On error, > > we could stat() the archive directory and report a clearer reason if it does > > not exist. > > > > That said, if users are generally fine with the "could not create file" error, > > I'm ok with the proposed patch (i.e., just removing the check). > > I think it's fine as-is. If something is wrong with the error message, > IMHO we should fix the error message. Adding extra stat() calls to try to > give a nicer message might work most of the time, but there are still race > conditions where users will see the original one. But in any case, I > believe the current message style is used in many places, so I don't see a > strong reason to do anything different here. Ok, so since I seem to be the only one with some reservations about the message, I'm fine with proceeding with the proposed change. Regards, -- Fujii Masao
Committed. -- nathan