Обсуждение: BUG #16526: pg_test_fsync in v12 doesn't run in Windows
The following bug has been logged on the website: Bug reference: 16526 Logged by: Jeff Janes Email address: jeff.janes@gmail.com PostgreSQL version: 12.3 Operating system: windows 2019 Description: When I install PG12 on Windows 10 using the EDB installer, pg_test_fsync doesn't work. But I notice Windows 10 is not officially supported, so I repeated the exercise with EDB installer on a rented box: Microsoft Windows Server 2019 with Desktop Experience Locale English AMI provided by Amazon And it still doesn't work, see capture from CMD.exe done immediately after running the EDB installer: Microsoft Windows [Version 10.0.17763.1282] (c) 2018 Microsoft Corporation. All rights reserved. C:\Users\Administrator>"c:\Program Files\PostgreSQL\12\bin\pg_test_fsync.exe" 5 seconds per test O_DIRECT supported on this platform for open_datasync and open_sync. Compare file sync methods using one 8kB write: (in wal_sync_method preference order, except fdatasync is Linux's default) open_datasync pg_test_fsync: error: write failed: Invalid argument Note that it abandons the run here it does not just report an error and then continue. I'm not in a position to compile on Windows, so I can't run git bisect nor tell if this is something specific to EDB. But I suspect this rather has something to do with commit 0ba06e0bfb8cfd24ff17aca92aa72245ddd6c4d7 On v11, this does work. Or at least it runs to completion, I guess the numbers reported might be bogus, which is why the indicated commit was done.
On Fri, Jul 03, 2020 at 04:01:46PM +0000, PG Bug reporting form wrote: > Note that it abandons the run here it does not just report an error and then > continue. > I'm not in a position to compile on Windows, so I can't run git bisect nor > tell if this is something specific to EDB. But I suspect this rather has > something to do with commit 0ba06e0bfb8cfd24ff17aca92aa72245ddd6c4d7 I think that the error actually comes from 40cfe86, where we enforce O_TEXT in our port of open() for Windows so as the switch from WIN32's open() to our concurrent-safe version remains compatible. O_TEXT and FILE_FLAG_NO_BUFFERING are visibly incompatible together, and enforcing the call of open() in pg_test_fsync to use O_BINARY (actually PG_BINARY) when testing open_datasync() allows things to go through. So we are one call short of _setmode(), but it is more simple to pass down the flag when opening the fd. It may actually be safer in the long run to add the binary flag to all the calls of open() in pg_test_fsync. Thoughts? -- Michael
Вложения
On Sat, Jul 04, 2020 at 10:03:56AM +0900, Michael Paquier wrote: > I think that the error actually comes from 40cfe86, where we enforce > O_TEXT in our port of open() for Windows so as the switch from WIN32's > open() to our concurrent-safe version remains compatible. O_TEXT and > FILE_FLAG_NO_BUFFERING are visibly incompatible together, and > enforcing the call of open() in pg_test_fsync to use O_BINARY > (actually PG_BINARY) when testing open_datasync() allows things to go > through. So we are one call short of _setmode(), but it is more > simple to pass down the flag when opening the fd. It may actually be > safer in the long run to add the binary flag to all the calls of > open() in pg_test_fsync. Thoughts? So, I have been looking at that again. And, I am not completely sure why the combination of FILE_FLAG_NO_BUFFERING and _setmode(O_TEXT) would cause this failure, but I suspect that this is caused by the fact that CRLF gets changed into single LF characters on input (input buffer uses random data), causing the size of what's expected to be written to not match with what is actually written. An additional factor is visibly our wrapper safe-concurrent wrapper pgwin32_open(). It seems to me that it would be a better idea to just use the binary mode for all the open() calls in pg_test_fsync in the long run, which is what the attached patch does as there is no control on the data written. This way, we avoid any translation that may happen during the write. Any thoughts? -- Michael
Вложения
On Tue, Jul 07, 2020 at 05:25:49PM +0900, Michael Paquier wrote: > It seems to me that it would be a better idea to just use the binary > mode for all the open() calls in pg_test_fsync in the long run, which > is what the attached patch does as there is no control on the data > written. This way, we avoid any translation that may happen during > the write. Any thoughts? I would like to get a second opinion on this one, so for now I have registered this patch in the CF app: https://commitfest.postgresql.org/29/2640/ -- Michael
Вложения
On Tue, Jul 7, 2020 at 05:25:49PM +0900, Michael Paquier wrote: > So, I have been looking at that again. And, I am not completely sure > why the combination of FILE_FLAG_NO_BUFFERING and _setmode(O_TEXT) > would cause this failure, but I suspect that this is caused by the > fact that CRLF gets changed into single LF characters on input (input > buffer uses random data), causing the size of what's expected to be > written to not match with what is actually written. An additional > factor is visibly our wrapper safe-concurrent wrapper pgwin32_open(). > > It seems to me that it would be a better idea to just use the binary > mode for all the open() calls in pg_test_fsync in the long run, which > is what the attached patch does as there is no control on the data > written. This way, we avoid any translation that may happen during > the write. Any thoughts? Well, pg_test_fsync is testing binary writes, so using TEXT for the files might add unwanted overhead, so I think binary is the best approach. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Mon, Jul 13, 2020 at 08:06:25PM -0400, Bruce Momjian wrote: > Well, pg_test_fsync is testing binary writes, so using TEXT for the > files might add unwanted overhead, so I think binary is the best > approach. Thanks. Please note that we have switched frontend tools to use our concurrent-safe flavor of open() and fopen() as of 0ba06e0 (12 and newer versions), but 40cfe86 has shown that we have been using the text mode in pg_test_fsync since forever as Windows' open() uses the text mode by default if we don't specify _fmode with _setmode(): https://docs.microsoft.com/en-us/cpp/c-runtime-library/fmode?view=vs-2019 For this reason, it seems more sensible to me to not backpatch this change only down to 12, but actually all the way down to 9.5. -- Michael
Вложения
On Tue, Jul 14, 2020 at 12:54:58PM +0900, Michael Paquier wrote: > On Mon, Jul 13, 2020 at 08:06:25PM -0400, Bruce Momjian wrote: > > Well, pg_test_fsync is testing binary writes, so using TEXT for the > > files might add unwanted overhead, so I think binary is the best > > approach. > > Thanks. Please note that we have switched frontend tools to use our > concurrent-safe flavor of open() and fopen() as of 0ba06e0 (12 and > newer versions), but 40cfe86 has shown that we have been using the > text mode in pg_test_fsync since forever as Windows' open() uses the > text mode by default if we don't specify _fmode with _setmode(): > https://docs.microsoft.com/en-us/cpp/c-runtime-library/fmode?view=vs-2019 > > For this reason, it seems more sensible to me to not backpatch this > change only down to 12, but actually all the way down to 9.5. OK. I am kind of surprised we haven't received bug reports earlier, but I guess few Windows users run pg_test_fsync, and fewer report problems. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Tue, Jul 14, 2020 at 10:27:00AM -0400, Bruce Momjian wrote: > OK. I am kind of surprised we haven't received bug reports earlier, but > I guess few Windows users run pg_test_fsync, and fewer report problems. On 11 and older versions, we use O_TEXT with pg_test_fsync. This does not cause directly a failure, but those versions have as other problem to not be able to handle properly O_DSYNC, leading to tests of open_fdatasync to show extremely high and incorrect outputs: https://www.postgresql.org/message-id/1527864509.2475.49.camel@cybertec.at And it is the concurrent use of O_DSYNC with O_TEXT that's visibly causing a problem for 12~. O_DSYNC handling got addressed in v12 with 0ba06e0, and while changing back branches for ~11 to allow frontends to use our concurrent-safe version of fopen()/open() would be tempting, I don't think that fixing one test of pg_test_fsync is a reason enough to potentially risk breakages for other client tools inside, or even outside, the core tree. If there are no objections, I would like to just change the tool to use the binary mode down to 9.5. I can see that the text mode is getting used for all branches, and that's wrong as the aim is to test binary writes, basically as you said as well upthread. -- Michael
Вложения
On Wed, Jul 15, 2020 at 03:53:56PM +0900, Michael Paquier wrote: > If there are no objections, I would like to just change the tool to > use the binary mode down to 9.5. I can see that the text mode is > getting used for all branches, and that's wrong as the aim is to test > binary writes, basically as you said as well upthread. +1 -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
On Wed, Jul 15, 2020 at 12:26:53PM -0400, Bruce Momjian wrote: > On Wed, Jul 15, 2020 at 03:53:56PM +0900, Michael Paquier wrote: >> If there are no objections, I would like to just change the tool to >> use the binary mode down to 9.5. I can see that the text mode is >> getting used for all branches, and that's wrong as the aim is to test >> binary writes, basically as you said as well upthread. > > +1 Thanks. Fixed then as 932f9fb & co. -- Michael