Обсуждение: [PATCH] O_CLOEXEC not honored on Windows - handle inheritance chain
Hello, While reviewing the pg_ctl CreateProcess patch [1], I started looking into handle inheritance on Windows and found something that puzzles me. I think there's an issue with O_CLOEXEC, but wanted to walk through my reasoning to make sure I'm not missing something obvious. [1] https://www.postgresql.org/message-id/flat/TYAPR01MB586654E2D74B838021BE77CAF5EEA@TYAPR01MB5866.jpnprd01.prod.outlook.com The issue appears to be that O_CLOEXEC doesn't actually do anything on Windows. When PostgreSQL opens a WAL file in xlog.c, it specifies O_CLOEXEC in the OpenTransientFile() call, expecting the file handle to be non-inheritable. However, O_CLOEXEC is defined as 0 in win32_port.h, and our pgwin32_open() function in src/port/open.c unconditionally sets sa.bInheritHandle = TRUE at line 80. So the flag is simply ignored, and all file handles are created inheritable. Now, for a handle to actually be inherited by a child process on Windows, two conditions must both be true. First, the handle itself must have been created with bInheritHandle = TRUE (which we do for everything). Second, the parent must call CreateProcess with bInheritHandles = TRUE. So the question becomes: does that second condition ever happen? It does. When archive_command runs, PostgreSQL calls pgwin32_system(), which wraps the command in quotes and passes it to the Microsoft C runtime's system() function. That function needs to make stdio work for the child process, so it has no choice but to call CreateProcess with bInheritHandles = TRUE. This means cmd.exe inherits all our file handles, including any open database or WAL files, and cmd.exe passes them along to copy.exe or whatever command is being run. I wrote a test program that links against libpgport.a to verify this behavior. It opens files with and without O_CLOEXEC using the actual pgwin32_open() function, then spawns a child process with bInheritHandles = TRUE (mimicking what system() does) and tries to access the handles from the child. Both files were accessible from the child process regardless of whether O_CLOEXEC was specified. The flag has no effect. Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites throughout the backend with a comment saying "Our open() replacement does not create inheritable handles, so it is safe to ignore O_CLOEXEC." But that doesn't appear to match what the code actually does. I'm wondering if I've misunderstood something about how handle inheritance works on Windows, or if the comment was based on a misunderstanding of the code path. The practical impact of this seems low. Child processes spawned by archive_command or COPY PROGRAM operate on file paths passed as arguments, not on inherited file descriptors, so they don't actually make use of the handles even though they have them. Even if a child wanted to use an inherited handle, it would need to somehow learn the numeric handle value, which isn't passed through our IPC mechanisms. And Windows users probably employ archive_command less frequently than Unix users anyway. Nonetheless, if this is really a bug, it's a correctness issue. It violates the documented semantics of O_CLOEXEC, contradicts what our own comments claim, and means PostgreSQL behaves differently on Windows than on Unix. It also creates unnecessary handle leaks where files can't be deleted or renamed while child processes are running. While reviewing my pg_ctl patch, I realized it would make handle inheritance more explicit and direct, which made me want to understand whether O_CLOEXEC actually works. The fix would be straightforward if this is indeed wrong. Define O_CLOEXEC to a non-zero value like 0x80000 (in the private use range for open() flags), and then honor it in pgwin32_open() by setting sa.bInheritHandle based on whether the flag is present: sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; We also have to add the O_CLOEXEC to the assertion in open.c that validates that fileFlags only contains known flags. I've tested this change locally and it works as expected. Files opened with O_CLOEXEC are not accessible from child processes, while files opened without it are accessible. So my questions are: Am I correct that both conditions for handle inheritance are met, meaning handles really are being inherited by archive_command children? Is there something in Windows that prevents inheritance that I don't know about? If this is a real bug, would it make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to provide my test code or do additional testing if that would help. For reference, the Microsoft documentation on handle inheritance is at: https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa And I confirmed through research that UCRT's system() does use bInheritHandles = TRUE, which makes sense since it needs stdio to work. Bryan Green
Вложения
On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote: > Hello, Catching up with all your emails, and I must say it's great to see some solid investigation of PostgreSQL-on-Windows problems. There are ... more. > Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites > throughout the backend with a comment saying "Our open() replacement > does not create inheritable handles, so it is safe to ignore > O_CLOEXEC." But that doesn't appear to match what the code actually > does. I'm wondering if I've misunderstood something about how handle > inheritance works on Windows, or if the comment was based on a > misunderstanding of the code path. Yeah, it looks like I was just wrong. Oops. Your analysis looks good to me. > The fix would be straightforward if this is indeed wrong. Define > O_CLOEXEC to a non-zero value like 0x80000 (in the private use range > for open() flags), and then honor it in pgwin32_open() by setting > sa.bInheritHandle based on whether the flag is present: > > sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; Looking at fcntl.h, that's the next free bit, but also the one they'll presumably define next (I guess "private use range" is just a turn of phrase and not a real thing?), so why not use the highest free bit after O_DIRECT? We have three fake open flags, one of which cybersquats a real flag from fcntl.h, ironically the one that actually means O_CLOEXEC. We can't change existing values in released branches, so that'd give: #define O_DIRECT 0x80000000 #define O_CLOEXEC 0x04000000 #define O_DSYNC _O_NO_INHERIT Perhaps in master we could rearrange them: #define O_DIRECT 0x80000000 #define O_DSYNC 0x04000000 #define O_CLOEXEC _O_NO_INHERIT > So my questions are: Am I correct that both conditions for handle > inheritance are met, meaning handles really are being inherited by > archive_command children? Is there something in Windows that prevents > inheritance that I don't know about? If this is a real bug, would it > make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to > provide my test code or do additional testing if that would help. Yeah, seems like it, and like we should back-patch this. I vote for doing that after the upcoming minor releases. Holding files open on Windows unintentionally is worse on Windows than on Unix (preventing directories from being unlinked etc). Of course we've done that for decades so I doubt it's really a big deal, but we should clean up this mistake.
On 11/6/2025 7:43 AM, Thomas Munro wrote:
> On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote:
>> Hello,
>
> Catching up with all your emails, and I must say it's great to see
> some solid investigation of PostgreSQL-on-Windows problems. There are
> ... more.
>
>> Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites
>> throughout the backend with a comment saying "Our open() replacement
>> does not create inheritable handles, so it is safe to ignore
>> O_CLOEXEC." But that doesn't appear to match what the code actually
>> does. I'm wondering if I've misunderstood something about how handle
>> inheritance works on Windows, or if the comment was based on a
>> misunderstanding of the code path.
>
> Yeah, it looks like I was just wrong. Oops. Your analysis looks good to me.
>
>> The fix would be straightforward if this is indeed wrong. Define
>> O_CLOEXEC to a non-zero value like 0x80000 (in the private use range
>> for open() flags), and then honor it in pgwin32_open() by setting
>> sa.bInheritHandle based on whether the flag is present:
>>
>> sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
>
> Looking at fcntl.h, that's the next free bit, but also the one they'll
> presumably define next (I guess "private use range" is just a turn of
> phrase and not a real thing?), so why not use the highest free bit
> after O_DIRECT? We have three fake open flags, one of which
> cybersquats a real flag from fcntl.h, ironically the one that actually
> means O_CLOEXEC. We can't change existing values in released
> branches, so that'd give:
>
> #define O_DIRECT 0x80000000
> #define O_CLOEXEC 0x04000000
> #define O_DSYNC _O_NO_INHERIT
>
> Perhaps in master we could rearrange them:
>
> #define O_DIRECT 0x80000000
> #define O_DSYNC 0x04000000
> #define O_CLOEXEC _O_NO_INHERIT
>
>> So my questions are: Am I correct that both conditions for handle
>> inheritance are met, meaning handles really are being inherited by
>> archive_command children? Is there something in Windows that prevents
>> inheritance that I don't know about? If this is a real bug, would it
>> make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to
>> provide my test code or do additional testing if that would help.
>
> Yeah, seems like it, and like we should back-patch this. I vote for
> doing that after the upcoming minor releases. Holding files open on
> Windows unintentionally is worse on Windows than on Unix (preventing
> directories from being unlinked etc). Of course we've done that for
> decades so I doubt it's really a big deal, but we should clean up this
> mistake.
Thanks for reviewing this and confirming the analysis. Good to know I
wasn't missing something about Windows handle inheritance.
Your point about the bit value makes sense - using 0x04000000 (highest
free bit after O_DIRECT) is definitely safer than 0x80000 which could
collide with future fcntl.h additions. I also appreciate the irony you
pointed out - we're currently using _O_NO_INHERIT (which literally
prevents handle inheritance on Windows) for O_DSYNC instead of
O_CLOEXEC. The rearrangement in master to use _O_NO_INHERIT for what it
actually means semantically makes a lot of sense.
So the plan would be:
Backport branches (v16+):
#define O_DIRECT 0x80000000
#define O_CLOEXEC 0x04000000
#define O_DSYNC _O_NO_INHERIT
Master:
#define O_DIRECT 0x80000000
#define O_DSYNC 0x04000000
#define O_CLOEXEC _O_NO_INHERIT
And then in pgwin32_open():
sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE;
I will prepare a new version of the patch that implements the suggested
change for master.
--
Bryan Green
EDB: https://www.enterprisedb.com
On 11/6/2025 8:42 AM, Bryan Green wrote: > On 11/6/2025 7:43 AM, Thomas Munro wrote: >> On Sat, Nov 1, 2025 at 6:16 AM Bryan Green <dbryan.green@gmail.com> wrote: >>> Hello, >> >> Catching up with all your emails, and I must say it's great to see >> some solid investigation of PostgreSQL-on-Windows problems. There are >> ... more. >> >>> Commit 1da569ca1f (March 2023) added O_CLOEXEC to many call sites >>> throughout the backend with a comment saying "Our open() replacement >>> does not create inheritable handles, so it is safe to ignore >>> O_CLOEXEC." But that doesn't appear to match what the code actually >>> does. I'm wondering if I've misunderstood something about how handle >>> inheritance works on Windows, or if the comment was based on a >>> misunderstanding of the code path. >> >> Yeah, it looks like I was just wrong. Oops. Your analysis looks good to me. >> >>> The fix would be straightforward if this is indeed wrong. Define >>> O_CLOEXEC to a non-zero value like 0x80000 (in the private use range >>> for open() flags), and then honor it in pgwin32_open() by setting >>> sa.bInheritHandle based on whether the flag is present: >>> >>> sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; >> >> Looking at fcntl.h, that's the next free bit, but also the one they'll >> presumably define next (I guess "private use range" is just a turn of >> phrase and not a real thing?), so why not use the highest free bit >> after O_DIRECT? We have three fake open flags, one of which >> cybersquats a real flag from fcntl.h, ironically the one that actually >> means O_CLOEXEC. We can't change existing values in released >> branches, so that'd give: >> >> #define O_DIRECT 0x80000000 >> #define O_CLOEXEC 0x04000000 >> #define O_DSYNC _O_NO_INHERIT >> >> Perhaps in master we could rearrange them: >> >> #define O_DIRECT 0x80000000 >> #define O_DSYNC 0x04000000 >> #define O_CLOEXEC _O_NO_INHERIT >> >>> So my questions are: Am I correct that both conditions for handle >>> inheritance are met, meaning handles really are being inherited by >>> archive_command children? Is there something in Windows that prevents >>> inheritance that I don't know about? If this is a real bug, would it >>> make sense to backpatch to v16 where O_CLOEXEC was added? I'm happy to >>> provide my test code or do additional testing if that would help. >> >> Yeah, seems like it, and like we should back-patch this. I vote for >> doing that after the upcoming minor releases. Holding files open on >> Windows unintentionally is worse on Windows than on Unix (preventing >> directories from being unlinked etc). Of course we've done that for >> decades so I doubt it's really a big deal, but we should clean up this >> mistake. > > Thanks for reviewing this and confirming the analysis. Good to know I > wasn't missing something about Windows handle inheritance. > > Your point about the bit value makes sense - using 0x04000000 (highest > free bit after O_DIRECT) is definitely safer than 0x80000 which could > collide with future fcntl.h additions. I also appreciate the irony you > pointed out - we're currently using _O_NO_INHERIT (which literally > prevents handle inheritance on Windows) for O_DSYNC instead of > O_CLOEXEC. The rearrangement in master to use _O_NO_INHERIT for what it > actually means semantically makes a lot of sense. > > So the plan would be: > > Backport branches (v16+): > #define O_DIRECT 0x80000000 > #define O_CLOEXEC 0x04000000 > #define O_DSYNC _O_NO_INHERIT > > Master: > #define O_DIRECT 0x80000000 > #define O_DSYNC 0x04000000 > #define O_CLOEXEC _O_NO_INHERIT > > And then in pgwin32_open(): > sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; > > I will prepare a new version of the patch that implements the suggested > change for master. > > The changes for master, along with a tap test, are provided with the attached patch. -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
On 11/7/2025 11:28 AM, Bryan Green wrote: > On 11/6/2025 8:42 AM, Bryan Green wrote: >> On 11/6/2025 7:43 AM, Thomas Munro wrote: ... >> So the plan would be: >> >> Backport branches (v16+): >> #define O_DIRECT 0x80000000 >> #define O_CLOEXEC 0x04000000 >> #define O_DSYNC _O_NO_INHERIT >> >> Master: >> #define O_DIRECT 0x80000000 >> #define O_DSYNC 0x04000000 >> #define O_CLOEXEC _O_NO_INHERIT >> >> And then in pgwin32_open(): >> sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; >> >> I will prepare a new version of the patch that implements the suggested >> change for master. >> >> > The changes for master, along with a tap test, are provided with the > attached patch. > Thanks to CI discovered a mistake in the makefile and meson.build file for the tests. New patch attached. -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote: > [v3] "The original commit included a comment suggesting that our open() replacement doesn't create inheritable handles, but it appears this may have been based on a misunderstanding of the code path. In practice, the code was creating inheritable handles in all cases." s/it appears this may have been been/was/ :-) "To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private use range for open() flags). Then honor it in pgwin32_open_handle()" Out of date, maybe skip mentioning the value in the commit message? Maybe add a note here about the back-branches preserving existing values and master cleaning up? Do you happen to have a fixup that supplies the difference in the back-branches so we can see it passing in CI? + * Note: We could instead use SetHandleInformation() after CreateFile() to + * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler + * and achieves the same result. + */ It also wouldn't be thread-safe. That is meaningful today for frontend programs (and hopefully some day soon also in the backend). + sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in Windowsian code, not false and true and not reduced to eg !(fileFlags & O_CLOEXEC). Is that a code convention thing from somewhere in Windows-land? +++ b/src/test/modules/test_cloexec/test_cloexec.c I like the test. Very helpful for people reliant on CI for Windows coverage. Independently of all this, and just on the off-chance that it might be interesting to you in future, I have previously tried to write tests for our whole Windows filesystem shim layer and found lots of bugs and understood lots of quirks that way, but never got it to be good enough for inclusion in the tree: https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com There is some overlap with several of your recent patches, as I was testing some of the same wrappers. One of the main things we've battled with in this project is the whole asynchronous-unlink thing, and generally the NT/VMS file locking concept which can't quite be entirely emulated away, and that was one of my main focuses there after we got CI and started debugging the madness. Doing so revealed a bunch of interesting differences in Windows versions and file systems, and to this day we don't really have a project policy on which Windows filesystems PostgreSQL supports (cf your mention of needing NTFS for sparse files in one of your other threads, though I can't imagine that ReFS AKA DevDrive doesn't have those too being a COW system). Speaking of file I/O, and just as an FYI, I have a bunch of semi-working unfinished patches that bring true scatter/gather I/O (instead of the simple looping fallback in pg_preadv()) and native async I/O (for files, but actually also pipes and sockets but let me stick to talking about files for now) to Windows (traditional OVERLAPPED and/or IoRing.h, neither can do everything we need would you believe). Development via trial-by-CI from the safety of my Unix box is slow and painful going, but... let's say a real Windows programmer with a systems programming bent showed up and were interested in this stuff, I would be more than happy to write down everything I think I know about those subjects and post the unfinished work and then I bet development would go fast... just sayin'.
On 11/11/2025 8:34 PM, Thomas Munro wrote: > On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote: >> [v3] > > "The original commit included a comment suggesting that our open() > replacement doesn't create inheritable handles, but it appears this > may have been based on a misunderstanding of the code path. In > practice, the code was creating inheritable handles in all cases." > > s/it appears this may have been been/was/ :-) > Changed. > "To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private > use range for open() flags). Then honor it in pgwin32_open_handle()" > Removed. > Out of date, maybe skip mentioning the value in the commit message? > Maybe add a note here about the back-branches preserving existing > values and master cleaning up? Do you happen to have a fixup that > supplies the difference in the back-branches so we can see it passing > in CI? > I have attached a back-patch for v16-v18. > + * Note: We could instead use SetHandleInformation() after CreateFile() to > + * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler > + * and achieves the same result. > + */ > > It also wouldn't be thread-safe. That is meaningful today for > frontend programs (and hopefully some day soon also in the backend). > > + sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; > > Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in > Windowsian code, not false and true and not reduced to eg !(fileFlags > & O_CLOEXEC). Is that a code convention thing from somewhere in > Windows-land? > Yes, old habits die hard. I learned this pattern on Windows. Interestingly, enough when I am not on Windows I write the way you suggest. > +++ b/src/test/modules/test_cloexec/test_cloexec.c > > I like the test. Very helpful for people reliant on CI for Windows coverage. > > Independently of all this, and just on the off-chance that it might be > interesting to you in future, I have previously tried to write tests > for our whole Windows filesystem shim layer and found lots of bugs and > understood lots of quirks that way, but never got it to be good enough > for inclusion in the tree: > > https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com > I shall take a look. > There is some overlap with several of your recent patches, as I was > testing some of the same wrappers. One of the main things we've > battled with in this project is the whole asynchronous-unlink thing, > and generally the NT/VMS file locking concept which can't quite be > entirely emulated away, and that was one of my main focuses there > after we got CI and started debugging the madness. Doing so revealed > a bunch of interesting differences in Windows versions and file > systems, and to this day we don't really have a project policy on > which Windows filesystems PostgreSQL supports (cf your mention of > needing NTFS for sparse files in one of your other threads, though I > can't imagine that ReFS AKA DevDrive doesn't have those too being a > COW system). > > Speaking of file I/O, and just as an FYI, I have a bunch of > semi-working unfinished patches that bring true scatter/gather I/O > (instead of the simple looping fallback in pg_preadv()) and native > async I/O (for files, but actually also pipes and sockets but let me > stick to talking about files for now) to Windows (traditional > OVERLAPPED and/or IoRing.h, neither can do everything we need would > you believe). Development via trial-by-CI from the safety of my Unix > box is slow and painful going, but... let's say a real Windows > programmer with a systems programming bent showed up and were > interested in this stuff, I would be more than happy to write down > everything I think I know about those subjects and post the unfinished > work and then I bet development would go fast... just sayin'. I would absolutely love to read everything you think you know about those subjects and contribute to the work. -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
On 11/12/2025 4:10 PM, Bryan Green wrote: > On 11/11/2025 8:34 PM, Thomas Munro wrote: >> On Wed, Nov 12, 2025 at 2:01 PM Bryan Green <dbryan.green@gmail.com> wrote: >>> [v3] >> >> "The original commit included a comment suggesting that our open() >> replacement doesn't create inheritable handles, but it appears this >> may have been based on a misunderstanding of the code path. In >> practice, the code was creating inheritable handles in all cases." >> >> s/it appears this may have been been/was/ :-) >> > > Changed. > >> "To fix, define O_CLOEXEC to a nonzero value (0x80000, in the private >> use range for open() flags). Then honor it in pgwin32_open_handle()" >> > > Removed. >> Out of date, maybe skip mentioning the value in the commit message? >> Maybe add a note here about the back-branches preserving existing >> values and master cleaning up? Do you happen to have a fixup that >> supplies the difference in the back-branches so we can see it passing >> in CI? >> > > I have attached a back-patch for v16-v18. > >> + * Note: We could instead use SetHandleInformation() after CreateFile() to >> + * clear HANDLE_FLAG_INHERIT, but setting bInheritHandle=FALSE is simpler >> + * and achieves the same result. >> + */ >> >> It also wouldn't be thread-safe. That is meaningful today for >> frontend programs (and hopefully some day soon also in the backend). >> >> + sa.bInheritHandle = (fileFlags & O_CLOEXEC) ? FALSE : TRUE; >> >> Just out of sheer curiosity, I often see gratuitous FALSE and TRUE in >> Windowsian code, not false and true and not reduced to eg !(fileFlags >> & O_CLOEXEC). Is that a code convention thing from somewhere in >> Windows-land? >> > > Yes, old habits die hard. I learned this pattern on Windows. > Interestingly, enough when I am not on Windows I write the way you suggest. > >> +++ b/src/test/modules/test_cloexec/test_cloexec.c >> >> I like the test. Very helpful for people reliant on CI for Windows coverage. >> >> Independently of all this, and just on the off-chance that it might be >> interesting to you in future, I have previously tried to write tests >> for our whole Windows filesystem shim layer and found lots of bugs and >> understood lots of quirks that way, but never got it to be good enough >> for inclusion in the tree: >> >> https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BajSQ_8eu2AogTncOnZ5me2D-Cn66iN_-wZnRjLN%2Bicg%40mail.gmail.com >> > > I shall take a look. > >> There is some overlap with several of your recent patches, as I was >> testing some of the same wrappers. One of the main things we've >> battled with in this project is the whole asynchronous-unlink thing, >> and generally the NT/VMS file locking concept which can't quite be >> entirely emulated away, and that was one of my main focuses there >> after we got CI and started debugging the madness. Doing so revealed >> a bunch of interesting differences in Windows versions and file >> systems, and to this day we don't really have a project policy on >> which Windows filesystems PostgreSQL supports (cf your mention of >> needing NTFS for sparse files in one of your other threads, though I >> can't imagine that ReFS AKA DevDrive doesn't have those too being a >> COW system). >> >> Speaking of file I/O, and just as an FYI, I have a bunch of >> semi-working unfinished patches that bring true scatter/gather I/O >> (instead of the simple looping fallback in pg_preadv()) and native >> async I/O (for files, but actually also pipes and sockets but let me >> stick to talking about files for now) to Windows (traditional >> OVERLAPPED and/or IoRing.h, neither can do everything we need would >> you believe). Development via trial-by-CI from the safety of my Unix >> box is slow and painful going, but... let's say a real Windows >> programmer with a systems programming bent showed up and were >> interested in this stuff, I would be more than happy to write down >> everything I think I know about those subjects and post the unfinished >> work and then I bet development would go fast... just sayin'. > > I would absolutely love to read everything you think you know about > those subjects and contribute to the work. > > Corrected master patch and back patch for v16-v18. -- Bryan Green EDB: https://www.enterprisedb.com
Вложения
On Thu, Nov 13, 2025 at 11:17 AM Bryan Green <dbryan.green@gmail.com> wrote: > Corrected master patch and back patch for v16-v18. Thanks. I wondered what system-generated new handles might appear in a child process and potentially collide with a non-inherited handle's numerical value (perhaps a thread handle or something like that?), but I guess it'd also have to accept a write to confuse the test, which seems unlikely, so that's probably OK. I also hope that the new test could eventually be merged with a general port layer test suite as mentioned earlier. What do you think about these improvements? See attached. * moved and adjusted new comment about flag conversion to cover all three flags, since it's true for all of them * adjusted the comment about why we don't use SetHandleInformation() to highlight that it would be (slightly) leaky * removed O_DIRECT's extra definition from port.h, since it's now in win32_port.h One question is why on earth the open() redirection is in port.h while the "supplements to fcntl.h" are in win32_port.h. Obviously those are tightly coupled. As far as I know there are two forces keeping some Windows porting magic in port.h that we'd ideally isolate in win32_port.h, and this case doesn't appear to qualify for either as far as I can guess, anyway: * some sleep/retry wrappers were thought to be needed by Cygwin too: API-wise it's a POSIX, but it couldn't hide the underlying NT file locking semantics * sometimes we need a later definition time: I recall battling that for #define ftruncate and/or lseek (if you define them before certain system headers are included, you break them) Cygwin's <fcntl.h> defines these flags, if I've found the right file[1], and has its own open() that we're using directly. If it didn't, our code would have failed to compile when Cygwin was being tested in the build farm up until a year or so ago (and it will be tested again soon[2]). So we could probably move at least open() into win32_port.h, in a separate commit. I think it's also quite likely that Cygwin turns on the Windows 10 POSIX directory entry semantics, so even rename() etc could probably be moved over too. (Whether our own porting layer should turn that stuff on too and delete the retry stuff entirely is an open question which no Windows expert has ever opined on, only us Unix hackers battling against random failures in the build farm.) We should probably also set up a CI task for Cygwin if we're keeping support. [1] https://github.com/cygwin/cygwin/blob/main/newlib/libc/include/sys/_default_fcntl.h [2] https://www.postgresql.org/message-id/flat/916d0fd1-a99b-41c4-a017-ff2428bf8cca%40dunslane.net