Обсуждение: [Patch] Windows relation extension failure at 2GB and 4GB

Поиск

Список

Период

Сортировка

[Patch] Windows relation extension failure at 2GB and 4GB

От

Bryan Green

Дата:

28 октября 2025 г., 17:42:11

Hello,

I found two related bugs in PostgreSQL's Windows port that prevent files 
from exceeding 2GB. While unlikely to affect most installations (default 
1GB segments), the code is objectively wrong and worth fixing.

The first bug is a pervasive use of off_t where pgoff_t should be used. 
On Windows, off_t is only 32-bit, causing signed integer overflow at 
exactly 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 
for this purpose and some function declarations in headers already use 
it, but the implementations weren't updated to match.

The problem shows up in multiple layers:

In fd.c and fd.h, the VfdCache structure's fileSize field uses off_t, as 
do FileSize(), FileTruncate(), and all the File* functions. These are 
the core file I/O abstraction layer that everything else builds on.

In md.c, _mdnblocks() uses off_t for its calculations. The actual 
arithmetic for computing file offsets is fine - the casts to pgoff_t 
work correctly - but passing these values through functions with off_t 
parameters truncates them.

In pg_iovec.h, pg_preadv() and pg_pwritev() take off_t offset 
parameters, truncating any 64-bit offsets passed from above.

In file_utils.c, pg_pwrite_zeros() takes an off_t offset parameter. This
function is called by FileZero() to extend files with zeros, so it's hit
during relation extension. This was the actual culprit in my testing -
mdzeroextend() would compute a correct 64-bit offset, but it got 
truncated to 32-bit (and negative) when passed to pg_pwrite_zeros().

After fixing all those off_t issues, there's a second bug at 4GB in the 
Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and 
win32pread.c. The current implementation uses an OVERLAPPED structure 
for positioned I/O, but only sets the Offset field (low 32 bits), 
leaving OffsetHigh at zero. This works up to 4GB by accident, but beyond 
that, offsets wrap around.

I can reproduce both bugs reliably with --with-segsize=8. The file grows 
to exactly 2GB and fails with "could not extend file: Invalid argument" 
despite having 300GB free. After fixing the off_t issues, it grows to 
exactly 4GB and hits the OVERLAPPED bug. Both are independently verifiable.

The fix touches nine files:
   src/include/storage/fd.h - File* function declarations
   src/backend/storage/file/fd.c - File* implementations and VfdCache
   src/backend/storage/smgr/md.c - _mdnblocks and other functions
   src/include/port/pg_iovec.h - pg_preadv/pg_pwritev signatures
   src/include/common/file_utils.h - pg_pwrite_zeros declaration
   src/common/file_utils.c - pg_pwrite_zeros implementation
   src/include/port/win32_port.h - pg_pread/pg_pwrite declarations
   src/port/win32pwrite.c - Windows pwrite implementation
   src/port/win32pread.c - Windows pread implementation

It's safe for all platforms since pgoff_t equals off_t on Unix where 
off_t is already 64-bit. Only Windows behavior changes.

That said, I'm finding off_t used in many other places throughout the 
codebase - buffile.c, various other file utilities such as backup and 
archive, probably more. This is likely causing latent bugs elsewhere on 
Windows, though most are masked by the 1GB default segment size. I'm 
investigating the full scope, but I think this needs to be broken up 
into multiple patches. The core file I/O layer (fd.c, md.c,
pg_pwrite/pg_pread) should probably go first since that's what's 
actively breaking file extension.

Not urgent since few people hit this in practice, but it's clearly wrong 
code.
Someone building with larger segments would see failures at 2GB and 
potential corruption at 4GB. Windows supports files up to 16 exabytes - 
no good reason to limit PostgreSQL to 2GB.

I have attached the patch to fix the relation extension problems for 
Windows to this email.

Can provide the other patches that changes off_t for pgoff_t in the rest 
of the code if there's interest in fixing this.

To reproduce the bugs on Windows:

1) Build with large segment size: meson setup build 
--prefix=C:\pgsql-test -Dsegsize=8.
2) Create a large table and insert data that will make it bigger than 2GB.

CREATE TABLE large_test (
     id bigserial PRIMARY KEY,
     data1 text,
     data2 text,
     data3 text
);

INSERT INTO large_test (data1, data2, data3)
SELECT
     repeat('A', 300),
     repeat('B', 300),
     repeat('C', 300)
FROM generate_series(1, 5000000);

SELECT pg_size_pretty(pg_relation_size('large_test'));

You will notice at this point that the first bug surfaces.

3) If you want to reproduce the 2nd bug then you should apply the patch 
and then comment out 'overlapped.OffsetHigh = (DWORD) (offset >> 32);' 
is win32pwrite.c.
4) Assuming you did 3, do the test in 2 again.  If you are watching the 
data/base/N/xxxxx file growing you will notice that it gets past 2GB but 
now fails at 4GB.


BG

Вложения

0001-Fix-Windows-file-IO.patch

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

06 ноября 2025 г., 08:05:15

On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote:
> I found two related bugs in PostgreSQL's Windows port that prevent files
> from exceeding 2GB. While unlikely to affect most installations (default 1GB
> segments), the code is objectively wrong and worth fixing.
>
> The first bug is a pervasive use of off_t where pgoff_t should be used. On
> Windows, off_t is only 32-bit, causing signed integer overflow at exactly
> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this
> purpose and some function declarations in headers already use it, but the
> implementations weren't updated to match.

Ugh.  That's the same problem as "long", which is 8 bytes wide
everywhere except WIN32.  Removing traces of "long" from the code has
been a continuous effort over the years because of these silent
overflow issues.

> After fixing all those off_t issues, there's a second bug at 4GB in the
> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
> win32pread.c. The current implementation uses an OVERLAPPED structure for
> positioned I/O, but only sets the Offset field (low 32 bits), leaving
> OffsetHigh at zero. This works up to 4GB by accident, but beyond that,
> offsets wrap around.
>
> I can reproduce both bugs reliably with --with-segsize=8. The file grows to
> exactly 2GB and fails with "could not extend file: Invalid argument" despite
> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB
> and hits the OVERLAPPED bug. Both are independently verifiable.

The most popular option in terms of installation on Windows is the EDB
installer, where I bet that a file segment size of 1GB is what's
embedded in the code compiled.  This argument would not hold with WAL
segment sizes if we begin to support even higher sizes than 1GB at
some point, and we use pwrite() in the WAL insert code.  That should
not be a problem even in the near future.

> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t
> is already 64-bit. Only Windows behavior changes.

win32_port.h and port.h say so, yeah.

> Not urgent since few people hit this in practice, but it's clearly wrong
> code.
> Someone building with larger segments would see failures at 2GB and
> potential corruption at 4GB. Windows supports files up to 16 exabytes - no
> good reason to limit PostgreSQL to 2GB.

The same kind of limitations with 4GB files existed with stat() and
fstat(), but it was much more complicated than what you are doing
here, where COPY was not able to work with files larger than 4GB on
WIN32.  See the saga from bed90759fcbc.

> I have attached the patch to fix the relation extension problems for Windows
> to this email.
>
> Can provide the other patches that changes off_t for pgoff_t in the rest of
> the code if there's interest in fixing this.

Yeah, I think that we should rip out these issues, and move to the
more portable pgoff_t across the board.  I doubt that any of this
could be backpatched due to the potential ABI breakages these
signatures changes would cause.  Implementing things in incremental
steps is more sensible when it comes to such changes, as a revert
blast can be reduced if a portion is incorrectly handled.

I'm seeing as well the things you are pointing in buffile.c.  These
should be fixed as well.  The WAL code is less annoying due to the 1GB
WAL segment size limit, still consistency across the board makes the
code easier to reason about, at least.

Among the files you have mentioned, there is also copydir.c.

pg_rewind seems also broken with files larger than 4GB, from what I
can see in libpq_source.c and filemap.c..  Worse.  Oops.

-    /* Note that this changes the file position, despite not using it. */
-    overlapped.Offset = offset;
+    overlapped.Offset = (DWORD) offset;
+    overlapped.OffsetHigh = (DWORD) (offset >> 32);

Based on the docs at [1], yes, this change makes sense.

It seems to me that a couple of extra code paths should be handled in
the first patch, and I have spotted three of them.  None of them are
critical as they are related to WAL segments, just become masked and
inconsistent:
- xlogrecovery.c, pg_pread() called with a cast to off_t.  WAL
segments have a max size of 1GB, meaning that we're OK.
- xlogreader.c, pg_pread() with a cast to off_t.
- walreceiver.c, pg_pwrite().

Except for these three spots, the first patch looks like a cut good
enough on its own.

Glad to see someone who takes time to spend time on this kind of
stuff with portability in mind, by the way.

[1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Bryan Green

Дата:

06 ноября 2025 г., 09:05:06

On 11/5/2025 11:05 PM, Michael Paquier wrote:
> On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote:
>> I found two related bugs in PostgreSQL's Windows port that prevent files
>> from exceeding 2GB. While unlikely to affect most installations (default 1GB
>> segments), the code is objectively wrong and worth fixing.
>>
>> The first bug is a pervasive use of off_t where pgoff_t should be used. On
>> Windows, off_t is only 32-bit, causing signed integer overflow at exactly
>> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this
>> purpose and some function declarations in headers already use it, but the
>> implementations weren't updated to match.
> 
> Ugh.  That's the same problem as "long", which is 8 bytes wide
> everywhere except WIN32.  Removing traces of "long" from the code has
> been a continuous effort over the years because of these silent
> overflow issues.
> 

Exactly - these silent overflows are particularly nasty since they only
manifest under specific conditions (large files on Windows) and can
cause data corruption rather than immediate crashes.

>> After fixing all those off_t issues, there's a second bug at 4GB in the
>> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
>> win32pread.c. The current implementation uses an OVERLAPPED structure for
>> positioned I/O, but only sets the Offset field (low 32 bits), leaving
>> OffsetHigh at zero. This works up to 4GB by accident, but beyond that,
>> offsets wrap around.
>>
>> I can reproduce both bugs reliably with --with-segsize=8. The file grows to
>> exactly 2GB and fails with "could not extend file: Invalid argument" despite
>> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB
>> and hits the OVERLAPPED bug. Both are independently verifiable.
> 
> The most popular option in terms of installation on Windows is the EDB
> installer, where I bet that a file segment size of 1GB is what's
> embedded in the code compiled.  This argument would not hold with WAL

Right, which is why this has gone unnoticed. The 1GB default masks both
bugs completely. It's only when someone uses --with-segsize > 2 that the
issues appear.

> segment sizes if we begin to support even higher sizes than 1GB at
> some point, and we use pwrite() in the WAL insert code.  That should
> not be a problem even in the near future.
> 
>> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t
>> is already 64-bit. Only Windows behavior changes.
> 
> win32_port.h and port.h say so, yeah.
> 
>> Not urgent since few people hit this in practice, but it's clearly wrong
>> code.
>> Someone building with larger segments would see failures at 2GB and
>> potential corruption at 4GB. Windows supports files up to 16 exabytes - no
>> good reason to limit PostgreSQL to 2GB.
> 
> The same kind of limitations with 4GB files existed with stat() and
> fstat(), but it was much more complicated than what you are doing
> here, where COPY was not able to work with files larger than 4GB on
> WIN32.  See the saga from bed90759fcbc.
> 
>> I have attached the patch to fix the relation extension problems for Windows
>> to this email.
>>
>> Can provide the other patches that changes off_t for pgoff_t in the rest of
>> the code if there's interest in fixing this.
> 
> Yeah, I think that we should rip out these issues, and move to the
> more portable pgoff_t across the board.  I doubt that any of this
> could be backpatched due to the potential ABI breakages these
> signatures changes would cause.  Implementing things in incremental 
> steps is more sensible when it comes to such changes, as a revert
> blast can be reduced if a portion is incorrectly handled.
> 
> I'm seeing as well the things you are pointing in buffile.c.  These
> should be fixed as well.  The WAL code is less annoying due to the 1GB
> WAL segment size limit, still consistency across the board makes the
> code easier to reason about, at least.
> 
> Among the files you have mentioned, there is also copydir.c.
> 
> pg_rewind seems also broken with files larger than 4GB, from what I
> can see in libpq_source.c and filemap.c..  Worse.  Oops.
> 
> -    /* Note that this changes the file position, despite not using it. */
> -    overlapped.Offset = offset;
> +    overlapped.Offset = (DWORD) offset;
> +    overlapped.OffsetHigh = (DWORD) (offset >> 32);
> 
> Based on the docs at [1], yes, this change makes sense.
> 
> It seems to me that a couple of extra code paths should be handled in
> the first patch, and I have spotted three of them.  None of them are
> critical as they are related to WAL segments, just become masked and
> inconsistent:
> - xlogrecovery.c, pg_pread() called with a cast to off_t.  WAL
> segments have a max size of 1GB, meaning that we're OK.
> - xlogreader.c, pg_pread() with a cast to off_t.
> - walreceiver.c, pg_pwrite().
> 

I'll include these in the first patch for consistency even though
they're not currently problematic. Better to fix all the function call
sites together rather than leaving known inconsistencies.

> Except for these three spots, the first patch looks like a cut good
> enough on its own.
> 
> Glad to see someone who takes time to spend time on this kind of
> stuff with portability in mind, by the way.
> 

Windows portability issues tend to hide in corners like this. I'll
prepare the updated patch series addressing your feedback and post v2
shortly.

> [1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped
> --
> Michael

Thanks for taking the time to look over this patch.

-- 
Bryan Green
EDB: https://www.enterprisedb.com

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Thomas Munro

Дата:

06 ноября 2025 г., 12:20:09

On Wed, Oct 29, 2025 at 3:42 AM Bryan Green <dbryan.green@gmail.com> wrote:
> That said, I'm finding off_t used in many other places throughout the
> codebase - buffile.c, various other file utilities such as backup and
> archive, probably more. This is likely causing latent bugs elsewhere on
> Windows, though most are masked by the 1GB default segment size. I'm
> investigating the full scope, but I think this needs to be broken up
> into multiple patches. The core file I/O layer (fd.c, md.c,
> pg_pwrite/pg_pread) should probably go first since that's what's
> actively breaking file extension.

The way I understand this situation, there are two kinds of file I/O,
with respect to large files:

1.  Some places *have* to deal with large files (eg navigating in a
potentially large tar file), and there we should already be using
pgoff_t and the relevant system call wrappers should be using the
int64_t stuff Windows provides.  These are primarily frontend code.
2.  Some places use segmentation *specifically because* there are
systems with 32 bit off_t.  These are mostly backend code dealing with
relation data files.  The only system left with narrow off_t is
Windows.

In reality the stuff in category 1 has been developed through a
process of bug reports and patches (970b97e and 970b97e^ springs to
mind as the most recent case I had something to with, but see also
stat()-related stuff, and see aa5518304 where we addressed the one
spot in buffile.c that had to consider multiple segments).  But the
fact that Windows can't use segments > 2GB because the fd.c and
smgr.c/md.c layers work with off_t is certainly a well known
limitation, ie specifically that relation and temporary/buf files are
special in this way.  I'm mostly baffled by the fact that --relsegsize
actually *lets* you set it higher than 2 on that platform.  Perhaps we
should at least backpatch a configure check or static assertion to
block that?  It's not good if it compiles but doesn't actually work.

For master I think it makes sense to clean this up, as you say,
because the fuzzy boundary between the two categories of file I/O is
bound to cause more problems, it's just unfinished business that has
been tackled piecemeal as required by bug reports...  In fact, on a
thread[1] where I explored making the segment size a runtime option
specified at initdb time, I even posted patches much like yours in the
first version, spreading pgoff_t into more places, and then in a later
version it was suggested that it might be better to just block
settings that are too big for your off_t, so I did that.  I probably
thought that we already did that somewhere for the current
compile-time constant...

> Not urgent since few people hit this in practice, but it's clearly wrong
> code.

Yeah.  In my experience dealing with bug reports, the Windows users
community skews very heavily towards just consuming EDB's read-built
installer.  We rarely hear about configuration-level problems, so I
suppose it's not surprising that no one has ever complained that it
lets you configure it in a way that we hackers all know is certainly
going to break.

[1]
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BBGXwMbrvzXAjL8VMGf25y_ga_XnO741g10y0%3Dm6dDiA%40mail.gmail.com

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Thomas Munro

Дата:

06 ноября 2025 г., 13:10:00

On Thu, Nov 6, 2025 at 10:20 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> The only system left with narrow off_t is Windows.

By the way, that's not always true: Meson + MinGW have a 64-bit off_t
on Windows, because meson decides to set -D_FIILE_OFFSET_BITS=64 for
all C compilers except msvc[1] (only other exclusion is macOS, but
that is 64-bit only these days; there are other systems like FreeBSD
where sizeof(off_t) is always 8 but it doesn't seem to know about that
or bother to check), and MinGW's headers react to that.  I suspect
autoconf's AC_SYS_LARGEFILE would do that too with MinGW, IDK, but
configure.ac doesn't call it for win32 by special condition.  That
creates a strange difference between meson and autoconf builds IMHO,
but if we resolve that in the only direction possible AFAICS we'd
still have a strange difference between MSVC and MinGW.

Observing that mess, I kinda wonder what would happen if we just used
a big hammer to redefine off_t to be __int64 ourselves.  On the face
of it, it sounds like an inherently bad idea that could bite you when
interacting with libraries whose headers use off_t.  On the other
hand, the world of open source libraries we care about might already
be resistant to that chaos, if libraries are being built with and
without -D_FIILE_OFFSET_BITS=64 willy-nilly, or they actually can't
deal with large files at all in which case that's something we'd have
to deal with whatever we do.  I don't know, it's just a thought that
occurred to me while contemplating how unpleasant it is to splatter
pgoff_t all over our tree, and yet *still* have to tread very
carefully with the boundaries of external libraries that might be
using off_t, researching each case...

[1]
https://github.com/mesonbuild/meson/blob/97a1c567c9813176e4bec40f6055f228b2121609/mesonbuild/compilers/compilers.py#L1144

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Bryan Green

Дата:

06 ноября 2025 г., 17:56:14

On 11/6/2025 3:20 AM, Thomas Munro wrote:
> On Wed, Oct 29, 2025 at 3:42 AM Bryan Green <dbryan.green@gmail.com> wrote:
>> That said, I'm finding off_t used in many other places throughout the
>> codebase - buffile.c, various other file utilities such as backup and
>> archive, probably more. This is likely causing latent bugs elsewhere on
>> Windows, though most are masked by the 1GB default segment size. I'm
>> investigating the full scope, but I think this needs to be broken up
>> into multiple patches. The core file I/O layer (fd.c, md.c,
>> pg_pwrite/pg_pread) should probably go first since that's what's
>> actively breaking file extension.
> 
> The way I understand this situation, there are two kinds of file I/O,
> with respect to large files:
> 
> 1.  Some places *have* to deal with large files (eg navigating in a
> potentially large tar file), and there we should already be using
> pgoff_t and the relevant system call wrappers should be using the
> int64_t stuff Windows provides.  These are primarily frontend code.
> 2.  Some places use segmentation *specifically because* there are
> systems with 32 bit off_t.  These are mostly backend code dealing with
> relation data files.  The only system left with narrow off_t is
> Windows.
> 
> In reality the stuff in category 1 has been developed through a
> process of bug reports and patches (970b97e and 970b97e^ springs to
> mind as the most recent case I had something to with, but see also
> stat()-related stuff, and see aa5518304 where we addressed the one
> spot in buffile.c that had to consider multiple segments).  But the
> fact that Windows can't use segments > 2GB because the fd.c and
> smgr.c/md.c layers work with off_t is certainly a well known
> limitation, ie specifically that relation and temporary/buf files are
> special in this way.  I'm mostly baffled by the fact that --relsegsize
> actually *lets* you set it higher than 2 on that platform.  Perhaps we
> should at least backpatch a configure check or static assertion to
> block that?  It's not good if it compiles but doesn't actually work.
> 

I agree that the backpatch should just block setting -relsegsize > 2GB
on Windows.

> For master I think it makes sense to clean this up, as you say,
> because the fuzzy boundary between the two categories of file I/O is
> bound to cause more problems, it's just unfinished business that has
> been tackled piecemeal as required by bug reports...  In fact, on a
> thread[1] where I explored making the segment size a runtime option
> specified at initdb time, I even posted patches much like yours in the
> first version, spreading pgoff_t into more places, and then in a later
> version it was suggested that it might be better to just block
> settings that are too big for your off_t, so I did that.  I probably
> thought that we already did that somewhere for the current
> compile-time constant...
> 

For master, I'd like to proceed with the cleanup approach - spreading
pgoff_t into the core I/O layer (fd.c, md.c, pg_pread/pg_pwrite
wrappers, etc). That would let us eliminate the artificial 2GB ceiling
on Windows and clean up the file I/O category boundary.

>> Not urgent since few people hit this in practice, but it's clearly wrong
>> code.
> 
> Yeah.  In my experience dealing with bug reports, the Windows users
> community skews very heavily towards just consuming EDB's read-built
> installer.  We rarely hear about configuration-level problems, so I
> suppose it's not surprising that no one has ever complained that it
> lets you configure it in a way that we hackers all know is certainly
> going to break.
> 
> [1]
https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BBGXwMbrvzXAjL8VMGf25y_ga_XnO741g10y0%3Dm6dDiA%40mail.gmail.com

Thanks for the feedback.

-- 
Bryan Green
EDB: https://www.enterprisedb.com

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Bryan Green

Дата:

06 ноября 2025 г., 20:17:52

On 11/5/2025 11:05 PM, Michael Paquier wrote:
> On Tue, Oct 28, 2025 at 09:42:11AM -0500, Bryan Green wrote:
>> I found two related bugs in PostgreSQL's Windows port that prevent files
>> from exceeding 2GB. While unlikely to affect most installations (default 1GB
>> segments), the code is objectively wrong and worth fixing.
>>
>> The first bug is a pervasive use of off_t where pgoff_t should be used. On
>> Windows, off_t is only 32-bit, causing signed integer overflow at exactly
>> 2GB (2^31 bytes). PostgreSQL already defined pgoff_t as __int64 for this
>> purpose and some function declarations in headers already use it, but the
>> implementations weren't updated to match.
> 
> Ugh.  That's the same problem as "long", which is 8 bytes wide
> everywhere except WIN32.  Removing traces of "long" from the code has
> been a continuous effort over the years because of these silent
> overflow issues.
> 
>> After fixing all those off_t issues, there's a second bug at 4GB in the
>> Windows implementations of pg_pwrite()/pg_pread() in win32pwrite.c and
>> win32pread.c. The current implementation uses an OVERLAPPED structure for
>> positioned I/O, but only sets the Offset field (low 32 bits), leaving
>> OffsetHigh at zero. This works up to 4GB by accident, but beyond that,
>> offsets wrap around.
>>
>> I can reproduce both bugs reliably with --with-segsize=8. The file grows to
>> exactly 2GB and fails with "could not extend file: Invalid argument" despite
>> having 300GB free. After fixing the off_t issues, it grows to exactly 4GB
>> and hits the OVERLAPPED bug. Both are independently verifiable.
> 
> The most popular option in terms of installation on Windows is the EDB
> installer, where I bet that a file segment size of 1GB is what's
> embedded in the code compiled.  This argument would not hold with WAL
> segment sizes if we begin to support even higher sizes than 1GB at
> some point, and we use pwrite() in the WAL insert code.  That should
> not be a problem even in the near future.
> 
>> It's safe for all platforms since pgoff_t equals off_t on Unix where off_t
>> is already 64-bit. Only Windows behavior changes.
> 
> win32_port.h and port.h say so, yeah.
> 
>> Not urgent since few people hit this in practice, but it's clearly wrong
>> code.
>> Someone building with larger segments would see failures at 2GB and
>> potential corruption at 4GB. Windows supports files up to 16 exabytes - no
>> good reason to limit PostgreSQL to 2GB.
> 
> The same kind of limitations with 4GB files existed with stat() and
> fstat(), but it was much more complicated than what you are doing
> here, where COPY was not able to work with files larger than 4GB on
> WIN32.  See the saga from bed90759fcbc.
> 
>> I have attached the patch to fix the relation extension problems for Windows
>> to this email.
>>
>> Can provide the other patches that changes off_t for pgoff_t in the rest of
>> the code if there's interest in fixing this.
> 
> Yeah, I think that we should rip out these issues, and move to the
> more portable pgoff_t across the board.  I doubt that any of this
> could be backpatched due to the potential ABI breakages these
> signatures changes would cause.  Implementing things in incremental 
> steps is more sensible when it comes to such changes, as a revert
> blast can be reduced if a portion is incorrectly handled.
> 
> I'm seeing as well the things you are pointing in buffile.c.  These
> should be fixed as well.  The WAL code is less annoying due to the 1GB
> WAL segment size limit, still consistency across the board makes the
> code easier to reason about, at least.
> 
> Among the files you have mentioned, there is also copydir.c.
> 
> pg_rewind seems also broken with files larger than 4GB, from what I
> can see in libpq_source.c and filemap.c..  Worse.  Oops.
> 
> -    /* Note that this changes the file position, despite not using it. */
> -    overlapped.Offset = offset;
> +    overlapped.Offset = (DWORD) offset;
> +    overlapped.OffsetHigh = (DWORD) (offset >> 32);
> 
> Based on the docs at [1], yes, this change makes sense.
> 
> It seems to me that a couple of extra code paths should be handled in
> the first patch, and I have spotted three of them.  None of them are
> critical as they are related to WAL segments, just become masked and
> inconsistent:
> - xlogrecovery.c, pg_pread() called with a cast to off_t.  WAL
> segments have a max size of 1GB, meaning that we're OK.
> - xlogreader.c, pg_pread() with a cast to off_t.
> - walreceiver.c, pg_pwrite().
> 
> Except for these three spots, the first patch looks like a cut good
> enough on its own.
> 

Latest patch attached that includes these code paths.

> Glad to see someone who takes time to spend time on this kind of
> stuff with portability in mind, by the way.
> 
> [1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-overlapped
> --
> Michael

Thanks for the quick reviewing.

-- 
Bryan Green
EDB: https://www.enterprisedb.com

Вложения

v2-0001-Fix-Windows-file-IO.patch

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Andres Freund

Дата:

06 ноября 2025 г., 20:45:30

Hi,

On 2025-11-06 11:17:52 -0600, Bryan Green wrote:
> From d3f7543a35b3b72a7069188302cbfc7e4de9120b Mon Sep 17 00:00:00 2001
> From: Bryan Green <dbryan.green@gmail.com>
> Date: Thu, 6 Nov 2025 10:56:02 -0600
> Subject: [PATCH] Fix Windows file I/O to support files larger than 2GB

Could we add a testcase that actually exercises at least some of the
codepaths? We presumably wouldn't want to actually write that much data, but
it shouldn't be hard to write portable code to create a file with holes...

Greetings,

Andres Freund

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

07 ноября 2025 г., 01:13:33

On Thu, Nov 06, 2025 at 12:45:30PM -0500, Andres Freund wrote:
> Could we add a testcase that actually exercises at least some of the
> codepaths? We presumably wouldn't want to actually write that much data, but
> it shouldn't be hard to write portable code to create a file with holes...

With something that relies on a pg_pwrite() and pg_pread(), that does
not sound like an issue to me.

FWIW, I have wanted a test module that does FS-level operations for
some time.  Here, we could just have thin wrappers of the write and
read calls and a give way for the tests to pass directly arguments to
them via a SQL function call.  That would be easier to extend
depending on what comes next.  Not sure that this is absolutely
mandatory for the sake of the proposal, though, but long-term that's
something we should do more to stress the portability of the code.
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

07 ноября 2025 г., 01:20:46

On Thu, Nov 06, 2025 at 11:10:00PM +1300, Thomas Munro wrote:
> Observing that mess, I kinda wonder what would happen if we just used
> a big hammer to redefine off_t to be __int64 ourselves.  On the face
> of it, it sounds like an inherently bad idea that could bite you when
> interacting with libraries whose headers use off_t.  On the other
> hand, the world of open source libraries we care about might already
> be resistant to that chaos, if libraries are being built with and
> without -D_FIILE_OFFSET_BITS=64 willy-nilly, or they actually can't
> deal with large files at all in which case that's something we'd have
> to deal with whatever we do.  I don't know, it's just a thought that
> occurred to me while contemplating how unpleasant it is to splatter
> pgoff_t all over our tree, and yet *still* have to tread very
> carefully with the boundaries of external libraries that might be
> using off_t, researching each case...

Not sure about that.  It's always been something we have tackled with
the various pg_ and PG_ structures and definitions.  Not sure that
this has to apply here since we already have one pgoff_t for the
purpose of portability for quite a few years already (d00a3472cfc4).
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

07 ноября 2025 г., 01:28:43

On Thu, Nov 06, 2025 at 11:17:52AM -0600, Bryan Green wrote:
> On 11/5/2025 11:05 PM, Michael Paquier wrote:
>> It seems to me that a couple of extra code paths should be handled in
>> the first patch, and I have spotted three of them.  None of them are
>> critical as they are related to WAL segments, just become masked and
>> inconsistent:
>> - xlogrecovery.c, pg_pread() called with a cast to off_t.  WAL
>> segments have a max size of 1GB, meaning that we're OK.
>> - xlogreader.c, pg_pread() with a cast to off_t.
>> - walreceiver.c, pg_pwrite().
>>
>> Except for these three spots, the first patch looks like a cut good
>> enough on its own.
>
> Latest patch attached that includes these code paths.

That feels OK for me.  Thomas, do you have a different view on the
matter for HEAD?  Like long, I would just switch to something that we
have in the tree that's fixed.

And +1 for the idea to restrict the segment size to never be more than
2GB based on a ./configure and meson check on the back branches.  In
PG15 and older branches, we already enforced a check by the way.  See
src/tools/msvc/Solution.pm which was the only way to compile the code
with visual studio so one would have never seen the limitations except
if they had the idea to edit the perl scripts (FWIW, I've done exactly
that in the past for a past project at $company, never touched the
segsize):
# only allow segsize 1 for now, as we can't do large files yet in windows
die "Bad segsize $options->{segsize}"
    unless $options->{segsize} == 1;

So this is a meson issue that goes down to v16, when using a VS
compiler.  Was there a different compiler where off_t is also 4 bytes?
MinGW is mentioned as clear by Thomas.
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Bryan Green

Дата:

07 ноября 2025 г., 04:17:10

On 11/6/25 11:45, Andres Freund wrote:
> Hi,
> 
> On 2025-11-06 11:17:52 -0600, Bryan Green wrote:
>>  From d3f7543a35b3b72a7069188302cbfc7e4de9120b Mon Sep 17 00:00:00 2001
>> From: Bryan Green <dbryan.green@gmail.com>
>> Date: Thu, 6 Nov 2025 10:56:02 -0600
>> Subject: [PATCH] Fix Windows file I/O to support files larger than 2GB
> 
> Could we add a testcase that actually exercises at least some of the
> codepaths? We presumably wouldn't want to actually write that much data, but
> it shouldn't be hard to write portable code to create a file with holes...
> 
> Greetings,
> 
> Andres Freund
Agreed.  Added a test module called test_large_files.  There is a README.

--
Bryan Green
EDB: https://www.enterprisedb.com

Вложения

v3-0001-Fix-Windows-file-IO.patch

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Thomas Munro

Дата:

07 ноября 2025 г., 04:45:32

On Fri, Nov 7, 2025 at 11:29 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Nov 06, 2025 at 11:17:52AM -0600, Bryan Green wrote:
> > Latest patch attached that includes these code paths.
>
> That feels OK for me.  Thomas, do you have a different view on the
> matter for HEAD?  Like long, I would just switch to something that we
> have in the tree that's fixed.

WFM.

-    /* Note that this changes the file position, despite not using it. */

Why drop these comments?  They still apply.

-

Accidental whitespace change?

         struct radvisory
         {
-            off_t        ra_offset;    /* offset into the file */
+            pgoff_t        ra_offset;    /* offset into the file */

IIRC this is a struct definition from an Apple man page, maybe leave unchanged?

Looking at the v3 that arrived while I was typing this:

+             errmsg("sparse files are only supported on Windows")));

Nit: maybe sparse file test only supported on Windows?  Also, nice test!

> And +1 for the idea to restrict the segment size to never be more than
> 2GB based on a ./configure and meson check on the back branches.  In
> PG15 and older branches, we already enforced a check by the way.  See
> src/tools/msvc/Solution.pm which was the only way to compile the code
> with visual studio so one would have never seen the limitations except
> if they had the idea to edit the perl scripts (FWIW, I've done exactly
> that in the past for a past project at $company, never touched the
> segsize):
> # only allow segsize 1 for now, as we can't do large files yet in windows
> die "Bad segsize $options->{segsize}"
>     unless $options->{segsize} == 1;
>
> So this is a meson issue that goes down to v16, when using a VS
> compiler.  Was there a different compiler where off_t is also 4 bytes?

Ohh, this all makes more sense now.  I wasn't wrong to think there was
already a check, it just didn't get ported to meson.

I wouldn't personally pitch the commit message as "Fix ...", which
sounds like a bug fix.  There *is* a bug, but it's in the meson work.
Something more like "Allow large relation files on Windows" seems more
appropriate for this one, but YMMV.

> MinGW is mentioned as clear by Thomas.

Only MinGW + meson.  MinGW + configure has 32-bit off_t as far as I
can tell because we do:

if test "$PORTNAME" != "win32"; then
   AC_SYS_LARGEFILE
...

I don't personally know of any current Unix without LFS, they just
vary on whether it's always on or you have to ask for it, as autoconf
and meson know.  But I suppose the check for oversized segments should
use sizeof(off_t), not the OS's identity.

There are of course a few filesystems even today that don't let you
make a file as big as our maximum size, but that's another topic.

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

07 ноября 2025 г., 05:12:58

On Fri, Nov 07, 2025 at 02:45:32PM +1300, Thomas Munro wrote:
> Only MinGW + meson.  MinGW + configure has 32-bit off_t as far as I
> can tell because we do:
>
> if test "$PORTNAME" != "win32"; then
>    AC_SYS_LARGEFILE
> ...
>
> I don't personally know of any current Unix without LFS, they just
> vary on whether it's always on or you have to ask for it, as autoconf
> and meson know.  But I suppose the check for oversized segments should
> use sizeof(off_t), not the OS's identity.

Yes, I was first wondering about the addition of a WIN32 check for
meson, but this is a much better idea for both ./configure and meson.

There is a cc.sizeof(), which I guess should be enough to report the
size of off_t, and fail if we try a size larger than 4GB for the
segment file when a 4-byte off_t is detected.  It's something that I'd
rather backpatch first down to v16, before moving on with more pgoff_t
integration in the tree, mostly for history clarity.  That's clearly
an oversight.
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Thomas Munro

Дата:

07 ноября 2025 г., 05:56:07

On Fri, Nov 7, 2025 at 3:13 PM Michael Paquier <michael@paquier.xyz> wrote:
> There is a cc.sizeof(), which I guess should be enough to report the
> size of off_t, and fail if we try a size larger than 4GB for the
> segment file when a 4-byte off_t is detected.

(It's signed per POSIX and on Windows so I assume you meant to write 2GB here.)

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

07 ноября 2025 г., 06:32:21

On Fri, Nov 07, 2025 at 03:56:07PM +1300, Thomas Munro wrote:
> On Fri, Nov 7, 2025 at 3:13 PM Michael Paquier <michael@paquier.xyz> wrote:
>> There is a cc.sizeof(), which I guess should be enough to report the
>> size of off_t, and fail if we try a size larger than 4GB for the
>> segment file when a 4-byte off_t is detected.
>
> (It's signed per POSIX and on Windows so I assume you meant to write 2GB here.)

Yes, right..
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

08 ноября 2025 г., 12:40:36

On Fri, Nov 07, 2025 at 12:32:21PM +0900, Michael Paquier wrote:
> On Fri, Nov 07, 2025 at 03:56:07PM +1300, Thomas Munro wrote:
>> (It's signed per POSIX and on Windows so I assume you meant to write 2GB here.)
>
> Yes, right..

So, please find attached a patch that adds a check for large files in
meson as we do now in ./configure, for a backpatch down to v16 (once
the branch freeze is lifted next week of course).

Thoughts?
--
Michael

Вложения

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Bryan Green

Дата:

11 ноября 2025 г., 22:22:49

On 11/8/2025 3:40 AM, Michael Paquier wrote:
> On Fri, Nov 07, 2025 at 12:32:21PM +0900, Michael Paquier wrote:
>> On Fri, Nov 07, 2025 at 03:56:07PM +1300, Thomas Munro wrote:
>>> (It's signed per POSIX and on Windows so I assume you meant to write 2GB here.)
>>
>> Yes, right..
> 
> So, please find attached a patch that adds a check for large files in
> meson as we do now in ./configure, for a backpatch down to v16 (once
> the branch freeze is lifted next week of course). 
> 
> Thoughts?
> --
> Michael
I have attached v4 of the patch after correcting some whitespace errors
and a struct that didn't need the pgoff_t modification as requested by
Thomas.  I also have attached Michael's patch where both patches can be
found by cfbot for the commitfest.

-- 
Bryan Green
EDB: https://www.enterprisedb.com

Вложения

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

12 ноября 2025 г., 10:58:43

On Tue, Nov 11, 2025 at 01:22:49PM -0600, Bryan Green wrote:
> I have attached v4 of the patch after correcting some whitespace errors
> and a struct that didn't need the pgoff_t modification as requested by
> Thomas.  I also have attached Michael's patch where both patches can be
> found by cfbot for the commitfest.

Thanks.  As the stamps have been pushed for the next minor release, I
have applied and backpatched the meson check for now.  I'll look at
your patch next, for HEAD.
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Dagfinn Ilmari Mannsåker

Дата:

12 ноября 2025 г., 13:55:47

Michael Paquier <michael@paquier.xyz> writes:

> On Tue, Nov 11, 2025 at 01:22:49PM -0600, Bryan Green wrote:
>> I have attached v4 of the patch after correcting some whitespace errors
>> and a struct that didn't need the pgoff_t modification as requested by
>> Thomas.  I also have attached Michael's patch where both patches can be
>> found by cfbot for the commitfest.
>
> Thanks.  As the stamps have been pushed for the next minor release, I
> have applied and backpatched the meson check for now.  I'll look at
> your patch next, for HEAD.

I just noticed that the check is for 2GB, but the error message says
1GB.

- ilmari

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Thomas Munro

Дата:

12 ноября 2025 г., 14:26:03

On Wed, Nov 12, 2025 at 11:55 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
> I just noticed that the check is for 2GB, but the error message says
> 1GB.

-Dsegsize only accepts integers here and it's expressed in GB, so only
1 will actually work.  Presumably you could set a size up to 2GB - 1
block using -Dsegsize_blocks instead and it would work, but that's
clearly documented as developer-only.  I noticed that too and
scratched my head for a moment, but I think Michael used a defensible
cut-off and a defensible error message, they just disagree :-)

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Andres Freund

Дата:

12 ноября 2025 г., 17:19:34

Hi,

On 2025-11-11 13:22:49 -0600, Bryan Green wrote:
> diff --git a/src/test/modules/test_large_files/README b/src/test/modules/test_large_files/README
> new file mode 100644
> index 0000000000..d7caae49e6
> --- /dev/null
> +++ b/src/test/modules/test_large_files/README
> @@ -0,0 +1,53 @@
> +Test Module for Windows Large File I/O
> +
> +This test module provides functions to test PostgreSQL's ability to
> +handle files larger than 4GB on Windows.
> +
> +Requirements
> +
> +- Windows platform
> +- PostgreSQL built with segment size greater than 2GB
> +- NTFS filesystem (for sparse file support)

I don't think this test should only run on windows. That a) makes it harder to
maintain for non-windows devs b) assumes that only windows could ever have
issues.

Greetings,

Andres Freund

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

13 ноября 2025 г., 04:16:18

On Thu, Nov 13, 2025 at 12:26:03AM +1300, Thomas Munro wrote:
> On Wed, Nov 12, 2025 at 11:55 PM Dagfinn Ilmari Mannsåker
> <ilmari@ilmari.org> wrote:
>> I just noticed that the check is for 2GB, but the error message says
>> 1GB.
>
> -Dsegsize only accepts integers here and it's expressed in GB, so only
> 1 will actually work.  Presumably you could set a size up to 2GB - 1
> block using -Dsegsize_blocks instead and it would work, but that's
> clearly documented as developer-only.  I noticed that too and
> scratched my head for a moment, but I think Michael used a defensible
> cut-off and a defensible error message, they just disagree :-)

Yep, I was also scratching my head a bit on this one for the meson
bit, dived into its history before sticking to this message last
weekend for consistency.

In summary, the current formula is the same as d3b111e3205b, with a
wording much older than that: 3c6248a828af.  The original option in
this commit was only settable with GB in mind as units for the segment
size, hence a 4-byte off_t could work only with 1GB back originally,
hence the error message worded this way.  I doubt that it's worth
caring much for fancier segment sizes, but if we do we'd better change
that for meson and configure.
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

13 ноября 2025 г., 07:05:33

On Wed, Nov 12, 2025 at 04:58:43PM +0900, Michael Paquier wrote:
> Thanks.  As the stamps have been pushed for the next minor release, I
> have applied and backpatched the meson check for now.  I'll look at
> your patch next, for HEAD.

Moving on to the I/O routine changes.  There was a little bit of
noise in the diffs, like one more comment removed that should still be
around.  Indentation has needed some adjustment as well, there was one
funny diff with a cast to pgoff_t.  And done this part as a first
step, because that's already a nice cut.

Then, about the test module.

src/test/modules/Makefile was missing, and once updated I have noticed
the extra REGRESS in the module's Makefile that made the tests fail
because we just have a TAP test.  This also meant that TAP_TESTS = 1
was also missing from the Makefile.  I've fixed these myself as per
the attached.

Anyway, I agree with the point about the restriction with WIN32: there
can be advantages in being able to run that in other places.  I think
that we should add a new value for PG_TEST_EXTRA and execute the test
based on that, or on small machines with little disk space (think
small SD cards), this is going to blow up.

Also, is there a point in making that a TAP test?  A SQL test should
work OK based on the set of SQL functions introduced for the file
creation step and the validation steps.  We could also use alternate
outputs if required.
--
Michael

Вложения

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Bryan Green

Дата:

13 ноября 2025 г., 19:58:54

On 11/12/2025 10:05 PM, Michael Paquier wrote:
> On Wed, Nov 12, 2025 at 04:58:43PM +0900, Michael Paquier wrote:
>> Thanks.  As the stamps have been pushed for the next minor release, I
>> have applied and backpatched the meson check for now.  I'll look at
>> your patch next, for HEAD.
> > Moving on to the I/O routine changes.  There was a little bit of
> noise in the diffs, like one more comment removed that should still be
> around.  Indentation has needed some adjustment as well, there was one
> funny diff with a cast to pgoff_t.  And done this part as a first
> step, because that's already a nice cut.
> 

Apologies for the state of this and your loss of time.  I was testing a
new workflow and attached the wrong revision.  No excuse, just what
happened.  I will be more careful and do a closer review of diffs going
forward.

> Then, about the test module.
> 
> src/test/modules/Makefile was missing, and once updated I have noticed
> the extra REGRESS in the module's Makefile that made the tests fail
> because we just have a TAP test.  This also meant that TAP_TESTS = 1
> was also missing from the Makefile.  I've fixed these myself as per
> the attached.
> 

I had started down the path of using sql and doing regression testing
and decide instead that a tap test would be better for my specific case
of testing on Windows.

> Anyway, I agree with the point about the restriction with WIN32: there
> can be advantages in being able to run that in other places.  I think
> that we should add a new value for PG_TEST_EXTRA and execute the test
> based on that, or on small machines with little disk space (think
> small SD cards), this is going to blow up.
> 

I was focused on testing the overlapped i/o portion of this for windows
and that is why I went with a tap test.

> Also, is there a point in making that a TAP test?  A SQL test should
> work OK based on the set of SQL functions introduced for the file
> creation step and the validation steps.  We could also use alternate
> outputs if required.
> --
> Michael

Thanks for all the work Michael.  I owe you for the cleanup.  I assume
you are suggesting that we shift from test for windows-specific bugs to
testing for whether any platform that supports N-bit file offsets,
whether PG's I/O layer can actually use them?  Basically we could check
the size of off_t or pgoff_t and the test at those offsets specifically.
 I think we would still want to use sparse files though.

The argument for a TAP test in this case would be File::Temp handles
cleanup automatically for us (even on test failure).  Also, no need for
alternate output files.

I agree we should go to a cross-platform test.  I'm 51/49 in favor of
using TAP tests still, but if you, or others, feel strongly otherwise, I
can restructure it to work that way.

-- 
Bryan Green
EDB: https://www.enterprisedb.com

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

14 ноября 2025 г., 09:44:50

On Thu, Nov 13, 2025 at 10:58:54AM -0600, Bryan Green wrote:
> On 11/12/2025 10:05 PM, Michael Paquier wrote:
>> > Moving on to the I/O routine changes.  There was a little bit of
>> noise in the diffs, like one more comment removed that should still be
>> around.  Indentation has needed some adjustment as well, there was one
>> funny diff with a cast to pgoff_t.  And done this part as a first
>> step, because that's already a nice cut.
>
> Apologies for the state of this and your loss of time.  I was testing a
> new workflow and attached the wrong revision.  No excuse, just what
> happened.  I will be more careful and do a closer review of diffs going
> forward.

No worries.  Thanks for all your efforts here.

> I had started down the path of using sql and doing regression testing
> and decide instead that a tap test would be better for my specific case
> of testing on Windows.

How much do we really care about the case of FSCTL_SET_SPARSE?  We
don't use it in the tree, and I doubt we will, but perhaps you have
some plans to use it for something I am unaware of, that would justify
its existence?

> The argument for a TAP test in this case would be File::Temp handles
> cleanup automatically for us (even on test failure).  Also, no need for
> alternate output files.
>
> I agree we should go to a cross-platform test.  I'm 51/49 in favor of
> using TAP tests still, but if you, or others, feel strongly otherwise, I
> can restructure it to work that way.

There are a couple of options here:
- Use NO_INSTALLCHECK so as the test would never be run on an existing
deployment, only check.  We could use that on top of a PG_TEST_EXTRA
to check with a large offset if the writes cannot be cheap..
- For alternate output, the module could have a SQL function that
returns the size of off_t or equivalent, mixed with an \if to avoid
the test for a sizeof 4 bytes.

If others argue in favor of a TAP test as well, that's OK by me.
However, there is nothing in the current patch that justifies that:
the proposal only does direct SQL function calls and does not need a
specific configuration or any cluster manipulations (aka restarts,
etc).
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Bryan Green

Дата:

27 ноября 2025 г., 10:52:33

On 11/14/2025 12:44 AM, Michael Paquier wrote:
> On Thu, Nov 13, 2025 at 10:58:54AM -0600, Bryan Green wrote:
>> On 11/12/2025 10:05 PM, Michael Paquier wrote:
>>>> Moving on to the I/O routine changes.  There was a little bit of
>>> noise in the diffs, like one more comment removed that should still be
>>> around.  Indentation has needed some adjustment as well, there was one
>>> funny diff with a cast to pgoff_t.  And done this part as a first
>>> step, because that's already a nice cut.
>>
>> Apologies for the state of this and your loss of time.  I was testing a
>> new workflow and attached the wrong revision.  No excuse, just what
>> happened.  I will be more careful and do a closer review of diffs going
>> forward.
> 
> No worries.  Thanks for all your efforts here.
> 
>> I had started down the path of using sql and doing regression testing
>> and decide instead that a tap test would be better for my specific case
>> of testing on Windows.
> 
> How much do we really care about the case of FSCTL_SET_SPARSE?  We
> don't use it in the tree, and I doubt we will, but perhaps you have
> some plans to use it for something I am unaware of, that would justify
> its existence?
> 

No plans for it.  Dropped.

>> The argument for a TAP test in this case would be File::Temp handles
>> cleanup automatically for us (even on test failure).  Also, no need for
>> alternate output files.
>>
>> I agree we should go to a cross-platform test.  I'm 51/49 in favor of
>> using TAP tests still, but if you, or others, feel strongly otherwise, I
>> can restructure it to work that way.
> 
> There are a couple of options here:
> - Use NO_INSTALLCHECK so as the test would never be run on an existing
> deployment, only check.  We could use that on top of a PG_TEST_EXTRA
> to check with a large offset if the writes cannot be cheap..
> - For alternate output, the module could have a SQL function that
> returns the size of off_t or equivalent, mixed with an \if to avoid
> the test for a sizeof 4 bytes.
> 
> If others argue in favor of a TAP test as well, that's OK by me.
> However, there is nothing in the current patch that justifies that:

Agreed. I've reworked this as a SQL regression test per your suggestions.

The test now uses OpenTemporaryFile() via the VFD layer, which handles
cleanup automatically, so there's no need for TAP's File::Temp. A
test_large_files_offset_size() function returns sizeof(pgoff_t), and
the SQL uses \if to skip on platforms where that's less than 8 bytes.
NO_INSTALLCHECK is set.
One issue came up during testing: at 2GB+1, the OVERLAPPED.OffsetHigh
field is naturally zero, so commenting out the OffsetHigh fix didn't
cause the test to fail. I've changed the test offset to 4GB+1 where
OffsetHigh must be non-zero. The test now catches both bugs. FileSize()
provides independent verification that writes actually reached the
correct offset.

I have changed the name of the patch to reflect that it is not just
adding tests, but includes the change for the problem.

Updated patch attached.

> --
> Michael


-- 
Bryan Green
EDB: https://www.enterprisedb.com

Вложения

v6-0001-Fix-Windows-file-I-O-for-offsets-beyond-2GB.patch

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

02 декабря 2025 г., 05:46:56

On Thu, Nov 27, 2025 at 01:52:33AM -0600, Bryan Green wrote:
> The test now uses OpenTemporaryFile() via the VFD layer, which handles
> cleanup automatically, so there's no need for TAP's File::Temp. A
> test_large_files_offset_size() function returns sizeof(pgoff_t), and
> the SQL uses \if to skip on platforms where that's less than 8 bytes.
> NO_INSTALLCHECK is set.
> One issue came up during testing: at 2GB+1, the OVERLAPPED.OffsetHigh
> field is naturally zero, so commenting out the OffsetHigh fix didn't
> cause the test to fail. I've changed the test offset to 4GB+1 where
> OffsetHigh must be non-zero. The test now catches both bugs. FileSize()
> provides independent verification that writes actually reached the
> correct offset.
>
> I have changed the name of the patch to reflect that it is not just
> adding tests, but includes the change for the problem.
>
> Updated patch attached.

Pretty cool result, and the test fails in the Windows CI with
84fb27511dbe reverted:
- test_large_files_test_4gb_boundary
-------------------------------------
- 4GB boundary test passed
-(1 row)
-
+ERROR:  file size is 9 bytes, expected at least 4294967305 bytes -
offset truncated

-# If we don't have largefile support, can't handle segment size >= 2GB.
-if cc.sizeof('off_t', args: test_c_args) < 8
-  segsize_bytes = segsize * blocksize
-  if segsize_bytes >= (2 * 1024 * 1024 * 1024)
-    error('Large file support is not enabled. Segment size cannot be larger than 1GB.')
-  endif
-endif

I doubt that we can drop this check yet.  There are still a lot of
places in the tree that need to be switched from off_t to pgoff_t,
like the buffer APIs, etc.

+SELECT test_large_files_offset_size() >= 8 AS has_large_file_support \gset
+-- Only run test on platforms with 64-bit offsets
+\if :has_large_file_support

That would be sufficient to make the test conditional.  However we
already know that pgoff_t is forced at 8 bytes all the time in the
tree, so why do we need that.  If that was based on off_t, with tests
around it, that would be adapted, of course.

+PG_FUNCTION_INFO_V1(test_large_files_test_4gb_boundary);
+Datum
+test_large_files_test_4gb_boundary(PG_FUNCTION_ARGS)

As of this patch, this includes one function that opens a temporary
file, writes some data into it twice, checks its size, reads a few
bytes, closes the file.  When designing such test modules, it is
important to make them modular, IMO, so as they can be extended at
will for more cases in the future.  How about introducing a set of
thin SQL wrappers around all these File*() functions, taking in input
what they need.  For the data written, this would be some bytea in
input combined with an offset.  For the data read, a chunk of data to
return, with an offset where the data was read from.  Then the
comparisons could be done on the two byteas, for example.  FileClose()
is triggered at transaction commit through CleanupTempFiles(), so we
could return the File as an int4 when passed around to the SQL
functions (cannot rely on pg_read_file() as the path is not fixed),
passing it in input of the read, write and close function (close is
optional as we could rely on an explicit commit).

The new module is missing a .gitignore, leading to files present in
the tree when using configure/make after a test run.  You could just
copy one from one of the other test modules.
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Andres Freund

Дата:

02 декабря 2025 г., 05:59:52

Hi,

On 2025-12-02 11:46:56 +0900, Michael Paquier wrote:
> I doubt that we can drop this check yet.  There are still a lot of
> places in the tree that need to be switched from off_t to pgoff_t,
> like the buffer APIs, etc.

Hm? What are you thinking about re buffer APIs?

Greetings,

Andres Freund

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

02 декабря 2025 г., 06:02:39

On Mon, Dec 01, 2025 at 09:59:52PM -0500, Andres Freund wrote:
> On 2025-12-02 11:46:56 +0900, Michael Paquier wrote:
>> I doubt that we can drop this check yet.  There are still a lot of
>> places in the tree that need to be switched from off_t to pgoff_t,
>> like the buffer APIs, etc.
>
> Hm? What are you thinking about re buffer APIs?

buffile.h and buffile.c still have traces of off_t.
--
Michael

Вложения

signature.asc

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Andres Freund

Дата:

02 декабря 2025 г., 06:05:52

On 2025-12-02 12:02:39 +0900, Michael Paquier wrote:
> On Mon, Dec 01, 2025 at 09:59:52PM -0500, Andres Freund wrote:
> > On 2025-12-02 11:46:56 +0900, Michael Paquier wrote:
> >> I doubt that we can drop this check yet.  There are still a lot of
> >> places in the tree that need to be switched from off_t to pgoff_t,
> >> like the buffer APIs, etc.
> > 
> > Hm? What are you thinking about re buffer APIs?
> 
> buffile.h and buffile.c still have traces of off_t.

Oh, I was interpreting buffer as bufmgr.c...

Re: [Patch] Windows relation extension failure at 2GB and 4GB

От

Michael Paquier

Дата:

02 декабря 2025 г., 06:12:28

On Mon, Dec 01, 2025 at 10:05:52PM -0500, Andres Freund wrote:
> On 2025-12-02 12:02:39 +0900, Michael Paquier wrote:
>> buffile.h and buffile.c still have traces of off_t.
>
> Oh, I was interpreting buffer as bufmgr.c...

Sorry for being rather unclear here ;D
--
Michael

Вложения

signature.asc

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: [Patch] Windows relation extension failure at 2GB and 4GB

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения