Обсуждение: Switch buffile.c/h to use pgoff_t
Hi all, (Added Bryan in CC as he has been looking at this stuff previously.) An mentioned on this thread and as a part of the quest to remove more of long in the tree, buffile.c and buffile.h still rely on an unportable off_t, which is signed 4 bytes on Windows: https://www.postgresql.org/message-id/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com Please find attached a patch to do the switch. I was surprised to see that the amount of code to adapt was limited, the routines of buffile.h changed in this commit being used in other places that keep track of offsets. Hence these other files just need to do a off_t => pgoff_t flip in a couple of structures to be updated, as far as I can see. This removes a couple of extra long casts, as well as one comment in BufFileSeek() that relates to overflows for large offsets, that would not exist with this switch, which is nice. Thanks, -- Michael
Вложения
> On Dec 19, 2025, at 09:43, Michael Paquier <michael@paquier.xyz> wrote: > > Hi all, > (Added Bryan in CC as he has been looking at this stuff previously.) > > An mentioned on this thread and as a part of the quest to remove more > of long in the tree, buffile.c and buffile.h still rely on an > unportable off_t, which is signed 4 bytes on Windows: > https://www.postgresql.org/message-id/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com > > Please find attached a patch to do the switch. I was surprised to see > that the amount of code to adapt was limited, the routines of > buffile.h changed in this commit being used in other places that keep > track of offsets. Hence these other files just need to do a off_t => > pgoff_t flip in a couple of structures to be updated, as far as I can > see. > > This removes a couple of extra long casts, as well as one comment in > BufFileSeek() that relates to overflows for large offsets, that would > not exist with this switch, which is nice. > > Thanks, > -- > Michael > <0001-Switch-buffile.c-h-to-use-portable-pgoff_t.patch> ``` while (wpos < file->nbytes) { - off_t availbytes; + pgoff_t availbytes; instr_time io_start; instr_time io_time; @@ -524,7 +524,7 @@ BufFileDumpBuffer(BufFile *file) bytestowrite = file->nbytes - wpos; availbytes = MAX_PHYSICAL_FILESIZE - file->curOffset; - if ((off_t) bytestowrite > availbytes) + if ((pgoff_t) bytestowrite > availbytes) bytestowrite = (int) availbytes; ``` bytestowrite is of type int, loosing it to pgoff_t then compare with availbytes, if bytestowrite > availbytes, then availbytesmust be within the range of int, so the next assignment “bytestowrite = (int) availbytes” is safe, but makes readingdifficult. Given MAX_PHYSICAL_FILESIZE is just 1G (2^30), why availbytes has to be pgoff_t instead of just int? Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Hi, On Fri, Dec 19, 2025 at 11:00:54AM +0800, Chao Li wrote: > Given MAX_PHYSICAL_FILESIZE is just 1G (2^30), why availbytes has to be pgoff_t instead of just int? I agree that int would work, but maybe it's using pgoff_t just to be on the safe side of things should MAX_PHYSICAL_FILESIZE become 2^31 or higher one day? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Fri, Dec 19, 2025 at 11:00:54AM +0800, Chao Li wrote: > Given MAX_PHYSICAL_FILESIZE is just 1G (2^30), why availbytes has to > be pgoff_t instead of just int? The point of such changes would be to lift this barrier at some point, which is what the other thread I am mentioning upthread is also pointing at. It does not change the fact that this code is currently not portable as written: off_t can be 4 or 8 bytes depending on the environment, and pgoff_t exists to be a stable alternative. This relates as well to the use of long in the tree, all coming down to WIN32. -- Michael
Вложения
On Fri, Dec 19, 2025 at 02:22:02PM +0900, Michael Paquier wrote: > The point of such changes would be to lift this barrier at some point, > which is what the other thread I am mentioning upthread is also > pointing at. It does not change the fact that this code is currently > not portable as written: off_t can be 4 or 8 bytes depending on the > environment, and pgoff_t exists to be a stable alternative. This > relates as well to the use of long in the tree, all coming down to > WIN32. Getting rid of a couple more long assumptions while removing one portability comment from buffile.c is appealing while the change is not invasive, so applied. -- Michael
Вложения
On Fri, Dec 19, 2025 at 1:22 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Dec 19, 2025 at 11:00:54AM +0800, Chao Li wrote:
> Given MAX_PHYSICAL_FILESIZE is just 1G (2^30), why availbytes has to
> be pgoff_t instead of just int?
The point of such changes would be to lift this barrier at some point,
which is what the other thread I am mentioning upthread is also
pointing at. It does not change the fact that this code is currently
not portable as written: off_t can be 4 or 8 bytes depending on the
environment, and pgoff_t exists to be a stable alternative. This
relates as well to the use of long in the tree, all coming down to
WIN32.
--
Michael
Sorry, I didn’t explain myself clearly earlier. My main point was to avoid the awkward mixed-type casts here:
```
if ((pgoff_t) bytestowrite > availbytes)
bytestowrite = (int) availbytes;
bytestowrite = (int) availbytes;
```
I agree that changing availbytes to int would not be a good choice. Instead, I tried making bytestowrite a pgoff_t, so that the comparison and assignment can be done without casts, while still keeping the code correct if MAX_PHYSICAL_FILESIZE is lifted in the future.
I’ve attached a small patch along these lines. It compiles without warnings, and "make check" passes on my side. What do you think?
Best regards,
==
Chao Li (Evan)---------------------
HighGo Software Co., Ltd.
Вложения
On Tue, Dec 23, 2025 at 10:59:45AM +0800, Chao Li wrote: > I’ve attached a small patch along these lines. It compiles without > warnings, and "make check" passes on my side. What do you think? I don't think it is right. bytestowrite is not a file offset, and the code has been using an int due to BufFile->nbytes. This leads to a more confusing result. -- Michael
Вложения
On Wed, Dec 24, 2025 at 2:15 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Dec 23, 2025 at 10:59:45AM +0800, Chao Li wrote:
> I’ve attached a small patch along these lines. It compiles without
> warnings, and "make check" passes on my side. What do you think?
I don't think it is right. bytestowrite is not a file offset, and the
code has been using an int due to BufFile->nbytes. This leads to a
more confusing result.
--
Michael
Make sense, bytestowrite is not a file offset. So, in the current code, availbytes is not a file offset either, but it is defined as pgoff_t, which has the same confusion, right? Also bytestowrite is casted to pgoff_t, it's the same confusion again.
How about using "ssize_t" for both bytestowrite and availbytes? It's still signed, broader than int, and the odd type casts are eliminated.
In win32_port.h:
```
#ifndef _WIN64
typedef long ssize_t;
#else
typedef __int64 ssize_t;
#endif
typedef long ssize_t;
#else
typedef __int64 ssize_t;
#endif
```
Best regards,
==
Chao Li (Evan)---------------------
HighGo Software Co., Ltd.
Вложения
On Wed, Dec 24, 2025 at 05:01:57PM +0800, Chao Li wrote: > Make sense, bytestowrite is not a file offset. So, in the current code, > availbytes is not a file offset either, but it is defined as pgoff_t, which > has the same confusion, right? Also bytestowrite is casted to pgoff_t, it's > the same confusion again. Yeah, actually this suggestion makes more sense. availbytes is a computation made of a maximal size and an offset, so defining it as an offset from the start is kind of weird. Now I don't think that your suggested set of changes could become more consistent with a few more changes. For example, what about pos and nbytes in BufFile? While ssize_t is more consistent with FileRead() and FileWrite(), this code is written to care about signedness while ssize_t has a stricter range per posix, hence could int64 be a better choice for the whole interface? int64 is already what we use for BufFileSize(), which is due to the limit of MAX_PHYSICAL_FILESIZE of course. -- Michael
Вложения
On Thu, Dec 25, 2025 at 08:42:04AM +0900, Michael Paquier wrote: > Now I don't think that your suggested set of changes could become more > consistent with a few more changes. Cough. "I think that your suggested set of changes could become more consistent with a few more changes." Cough. -- Michael
Вложения
On Thu, Dec 25, 2025 at 7:42 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Dec 24, 2025 at 05:01:57PM +0800, Chao Li wrote:
> Make sense, bytestowrite is not a file offset. So, in the current code,
> availbytes is not a file offset either, but it is defined as pgoff_t, which
> has the same confusion, right? Also bytestowrite is casted to pgoff_t, it's
> the same confusion again.
Yeah, actually this suggestion makes more sense. availbytes is a
computation made of a maximal size and an offset, so defining it as an
offset from the start is kind of weird.
Now I think that your suggested set of changes could become more
consistent with a few more changes. For example, what about pos and
nbytes in BufFile? While ssize_t is more consistent with FileRead()
and FileWrite(), this code is written to care about signedness while
ssize_t has a stricter range per posix, hence could int64 be a better
choice for the whole interface? int64 is already what we use for
BufFileSize(), which is due to the limit of MAX_PHYSICAL_FILESIZE of
course.
--
Michael
Yeah, int64 feels a better fit.
WRT. changing pos and nbytes in BufFile to int64, I agree that also makes sense. BufFile is a private struct in buffile.c, though it's indirectly exposed by:
```
typedef struct BufFile BufFile;
```
None of the callers accesses its fields directly. Changing the two fields from int to int64 will bump the structure size 16 bytes, but likely there is not a case where numerous BufFile objects need to be created, thus runtime impact should be minimal.
In attached v3, I have applied int64 to the 2 struct fields and corresponding local variables. I ran a clean build, no warning was introduced, and "make check" passed.
Best regards,
--
Chao Li (Evan)HighGo Software Co., Ltd.
Вложения
On Thu, Dec 25, 2025 at 10:40:02AM +0800, Chao Li wrote: > In attached v3, I have applied int64 to the 2 struct fields and > corresponding local variables. I ran a clean build, no warning was > introduced, and "make check" passed. The advantage of being able to make the code transparently more pluggable for max large sizes while we already use 8-byte offsets is kind of nice, I guess. "availbytes" is a clarification bonus as it is not an offset per-se. I can see that you have missed on cast spot, so adjusted things a bit, then applied the result. -- Michael
Вложения
> On Dec 26, 2025, at 07:45, Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Dec 25, 2025 at 10:40:02AM +0800, Chao Li wrote: >> In attached v3, I have applied int64 to the 2 struct fields and >> corresponding local variables. I ran a clean build, no warning was >> introduced, and "make check" passed. > > The advantage of being able to make the code transparently more > pluggable for max large sizes while we already use 8-byte offsets is > kind of nice, I guess. "availbytes" is a clarification bonus as it is > not an offset per-se. I can see that you have missed on cast spot, so > adjusted things a bit, then applied the result. > -- > Michael Thanks a lot for pushing. WRT to the original (int) casts, I intentionally removed them, because file->nbytes, newOffset and file->curOffset are allsigned 64 bit integers now. Compiling with -Wextra won’t get a warning, so technically, the type cast is no longer needed. I saw you changed to: ``` - file->pos = (int) (newOffset - file->curOffset); + file->pos = (int64) (newOffset - file->curOffset); - file->pos = (int) (newOffset - file->curOffset); + file->pos = (int64) newOffset - file->curOffset; - file->nbytes = (int) (newOffset - file->curOffset); + file->nbytes = (int64) newOffset - file->curOffset; ``` The latter two places missed (), but that should also work, just a little inconsistent. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/