Обсуждение: IO wait events for COPY FROM/TO PROGRAM or file
Hi hackers,
Following up on the discussion about wait event coverage for COPY operations [1], here's a tiny patch that adds two new IO wait events:
This enables diagnosing for cases like:
- storage I/O bottlenecks during bulk loads slow exports to files (COPY FROM/TO '/path/to/file')
- pipe buffer congestion in ETL pipelines (COPY FROM/TO PROGRAM)
Following up on the discussion about wait event coverage for COPY operations [1], here's a tiny patch that adds two new IO wait events:
- COPY_DATA_READ: COPY FROM blocking on file/program read
- COPY_DATA_WRITE: COPY TO blocking on file/program write
- storage I/O bottlenecks during bulk loads slow exports to files (COPY FROM/TO '/path/to/file')
- pipe buffer congestion in ETL pipelines (COPY FROM/TO PROGRAM)
Note: as it turned out, COPY FROM/TO STDIN/STDOUT already have coverage via Client/ClientRead and Client/ClientWrite at the protocol layer (thus, pg_dump/pg_restore are already covered). So only file/program cases needed instrumentation.
Nikx
Вложения
On Thu, Jan 08, 2026 at 01:53:17PM -0800, Nikolay Samokhvalov wrote:
> Following up on the discussion about wait event coverage for COPY
> operations [1], here's a tiny patch that adds two new IO wait events:
> - COPY_DATA_READ: COPY FROM blocking on file/program read
> - COPY_DATA_WRITE: COPY TO blocking on file/program write
case COPY_FILE:
+ pgstat_report_wait_start(WAIT_EVENT_COPY_DATA_WRITE);
if (fwrite(fe_msgbuf->data, fe_msgbuf->len, 1,
cstate->copy_file) != 1 ||
ferror(cstate->copy_file))
@@ -486,6 +487,7 @@ CopySendEndOfRow(CopyToState cstate)
(errcode_for_file_access(),
errmsg("could not write to COPY file: %m")));
}
+ pgstat_report_wait_end();
Hmm. Makes sense to me (perhaps you have played with an LLM to find
out fread() and fwrite() patterns that would need an event?), but I
don't think that the part about COPY TO is done right. The wait event
should be set during the fwrite() call only, similarly to the COPY
FROM case, by saving the result returned by fwrite() in a separate
variable.
--
Michael
Вложения
Hi,
Thanks for this patch. I have a few comments.
> Hmm. Makes sense to me (perhaps you have played with an LLM to find
> out fread() and fwrite() patterns that would need an event?), but I
> don't think that the part about COPY TO is done right. The wait event
> should be set during the fwrite() call only, similarly to the COPY
> FROM case, by saving the result returned by fwrite() in a separate
> variable.
I see other cases such as inside copy_file() in which the read() is wrapped
by wait_start and wait_end, but in the write() case, the wait_end() is after
the error handling ( as the attached v1 ).
```
pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_READ);
nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
pgstat_report_wait_end();
.....
..........
pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_WRITE);
if ((int) write(dstfd, buffer, nbytes) != nbytes)
{
/* if write didn't set errno, assume problem is no disk space */
if (errno == 0)
errno = ENOSPC;
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not write to file \"%s\": %m", tofile)));
}
pgstat_report_wait_end();
```
another example is write_relmap_file():
```
/* Write new data to the file. */
pgstat_report_wait_start(WAIT_EVENT_RELATION_MAP_WRITE);
if (write(fd, newmap, sizeof(RelMapFile)) != sizeof(RelMapFile))
{
/* if write didn't set errno, assume problem is no disk space */
if (errno == 0)
errno = ENOSPC;
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not write file \"%s\": %m",
maptempfilename)));
}
pgstat_report_wait_end();
```
I don't see any issue with this approach for fwrite().
Another comment. Wouldn't be better to use "COPY FROM" and "COPY TO" in the
names to make it more obvious they are related to the COPY command?
so instead of:
+COPY_DATA_READ "Waiting for a read from a file or program during COPY FROM."
+COPY_DATA_WRITE "Waiting for a write to a file or program during COPY TO."
this:
COPY_FROM_READ: "Waiting to read data from a file or program during COPY FROM."
COPY_TO_WRITE: "Waiting to write data to a file or program during COPY TO."
--
Sami Imseih
Amazon Web Services (AWS)
On Fri, Jan 9, 2026 at 8:52 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> Hi,
>
> Thanks for this patch. I have a few comments.
>
> > Hmm. Makes sense to me (perhaps you have played with an LLM to find
> > out fread() and fwrite() patterns that would need an event?), but I
> > don't think that the part about COPY TO is done right. The wait event
> > should be set during the fwrite() call only, similarly to the COPY
> > FROM case, by saving the result returned by fwrite() in a separate
> > variable.
>
> I see other cases such as inside copy_file() in which the read() is wrapped
> by wait_start and wait_end, but in the write() case, the wait_end() is after
> the error handling ( as the attached v1 ).
>
> ```
> pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_READ);
> nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
> pgstat_report_wait_end();
> .....
> ..........
> pgstat_report_wait_start(WAIT_EVENT_COPY_FILE_WRITE);
> if ((int) write(dstfd, buffer, nbytes) != nbytes)
> {
> /* if write didn't set errno, assume problem is no disk space */
> if (errno == 0)
> errno = ENOSPC;
> ereport(ERROR,
> (errcode_for_file_access(),
> errmsg("could not write to file \"%s\": %m", tofile)));
> }
> pgstat_report_wait_end();
> ```
> another example is write_relmap_file():
>
> ```
> /* Write new data to the file. */
> pgstat_report_wait_start(WAIT_EVENT_RELATION_MAP_WRITE);
> if (write(fd, newmap, sizeof(RelMapFile)) != sizeof(RelMapFile))
> {
> /* if write didn't set errno, assume problem is no disk space */
> if (errno == 0)
> errno = ENOSPC;
> ereport(ERROR,
> (errcode_for_file_access(),
> errmsg("could not write file \"%s\": %m",
> maptempfilename)));
> }
> pgstat_report_wait_end();
> ```
>
> I don't see any issue with this approach for fwrite().
>
> Another comment. Wouldn't be better to use "COPY FROM" and "COPY TO" in the
> names to make it more obvious they are related to the COPY command?
>
> so instead of:
>
> +COPY_DATA_READ "Waiting for a read from a file or program during COPY FROM."
> +COPY_DATA_WRITE "Waiting for a write to a file or program during COPY TO."
>
> this:
>
> COPY_FROM_READ: "Waiting to read data from a file or program during COPY FROM."
> COPY_TO_WRITE: "Waiting to write data to a file or program during COPY TO."
IMHO this suggestion makes sense to avoid confusion with similar wait
event names like COPY_FILE_READ etc. So using COPY_FROM and COPY_TO
is clearer for the purpose.
--
Regards,
Dilip Kumar
Google
On Thu, Jan 08, 2026 at 09:22:26PM -0600, Sami Imseih wrote: > I don't see any issue with this approach for fwrite(). Please feel free to discard my comment, then :) > Another comment. Wouldn't be better to use "COPY FROM" and "COPY TO" in the > names to make it more obvious they are related to the COPY command? Yeah, we had better do that. That's slightly cleaner for the user if they mix both COPY types at the same time, not requiring a mental mapping that FROM is a read and TO is a write. -- Michael
Вложения
On Sun, Jan 11, 2026 at 02:22 Michael Paquier <michael@paquier.xyz> wrote:
On Thu, Jan 08, 2026 at 09:22:26PM -0600, Sami Imseih wrote:
> I don't see any issue with this approach for fwrite().
Please feel free to discard my comment, then :)
> Another comment. Wouldn't be better to use "COPY FROM" and "COPY TO" in the
> names to make it more obvious they are related to the COPY command?
Yeah, we had better do that. That's slightly cleaner for the user if
they mix both COPY types at the same time, not requiring a mental
mapping that FROM is a read and TO is a write.
v2 attached with the rename per feedback: COPY_DATA_READ/WRITE →
COPY_FROM_READ/COPY_TO_WRITE.
Given how small this patch is, any chance it could still make PG19?
COPY_FROM_READ/COPY_TO_WRITE.
Given how small this patch is, any chance it could still make PG19?
Вложения
On Sat, Jan 31, 2026 at 11:51:39AM -0800, Nikolay Samokhvalov wrote: > v2 attached with the rename per feedback: COPY_DATA_READ/WRITE → > COPY_FROM_READ/COPY_TO_WRITE. > > Given how small this patch is, any chance it could still make PG19? I'll double-check all that, no worries. -- Michael
Вложения
On Sat, Jan 31, 2026 at 11:51:39AM -0800, Nikolay Samokhvalov wrote: > v2 attached with the rename per feedback: COPY_DATA_READ/WRITE → > COPY_FROM_READ/COPY_TO_WRITE. > > Given how small this patch is, any chance it could still make PG19? While double-checking the code, one thing felt incorrect in the description of these two new wait events, as copyto.c and copyfromparse.c can set copy_file under copy_dest=COPY_FILE for three cases, not two contrary to what is said upthread: - PROGRESS_COPY_TYPE_FILE - PROGRESS_COPY_TYPE_PROGRAM - PROGRESS_COPY_TYPE_PIPE, for a non-DestRemote. The COPY code is designed to be pluggable depending on what extension code passes down as option, and not mentioning the pipe case would be incorrect. Added a mention about this part, and called it a day. -- Michael
Вложения
On Mon, Feb 2, 2026 at 7:27 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sat, Jan 31, 2026 at 11:51:39AM -0800, Nikolay Samokhvalov wrote:
> v2 attached with the rename per feedback: COPY_DATA_READ/WRITE →
> COPY_FROM_READ/COPY_TO_WRITE.
>
> Given how small this patch is, any chance it could still make PG19?
While double-checking the code, one thing felt incorrect in the
description of these two new wait events, as copyto.c and
copyfromparse.c can set copy_file under copy_dest=COPY_FILE for three
cases, not two contrary to what is said upthread:
- PROGRESS_COPY_TYPE_FILE
- PROGRESS_COPY_TYPE_PROGRAM
- PROGRESS_COPY_TYPE_PIPE, for a non-DestRemote.
The COPY code is designed to be pluggable depending on what extension
code passes down as option, and not mentioning the pipe case would be
incorrect. Added a mention about this part, and called it a day.
Thank you!