Hi devs,
I came across a hang when running COPY to a FIFO file: the backend blocks because the FIFO has not been opened for reading on the other end. The backtrace from the master branch looks like this:
#0 0x000000332ccc6c30 in __open_nocancel () from /lib64/libc.so.6
#1 0x000000332cc6b693 in __GI__IO_file_open () from /lib64/libc.so.6
#2 0x000000332cc6b7dc in _IO_new_file_fopen () from /lib64/libc.so.6
#3 0x000000332cc60a04 in __fopen_internal () from /lib64/libc.so.6
#4 0x00000000007d0536 in AllocateFile ()
#5 0x00000000005f3d8e in BeginCopyFrom ()
#6 0x00000000005ef192 in DoCopy ()
#7 0x000000000080bb22 in standard_ProcessUtility ()
#8 0x000000000080b4d6 in ProcessUtility ()
#9 0x000000000080a5cf in PortalRunUtility ()
#10 0x000000000080a78c in PortalRunMulti ()
#11 0x0000000000809dff in PortalRun ()
#12 0x0000000000803f85 in exec_simple_query ()
#13 0x000000000080807f in PostgresMain ()
#14 0x000000000077f639 in BackendRun ()
#15 0x000000000077eceb in BackendStartup ()
#16 0x000000000077b185 in ServerLoop ()
#17 0x000000000077a7ee in PostmasterMain ()
#18 0x00000000006c93de in main ()
Reproduction is simple:
-- mkfifo /tmp/test.dat # bash
copy pg_class to '/tmp/test.dat';
-- try pg_cancel_backend or pg_terminate_backend from another session
The problem is that once the backend is stuck in this open() call, it cannot be cancelled or terminated until something opens the FIFO for reading.
I am not sure whether this should be classified as a bug, since it is admittedly caused by incorrect use of the FIFO, but the result is a backend that cannot be terminated at all.
I see that the recv and send calls in secure_read/secure_write are done in non-blocking style to make them interruptible. Would it be worthwhile to make the fopen call non-blocking as well?
The same thing would happen with file_fdw on a FIFO that nobody has opened on the other end.
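To make the idea a bit more concrete, here is a minimal standalone sketch of what a non-blocking open of the write side could look like. It is not PostgreSQL code: open_fifo_for_write, cancel_requested, and the attempt/delay parameters are all hypothetical names invented for illustration, and cancel_requested merely stands in for whatever interrupt check the backend would really use. It relies on the POSIX behaviour that open() with O_WRONLY | O_NONBLOCK on a FIFO with no reader fails immediately with ENXIO instead of blocking.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the backend's "interrupt pending" check. */
static volatile sig_atomic_t cancel_requested = 0;

/*
 * Try to open a FIFO for writing without blocking in open().
 *
 * Returns a blocking-mode descriptor on success.  On failure returns -1
 * with errno set: EINTR if a cancel was requested, ETIMEDOUT if no reader
 * showed up within max_attempts tries, or the error reported by open().
 */
static int
open_fifo_for_write(const char *path, int max_attempts, long delay_ms)
{
    struct timespec delay;
    int         i;

    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (i = 0; i < max_attempts; i++)
    {
        int         fd;

        /* The loop gives us a place to notice a cancel request. */
        if (cancel_requested)
        {
            errno = EINTR;
            return -1;
        }

        fd = open(path, O_WRONLY | O_NONBLOCK);
        if (fd >= 0)
        {
            /* A reader exists; switch back to blocking writes. */
            int         flags = fcntl(fd, F_GETFL);

            if (flags == -1 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1)
            {
                int         save_errno = errno;

                close(fd);
                errno = save_errno;
                return -1;
            }
            return fd;
        }

        /* ENXIO means "no reader yet"; anything else is a real error. */
        if (errno != ENXIO && errno != EINTR)
            return -1;

        nanosleep(&delay, NULL);
    }

    errno = ETIMEDOUT;
    return -1;
}
```

A caller that needs a FILE * could presumably wrap the returned descriptor with fdopen(). Whether polling like this, or some other approach, is acceptable inside AllocateFile is exactly the open question.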
Cheers,
Kenan