Discussion: Hang issue when COPY to/from an unopened FIFO


Hang issue when COPY to/from an unopened FIFO

From
Kenan Yao
Date:
Hi devs,

I came across a hang issue when running COPY to a FIFO, because the FIFO was not opened for reading on the other end. The backtrace from the master branch looks like this:
#0  0x000000332ccc6c30 in __open_nocancel () from /lib64/libc.so.6
#1  0x000000332cc6b693 in __GI__IO_file_open () from /lib64/libc.so.6
#2  0x000000332cc6b7dc in _IO_new_file_fopen () from /lib64/libc.so.6
#3  0x000000332cc60a04 in __fopen_internal () from /lib64/libc.so.6
#4  0x00000000007d0536 in AllocateFile ()
#5  0x00000000005f3d8e in BeginCopyFrom ()
#6  0x00000000005ef192 in DoCopy ()
#7  0x000000000080bb22 in standard_ProcessUtility ()
#8  0x000000000080b4d6 in ProcessUtility ()
#9  0x000000000080a5cf in PortalRunUtility ()
#10 0x000000000080a78c in PortalRunMulti ()
#11 0x0000000000809dff in PortalRun ()
#12 0x0000000000803f85 in exec_simple_query ()
#13 0x000000000080807f in PostgresMain ()
#14 0x000000000077f639 in BackendRun ()
#15 0x000000000077eceb in BackendStartup ()
#16 0x000000000077b185 in ServerLoop ()
#17 0x000000000077a7ee in PostmasterMain ()
#18 0x00000000006c93de in main ()
Reproduction is simple:
-- mkfifo /tmp/test.dat # bash
copy pg_class to '/tmp/test.dat';
-- try pg_cancel_backend or pg_terminate_backend from other sessions
The problem is that, once we are stuck here, we cannot cancel or terminate this backend process unless the FIFO is opened for reading.
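For context, the underlying FIFO semantics can be demonstrated in plain shell: open(2) for writing blocks until some process opens the other end for reading, which is exactly what the backend is waiting for above. A minimal sketch (the path is illustrative):

```shell
# Create a throwaway FIFO.
fifo=$(mktemp -u /tmp/fifo_demo.XXXXXX)
mkfifo "$fifo"

# With no reader, this would block indefinitely, just like the backend:
#   exec 3> "$fifo"

# Opening a reader first lets the writer's open() complete.
cat "$fifo" > /dev/null &
echo "hello" > "$fifo"   # completes because a reader now exists
wait

rm -f "$fifo"
```

This is also the manual workaround for the hang: attach a reader (e.g. `cat /tmp/test.dat > /dev/null`) so the blocked fopen() can return.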

I am not sure whether this should be categorized as a bug, since it is indeed caused by incorrect use of a FIFO, but either way the backend cannot be terminated.

I see that the recv and send calls in secure_read/secure_write are implemented in a non-blocking style to make them interruptible; would it be worthwhile to make this fopen non-blocking as well?

The same thing happens with file_fdw on an unopened FIFO.

Cheers,
Kenan


Re: Hang issue when COPY to/from an unopened FIFO

From
Tom Lane
Date:
Kenan Yao <kyao@pivotal.io> writes:
> -- mkfifo /tmp/test.dat # bash
> copy pg_class to '/tmp/test.dat';
> -- try pg_cancel_backend or pg_terminate_backend from other sessions

This does not seem like a supported case to me.  I see few if any reasons
to want to do that rather than doing copy-to-program or copy-to-client.
We're certainly not going to want to add any overhead to the COPY code
paths in order to allow it.
        regards, tom lane



Re: Hang issue when COPY to/from an unopened FIFO

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Kenan Yao <kyao@pivotal.io> writes:
> > -- mkfifo /tmp/test.dat # bash
> > copy pg_class to '/tmp/test.dat';
> > -- try pg_cancel_backend or pg_terminate_backend from other sessions
>
> This does not seem like a supported case to me.  I see few if any reasons
> to want to do that rather than doing copy-to-program or copy-to-client.
> We're certainly not going to want to add any overhead to the COPY code
> paths in order to allow it.

The complaint is that there's no way to safely kill a process which has
gotten stuck in a fopen() call.  I sympathize with that point of view as
there are many ways in which a process could get stuck in a fopen() or
similar call and it would be nice to have a way to kill such processes
without bouncing the entire server (though I wonder if this is a way to
end up with a dead backend that sticks around after the postmaster has
quit too, which is also quite bad...).

Thanks!

Stephen

Re: Hang issue when COPY to/from an unopened FIFO

From
Andres Freund
Date:
On 2016-07-14 10:20:42 -0400, Tom Lane wrote:
> Kenan Yao <kyao@pivotal.io> writes:
> > -- mkfifo /tmp/test.dat # bash
> > copy pg_class to '/tmp/test.dat';
> > -- try pg_cancel_backend or pg_terminate_backend from other sessions
> 
> This does not seem like a supported case to me.  I see few if any reasons
> to want to do that rather than doing copy-to-program or copy-to-client.
> We're certainly not going to want to add any overhead to the COPY code
> paths in order to allow it.

Agreed on that.

Said overhead would be a good reason to stop using buffered IO at
some point though - we're doing our own buffering anyway, and the stream
code adds noticeable overhead (basically doubling the cache
footprint). Even worse, it uses locking internally on many
platforms... In that case adding sane EINTR handling seems trivial.

Andres