Обсуждение: [Patch] Check file type before calling AllocateFile() for files outof pg data directory to avoid potential issues (e.g. hang).

Поиск
Список
Период
Сортировка
Hello, Postgres hackers.

I happened to see a hang issue when running a simple copy query. The root cause and repro way are quite simple.

mkfifo /tmp/a

run sql:
copy (select generate_series(1, 10)) to '/tmp/a';

It hangs at AllocateFile()->fopen() because that file was previously created as a fifo file, and it is not ctrl+c cancellable (on Linux).

#0  0x00007f52df1c45a0 in __open_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f52df151f20 in _IO_file_open (is32not64=4, read_write=4, prot=438, posix_mode=<optimized out>, filename=0x7ffe64199a10 "\360\303[\001", fp=0x1649c40) at fileops.c:229
#2  _IO_new_file_fopen (fp=fp@entry=0x1649c40, filename=filename@entry=0x15bc3f0 "/tmp/a", mode=<optimized out>, mode@entry=0xaa0bb7 "w", is32not64=is32not64@entry=1) at fileops.c:339
#3  0x00007f52df1465e4 in __fopen_internal (filename=0x15bc3f0 "/tmp/a", mode=0xaa0bb7 "w", is32=1) at iofopen.c:90
#4  0x00000000007a0e90 in AllocateFile (name=0x15bc3f0 "/tmp/a", mode=mode@entry=0xaa0bb7 "w") at fd.c:2229
#5  0x00000000005c51b4 in BeginCopyTo (pstate=pstate@entry=0x15b95f0, rel=rel@entry=0x0, query=<optimized out>, queryRelId=queryRelId@entry=0, filename=<optimized out>, is_program=<optimized out>, attnamelist=0x0, options=0x0) at copy.c:1919
#6  0x00000000005c8999 in DoCopy (pstate=pstate@entry=0x15b95f0, stmt=stmt@entry=0x1596b60, stmt_location=0, stmt_len=48, processed=processed@entry=0x7ffe64199cd8) at copy.c:1078
#7  0x00000000007d717a in standard_ProcessUtility (pstmt=0x1596ea0, queryString=0x1595dc0 "copy (select generate_series(1, 10)) to '/tmp/a';", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1596f80, completionTag=0x7ffe64199f90 "") at utility.c:551

This is, in theory, not a 100% bug, but it is probably not unusual to see conflicts of files between postgresql sqls and other applications on the same node so I think the fix is needed. I checked all code that calls AllocateFile() and wrote a simple patch to do sanity check (if the file exists it must be a regular file) for those files which are probably out of the postgres data directories which we probably want to ignore. This is actually not a perfect fix since it is not atomic (check and open), but it should fix most of the scenarios. To be perfect, we might want to refactor AllocateFile() to allow atomic check&create using either 'x' in fopen() or O_EXCL in open(), also it seems that we might not want to create temp file for AllocateFile() with fixed filenames. This is beyond of this patch of course.

Thanks.


Вложения
Hi,

On 2019-04-24 12:46:15 +0800, Paul Guo wrote:
> This is, in theory, not a 100% bug, but it is probably not unusual to see
> conflicts of files between postgresql sqls and other applications on the
> same node so I think the fix is needed. I checked all code that calls
> AllocateFile() and wrote a simple patch to do sanity check (if the file
> exists it must be a regular file) for those files which are probably out of
> the postgres data directories which we probably want to ignore. This is
> actually not a perfect fix since it is not atomic (check and open), but it
> should fix most of the scenarios. To be perfect, we might want to refactor
> AllocateFile() to allow atomic check&create using either 'x' in fopen()
> or O_EXCL in open(), also it seems that we might not want to create temp
> file for AllocateFile() with fixed filenames. This is beyond of this patch
> of course.

This seems like a bad idea to me. IMO we want to support using a pipe
etc here. If the admin creates a fifo like this without attaching a
program it seems like it's their fault.

Greetings,

Andres Freund





On Wed, Apr 24, 2019 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2019-04-24 12:46:15 +0800, Paul Guo wrote:
> This is, in theory, not a 100% bug, but it is probably not unusual to see
> conflicts of files between postgresql sqls and other applications on the
> same node so I think the fix is needed. I checked all code that calls
> AllocateFile() and wrote a simple patch to do sanity check (if the file
> exists it must be a regular file) for those files which are probably out of
> the postgres data directories which we probably want to ignore. This is
> actually not a perfect fix since it is not atomic (check and open), but it
> should fix most of the scenarios. To be perfect, we might want to refactor
> AllocateFile() to allow atomic check&create using either 'x' in fopen()
> or O_EXCL in open(), also it seems that we might not want to create temp
> file for AllocateFile() with fixed filenames. This is beyond of this patch
> of course.

This seems like a bad idea to me. IMO we want to support using a pipe
etc here. If the admin creates a fifo like this without attaching a
program it seems like it's their fault.

Oh, I never know this application scenario before. So yes, for this, we need to keep the current code logic in copy code.
 

Greetings,

Andres Freund
On 2019-Apr-24, Paul Guo wrote:

> On Wed, Apr 24, 2019 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:

> > This seems like a bad idea to me. IMO we want to support using a pipe
> > etc here. If the admin creates a fifo like this without attaching a
> > program it seems like it's their fault.
> 
> Oh, I never know this application scenario before. So yes, for this, we
> need to keep the current code logic in copy code.

But the pgstat.c patch seems reasonable to me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Apr-24, Paul Guo wrote:
>> On Wed, Apr 24, 2019 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:
>>> This seems like a bad idea to me. IMO we want to support using a pipe
>>> etc here. If the admin creates a fifo like this without attaching a
>>> program it seems like it's their fault.

>> Oh, I never know this application scenario before. So yes, for this, we
>> need to keep the current code logic in copy code.

> But the pgstat.c patch seems reasonable to me.

Nah, I don't buy that one either.  Nobody has any business creating any
non-Postgres files in the stats directory ... and if somebody does want
to stick a FIFO in there, perhaps for debug purposes, why should we stop
them?

The case with COPY is a bit different, since there it's reasonable to be
worried about collisions with other users' files --- but I agree with
Andres that this change would eliminate too many valid use-cases.

            regards, tom lane



On Wed, Apr 24, 2019 at 10:36:03AM -0400, Tom Lane wrote:
> Nah, I don't buy that one either.  Nobody has any business creating any
> non-Postgres files in the stats directory ... and if somebody does want
> to stick a FIFO in there, perhaps for debug purposes, why should we stop
> them?

I have never used a FIFO in Postgres for debugging purposes, but that
sounds plausible.  I am not sure either the changes proposed in the
patch are a good idea.
--
Michael

Вложения


On Wed, Apr 24, 2019 at 10:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Apr-24, Paul Guo wrote:
>> On Wed, Apr 24, 2019 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:
>>> This seems like a bad idea to me. IMO we want to support using a pipe
>>> etc here. If the admin creates a fifo like this without attaching a
>>> program it seems like it's their fault.

>> Oh, I never know this application scenario before. So yes, for this, we
>> need to keep the current code logic in copy code.

> But the pgstat.c patch seems reasonable to me.

Nah, I don't buy that one either.  Nobody has any business creating any
non-Postgres files in the stats directory ... and if somebody does want
to stick a FIFO in there, perhaps for debug purposes, why should we stop
them?

For the pgstat case, the files for AllocateFile() are actually temp files which
are soon renamed to other file names. Users might not want to set them as fifo files.
For developers 'tail -f' might be sufficient for debugging purpose.

      const char *tmpfile = permanent ? PGSTAT_STAT_PERMANENT_TMPFILE : pgstat_stat_tmpname;
      fpout = AllocateFile(tmpfile, PG_BINARY_W);
      fwrite(fpout, ...);
      rename(tmpfile, statfile);

I'm not sure if those hardcoded temp filenames (not just those in pgstat) are used across postgres reboot.
If no, we should instead call glibc function to create unique temp files and also remove those hardcode temp
filename variables, else we also might want them to be regular files.


The case with COPY is a bit different, since there it's reasonable to be
worried about collisions with other users' files --- but I agree with
Andres that this change would eliminate too many valid use-cases.

                        regards, tom lane
Paul Guo <pguo@pivotal.io> writes:
> On Wed, Apr 24, 2019 at 10:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> But the pgstat.c patch seems reasonable to me.

>> Nah, I don't buy that one either.  Nobody has any business creating any
>> non-Postgres files in the stats directory ... and if somebody does want
>> to stick a FIFO in there, perhaps for debug purposes, why should we stop
>> them?

> I'm not sure if those hardcoded temp filenames (not just those in pgstat)
> are used across postgres reboot.
> If no, we should instead call glibc function to create unique temp files
> and also remove those hardcode temp
> filename variables, else we also might want them to be regular files.

I do not see any actual need to change anything here.

Note that the whole business might look quite different by next year or
so, if the shmem-based stats collector patch gets merged.  So I'm hesitant
to throw unnecessary changes into that code right now anyway --- it'd just
break that WIP patch.  But in any case, the stats directory is a PG
private directory, and just like everything else inside $PGDATA, it is
very much "no user-serviceable parts inside".  Anybody sticking a FIFO
(or anything else) in there had better be a developer with some quite
specific debugging purpose in mind.  So I don't see a reason for file
type checks in pgstat, any more than we have them for, say, relation
data files.

            regards, tom lane