Re: Let file_fdw access COPY FROM PROGRAM

Поиск
Список
Период
Сортировка
От Amit Langote
Тема Re: Let file_fdw access COPY FROM PROGRAM
Дата
Msg-id 6fc03b16-291a-61e2-cf1d-13aa360b4555@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: Let file_fdw access COPY FROM PROGRAM  (Corey Huinker <corey.huinker@gmail.com>)
Ответы Re: Let file_fdw access COPY FROM PROGRAM  (Corey Huinker <corey.huinker@gmail.com>)
Список pgsql-hackers
On 2016/09/07 3:12, Corey Huinker wrote:
> On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:
>> I am not familiar with win32 stuff too, so I don't have much to say about
>> that.  Maybe someone else can chime in to help with that.
> 
> The regressions basically *can't* test this because we'd need a shell
> command we know works on any architecture.

OK.

>> @@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
>>  typedef struct FileFdwExecutionState
>>  {
>>      char       *filename;       /* file to read */
>> -    List       *options;        /* merged COPY options, excluding
>> filename */
>> -    CopyState   cstate;         /* state of reading file */
>> +    char       *program;        /* program to read output from */
>> +    List       *options;        /* merged COPY options, excluding
>> filename and program */
>> +    CopyState   cstate;         /* state of reading file or program */
>>
>> Have you considered a Boolean flag is_program instead of char **program
>> similar to how copy.c does it?  (See a related comment further below)
> 
> Considered it yes, but decided against it when I started to write my
> version. When Adam delivered his version of the patch, I noticed he had
> made the same design decision. Only one of them will be initialized, and
> the boolean will byte align to 4 bytes, so it's the same storage allocated
> either way.
> 
> Either we're fine with two variables, or we think file_name is poorly
> named. I have only a slight preference for the two variables, and defer to
> the group for a preference.

OK, let's defer it to the committer.

>> -     * This is because the filename is one of those options, and we don't
>> want
>> -     * non-superusers to be able to determine which file gets read.
>> +     * This is because the filename or program string are two of those
>> +     * options, and we don't want non-superusers to be able to determine
>> which
>> +     * file gets read or what command is run.
>>
>> I'm no native English speaker, but I wonder if this should read: filename
>> or program string *is* one of those options ... OR filename *and* program
>> are two of those options ... ?  Also, "are two of those options" sounds a
>> bit odd to me because I don't see that used as often as "one of those
>> whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)
> 
> Given that C comments constitute a large portion of our documentation, I
> fully support making them as clear as possible.
> 
> I don't remember if this is Adam's comment or mine. Adam - if you're out
> there, chime in.
> 
> The original paragraph was:
> 
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read.
> In principle non-superusers could be allowed to change the other options,
> but that's not supported at present.
> 
> 
> How does this paragraph sound?:
> 
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read,
> or which program is run. In principle non-superusers could be allowed to
> change the other options, but that's not supported at present.

Hmm, just a little modification would make it better for me:

... for security reasons.  For example, only superuser should be able to
determine which file read or which program is run.

Although that could be just me.

>> @@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>>                  ereport(ERROR,
>>                          (errcode(ERRCODE_SYNTAX_ERROR),
>>                           errmsg("conflicting or redundant options")));
>> +            if (program)
>> +                ereport(ERROR,
>> +                        (errcode(ERRCODE_SYNTAX_ERROR),
>> +                         errmsg("conflicting or redundant options")));
>>              filename = defGetString(def);
>>          }
>>
>> +        else if (strcmp(def->defname, "program") == 0)
>> +        {
>> +            if (filename)
>> +                ereport(ERROR,
>> +                        (errcode(ERRCODE_SYNTAX_ERROR),
>> +                         errmsg("conflicting or redundant options")));
>> +            if (program)
>> +                ereport(ERROR,
>> +                        (errcode(ERRCODE_SYNTAX_ERROR),
>> +                         errmsg("conflicting or redundant options")));
>> +            program = defGetString(def);
>> +        }
>>
>> Why does it check for filename here?  Also the other way around above.
> 
> We don't want them to specify both program and filename, nor do we allow 2
> programs, or 2 filenames.

Ah, I forgot about the mutually exclusive options part.

>> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
>> eflags)
>>       * Create CopyState from FDW options.  We always acquire all columns,
>> so
>>       * as to match the expected ScanTupleSlot signature.
>>       */
>> -    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> -                           filename,
>> -                           false,
>> -                           NIL,
>> -                           options);
>> +    if(filename)
>> +        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                               filename,
>> +                               false,
>> +                               NIL,
>> +                               options);
>> +    else
>> +        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                               program,
>> +                               true,
>> +                               NIL,
>> +                               options)
>>
>> Like I mentioned above, if there was a is_program Boolean flag instead of
>> separate filename and program, this would be just:
>>
>> +    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                           filename,
>> +                           is_program,
>> +                           NIL,
>> +                           options);
>>
>> That would get rid of if-else blocks here and a couple of other places.
> 
> It would, pushing the complexity out to the user. Which could be fine, but
> if we do that then "filename" is a misnomer.

This is an internal state so I'm not sure how this would be pushing
complexity out to the user.  Am I perhaps misreading what you said?

What a user sees is that there are two separate foreign table options -
filename and program.  That we handle them internally using a string to
identify either file or program and a Boolean flag to show which of the
two is just an internal implementation detail.

COPY does it that way internally and I just saw that psql's \copy does it
the same way too.  In the latter's case, following is the options struct
(src/bin/psql/copy.c):

struct copy_options
{   char       *before_tofrom;  /* COPY string before TO/FROM */   char       *after_tofrom;   /* COPY string after
TO/FROMfilename */   char       *file;           /* NULL = stdin/stdout */   bool        program;        /* is 'file' a
programto popen? */   bool        psql_inout;     /* true = use psql stdin/stdout */   bool        from;           /*
true= FROM, false = TO */
 
};

But as you said above, this could be deferred to the committer.

>> diff --git a/contrib/file_fdw/input/file_fdw.source
>> b/contrib/file_fdw/input/file_fdw.source
>> index 685561f..eae34a4 100644
>> --- a/contrib/file_fdw/input/file_fdw.source
>> +++ b/contrib/file_fdw/input/file_fdw.source
>>
>> You forgot to include expected output diffs.
> 
> Having regression tests for this is extremely problematic, because the
> program invoked would need to be an invokable command on any architecture
> supported by postgres. I'm pretty sure no such command exists.

Craig and Michael elsewhere suggested something about adding TAP tests. If
that helps the situation, maybe you could.

I will mark the CF entry status to "Waiting on author" till you post a new
patch.

Thanks,
Amit





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: ICU integration
Следующее
От: vinayak
Дата:
Сообщение: Re: Transactions involving multiple postgres foreign servers