Обсуждение: Let file_fdw access COPY FROM PROGRAM

Поиск
Список
Период
Сортировка

Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
A while back, there was a push to make COPY gzip-aware. That didn't happen, but COPY FROM PROGRAM did, and it scratches the same itch.

I have a similar need, but with file_fdw foreign tables. I have .csv.gz files downloaded to the server, but those CSVs have 100+ columns in them, and in this case I only really care about a half dozen of those columns. I'd like to avoid:
- the overhead of writing the uncompressed file to disk and then immediately re-reading it
- writing unwanted columns to a temp/work table via COPY, and then immediately re-reading them
- multicorn fdw because it ends up making a python string out of all data cells
- a csv parsing tool like csvtool or mlr, because they output another CSV which must be reparsed from scratch

Since file_fdw leverages COPY, it seemed like it would be easy to add the FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql IRC, only to discover that Adam Gomaa ( akgomaa@gmail.com ) had already written such a thing, but hadn't submitted it. Attached is a small rework of his patch, along with documentation.

NOTE: The regression test includes unix commands in the program option. I figured that wouldn't work for win32 systems, so I checked to see what the regression tests do to test COPY FROM PROGRAM...and I couldn't find any. So I guess the test exists as a proof of concept that will get excised before final commit.



Вложения

Re: Let file_fdw access COPY FROM PROGRAM

От
Craig Ringer
Дата:
On 3 June 2016 at 04:48, Corey Huinker <corey.huinker@gmail.com> wrote:
A while back, there was a push to make COPY gzip-aware. That didn't happen, but COPY FROM PROGRAM did, and it scratches the same itch.
 
- writing unwanted columns to a temp/work table via COPY, and then immediately re-reading them

Without wanting to take away from the value of letting file FDW access FROM PROGRAM, I think this really merits a solution that applies to COPY as well. Variants on "how do I COPY just some columns from a CSV" is a real FAQ, and it doesn't seem like it'd be excessively hard to support. We'd just need some way to pass a list of column-ordinals or header-names.
 
Not asking you to do that work, just pointing out that this particular issue applies to COPY its self as well.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
On Fri, Jun 3, 2016 at 1:06 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 3 June 2016 at 04:48, Corey Huinker <corey.huinker@gmail.com> wrote:
A while back, there was a push to make COPY gzip-aware. That didn't happen, but COPY FROM PROGRAM did, and it scratches the same itch.
 
- writing unwanted columns to a temp/work table via COPY, and then immediately re-reading them

Without wanting to take away from the value of letting file FDW access FROM PROGRAM, I think this really merits a solution that applies to COPY as well. Variants on "how do I COPY just some columns from a CSV" is a real FAQ, and it doesn't seem like it'd be excessively hard to support. We'd just need some way to pass a list of column-ordinals or header-names.
 
Not asking you to do that work, just pointing out that this particular issue applies to COPY its self as well.

I agree, they are two different but slightly overlapping issues. file_fdw needs a way to handle compressed/filtered files, and COPY needs the ability to skip columns. But right now COPY is all about getting the shape of the input from the shape of the destination table.

I had the bright idea of creating a custom datatype, call it "skip" / "filler" / "nada" / "devnull" or something like that, that would map any and all values to NULL, thus allowing COPY to naively "store" the unwanted values into nothingness. 

That idea falls apart when it hits InputFunctionCall() in fmgr.c, which prevents any text string from coercing into NULL. I didn't think I was up to the task of making InputFunctionCall respect a special case type (void?) or somehow promoting the pseudo type void into a legit column type (unknown side-effects there). I did create a type that coerced all input into the empty string, but the disk usage of that was still pretty significant.

So maybe if there was a subspecies of COPY that returned SETOF RECORD:

SELECT important_column1, important_column2
FROM copy_srf( program := 'zcat filename', format := 'csv', header := true ) AS t(bogus_col1 text, important_column1 integer, important_column2 date, bogus_col2 text, ... );

That would allow COPY to keep it's current efficiency and simplicity, while (hopefully) avoiding the unwanted data from ever hitting disk. It would also be vaguely reminiscent of SQL*Loader.
 

Re: Let file_fdw access COPY FROM PROGRAM

От
Robert Haas
Дата:
On Thu, Jun 2, 2016 at 4:48 PM, Corey Huinker <corey.huinker@gmail.com> wrote:
> A while back, there was a push to make COPY gzip-aware. That didn't happen,
> but COPY FROM PROGRAM did, and it scratches the same itch.
>
> I have a similar need, but with file_fdw foreign tables. I have .csv.gz
> files downloaded to the server, but those CSVs have 100+ columns in them,
> and in this case I only really care about a half dozen of those columns. I'd
> like to avoid:
> - the overhead of writing the uncompressed file to disk and then immediately
> re-reading it
> - writing unwanted columns to a temp/work table via COPY, and then
> immediately re-reading them
> - multicorn fdw because it ends up making a python string out of all data
> cells
> - a csv parsing tool like csvtool or mlr, because they output another CSV
> which must be reparsed from scratch
>
> Since file_fdw leverages COPY, it seemed like it would be easy to add the
> FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql
> IRC, only to discover that Adam Gomaa ( akgomaa@gmail.com ) had already
> written such a thing, but hadn't submitted it. Attached is a small rework of
> his patch, along with documentation.

His failure to submit that here himself raises the question of whether
he is OK with that code being released under the PostgreSQL license.
If this patch is going to be considered, I think we should have a post
from him clarifying that matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Let file_fdw access COPY FROM PROGRAM

От
Adam Gomaa
Дата:
I'm fine with the code being released under the PostgreSQL license.

Thanks,
Adam

Re: Let file_fdw access COPY FROM PROGRAM

От
Amit Langote
Дата:
Hi Corey,

Here are some comments and a review of the patch.

On 2016/06/03 5:48, Corey Huinker wrote:
> A while back, there was a push to make COPY gzip-aware. That didn't happen,
> but COPY FROM PROGRAM did, and it scratches the same itch.
> 
> I have a similar need, but with file_fdw foreign tables. I have .csv.gz
> files downloaded to the server, but those CSVs have 100+ columns in them,
> and in this case I only really care about a half dozen of those columns.
> I'd like to avoid:
> - the overhead of writing the uncompressed file to disk and then
> immediately re-reading it
> - writing unwanted columns to a temp/work table via COPY, and then
> immediately re-reading them
> - multicorn fdw because it ends up making a python string out of all data
> cells
> - a csv parsing tool like csvtool or mlr, because they output another CSV
> which must be reparsed from scratch

This feature seems desirable to me too.

> Since file_fdw leverages COPY, it seemed like it would be easy to add the
> FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql
> IRC, only to discover that Adam Gomaa ( akgomaa@gmail.com ) had already
> written such a thing, but hadn't submitted it. Attached is a small rework
> of his patch, along with documentation.

Yeah, the fact that file_fdw leverages COPY makes including this feature a
breeze.  Various concerns such as security, popen() vs execve() were
addressed when COPY FROM PROGRAM/STDIN/STDOUT feature was proposed [1], so
we will not have to go through that again (hopefully).

> NOTE: The regression test includes unix commands in the program option. I
> figured that wouldn't work for win32 systems, so I checked to see what the
> regression tests do to test COPY FROM PROGRAM...and I couldn't find any. So
> I guess the test exists as a proof of concept that will get excised before
> final commit.

I am not familiar with win32 stuff too, so I don't have much to say about
that.  Maybe someone else can chime in to help with that.

About the patch:

* applies cleanly, compiles fine
* basic functionality seems to work (have not done any extensive tests though)


-     Specifies the file to be read.  Required.  Must be an absolute path
name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal> must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>

The note about filename and program being mutually exclusive could be
placed at the end of list of options.  Or perhaps mention this note as
part of the description of program option, because filename is already
defined by that point.

+   <listitem>
+    <para>
+     Specifies the command to executed.

s/to executed/to be executed/g

+     Either <literal>program</literal> or <literal>filename</literal> must be
+     specified.  They are mutually exclusive.    </para>

Oh I see this has already been mentioned in program option description.
Did you intend to specify the same in both filename and program descriptions?

@@ -97,8 +99,9 @@ typedef struct FileFdwPlanStatetypedef struct FileFdwExecutionState{    char       *filename;
/*file to read */
 
-    List       *options;        /* merged COPY options, excluding filename */
-    CopyState   cstate;         /* state of reading file */
+    char       *program;        /* program to read output from */
+    List       *options;        /* merged COPY options, excluding
filename and program */
+    CopyState   cstate;         /* state of reading file or program */

Have you considered a Boolean flag is_program instead of char **program
similar to how copy.c does it?  (See a related comment further below)

-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * This is because the filename or program string are two of those
+     * options, and we don't want non-superusers to be able to determine
which
+     * file gets read or what command is run.

I'm no native English speaker, but I wonder if this should read: filename
or program string *is* one of those options ... OR filename *and* program
are two of those options ... ?  Also, "are two of those options" sounds a
bit odd to me because I don't see that used as often as "one of those
whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)

@@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)                ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),                        errmsg("conflicting or redundant options")));
 
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));            filename = defGetString(def);
}

+        else if (strcmp(def->defname, "program") == 0)
+        {
+            if (filename)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            program = defGetString(def);
+        }

Why does it check for filename here?  Also the other way around above.

@@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)     * Create CopyState from FDW options.
We always acquire all columns, so     * as to match the expected ScanTupleSlot signature.     */
 
-    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-                           filename,
-                           false,
-                           NIL,
-                           options);
+    if(filename)
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               filename,
+                               false,
+                               NIL,
+                               options);
+    else
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               program,
+                               true,
+                               NIL,
+                               options)

Like I mentioned above, if there was a is_program Boolean flag instead of
separate filename and program, this would be just:

+    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                           filename,
+                           is_program,
+                           NIL,
+                           options);

That would get rid of if-else blocks here and a couple of other places.

diff --git a/contrib/file_fdw/input/file_fdw.source
b/contrib/file_fdw/input/file_fdw.source
index 685561f..eae34a4 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source

You forgot to include expected output diffs.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/002101cd9190%24081c4140%241854c3c0%24%40lab.ntt.co.jp





Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

I am not familiar with win32 stuff too, so I don't have much to say about
that.  Maybe someone else can chime in to help with that.

The regressions basically *can't* test this because we'd need a shell command we know works on any architecture.
 

About the patch:

* applies cleanly, compiles fine
* basic functionality seems to work (have not done any extensive tests though)


-     Specifies the file to be read.  Required.  Must be an absolute path
name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal> must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>

The note about filename and program being mutually exclusive could be
placed at the end of list of options.  Or perhaps mention this note as
part of the description of program option, because filename is already
defined by that point.

+   <listitem>
+    <para>
+     Specifies the command to executed.

s/to executed/to be executed/g

Correct. I will fix that when other issues below are resolved.
 

+     Either <literal>program</literal> or <literal>filename</literal> must be
+     specified.  They are mutually exclusive.
     </para>

Oh I see this has already been mentioned in program option description.
Did you intend to specify the same in both filename and program descriptions?

It's been a while since I repackaged Adam's code, but generally I'm in favor of some redundancy if the two mutually exclusive things are documented far enough apart.

 

@@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
 typedef struct FileFdwExecutionState
 {
     char       *filename;       /* file to read */
-    List       *options;        /* merged COPY options, excluding filename */
-    CopyState   cstate;         /* state of reading file */
+    char       *program;        /* program to read output from */
+    List       *options;        /* merged COPY options, excluding
filename and program */
+    CopyState   cstate;         /* state of reading file or program */

Have you considered a Boolean flag is_program instead of char **program
similar to how copy.c does it?  (See a related comment further below)

Considered it yes, but decided against it when I started to write my version. When Adam delivered his version of the patch, I noticed he had made the same design decision. Only one of them will be initialized, and the boolean will byte align to 4 bytes, so it's the same storage allocated either way.

Either we're fine with two variables, or we think file_name is poorly named. I have only a slight preference for the two variables, and defer to the group for a preference.
 

-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * This is because the filename or program string are two of those
+     * options, and we don't want non-superusers to be able to determine
which
+     * file gets read or what command is run.

I'm no native English speaker, but I wonder if this should read: filename
or program string *is* one of those options ... OR filename *and* program
are two of those options ... ?  Also, "are two of those options" sounds a
bit odd to me because I don't see that used as often as "one of those
whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)

Given that C comments constitute a large portion of our documentation, I fully support making them as clear as possible.

I don't remember if this is Adam's comment or mine. Adam - if you're out there, chime in.

The original paragraph was:

Changing table-level options requires superuser privileges, for security reasons: only a superuser should be able to determine which file is read. In principle non-superusers could be allowed to change the other options, but that's not supported at present.

How does this paragraph sound?:

Changing table-level options requires superuser privileges, for security reasons: only a superuser should be able to determine which file is read, or which program is run. In principle non-superusers could be allowed to change the other options, but that's not supported at present.



@@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
                 ereport(ERROR,
                         (errcode(ERRCODE_SYNTAX_ERROR),
                          errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
             filename = defGetString(def);
         }

+        else if (strcmp(def->defname, "program") == 0)
+        {
+            if (filename)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            program = defGetString(def);
+        }

Why does it check for filename here?  Also the other way around above.

We don't want them to specify both program and filename, nor do we allow 2 programs, or 2 filenames.


@@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
      * Create CopyState from FDW options.  We always acquire all columns, so
      * as to match the expected ScanTupleSlot signature.
      */
-    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-                           filename,
-                           false,
-                           NIL,
-                           options);
+    if(filename)
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               filename,
+                               false,
+                               NIL,
+                               options);
+    else
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               program,
+                               true,
+                               NIL,
+                               options)

Like I mentioned above, if there was a is_program Boolean flag instead of
separate filename and program, this would be just:

+    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                           filename,
+                           is_program,
+                           NIL,
+                           options);

That would get rid of if-else blocks here and a couple of other places.

It would, pushing the complexity out to the user. Which could be fine, but if we do that then "filename" is a misnomer.
 

diff --git a/contrib/file_fdw/input/file_fdw.source
b/contrib/file_fdw/input/file_fdw.source
index 685561f..eae34a4 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source

You forgot to include expected output diffs.

Having regression tests for this is extremely problematic, because the program invoked would need to be an invokable command on any architecture supported by postgres. I'm pretty sure no such command exists.

Re: Let file_fdw access COPY FROM PROGRAM

От
Craig Ringer
Дата:
<p dir="ltr"><p dir="ltr">On 7 Sep. 2016 02:14, "Corey Huinker" <<a
href="mailto:corey.huinker@gmail.com">corey.huinker@gmail.com</a>>wrote:<br /> ><p dir="ltr">> Having
regressiontests for this is extremely problematic, because the program invoked would need to be an invokable command on
anyarchitecture supported by postgres. I'm pretty sure no such command exists.<p dir="ltr">Your best bet will be using
theTAP framework. There you can use Perl logic.<p dir="ltr">I'm not sure where to put such a test though. It doesn't
reallymake sense in src/test/recovery/ .<br /> 

Re: Let file_fdw access COPY FROM PROGRAM

От
Michael Paquier
Дата:
On Wed, Sep 7, 2016 at 7:53 AM, Craig Ringer
<craig.ringer@2ndquadrant.com> wrote:
> On 7 Sep. 2016 02:14, "Corey Huinker" <corey.huinker@gmail.com> wrote:
>>
>
>> Having regression tests for this is extremely problematic, because the
>> program invoked would need to be an invokable command on any architecture
>> supported by postgres. I'm pretty sure no such command exists.
>
> Your best bet will be using the TAP framework. There you can use Perl logic.
>
> I'm not sure where to put such a test though. It doesn't really make sense
> in src/test/recovery/ .

There is nothing preventing the addition of a TAP test where there are
normal regression tests, so if you want a test for file_fdw you should
add it there, then change its Makefile to have the following target
rules:
check: prove-check

prove-check:       $(prove_check)
-- 
Michael



Re: Let file_fdw access COPY FROM PROGRAM

От
Amit Langote
Дата:
On 2016/09/07 3:12, Corey Huinker wrote:
> On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:
>> I am not familiar with win32 stuff too, so I don't have much to say about
>> that.  Maybe someone else can chime in to help with that.
> 
> The regressions basically *can't* test this because we'd need a shell
> command we know works on any architecture.

OK.

>> @@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
>>  typedef struct FileFdwExecutionState
>>  {
>>      char       *filename;       /* file to read */
>> -    List       *options;        /* merged COPY options, excluding
>> filename */
>> -    CopyState   cstate;         /* state of reading file */
>> +    char       *program;        /* program to read output from */
>> +    List       *options;        /* merged COPY options, excluding
>> filename and program */
>> +    CopyState   cstate;         /* state of reading file or program */
>>
>> Have you considered a Boolean flag is_program instead of char **program
>> similar to how copy.c does it?  (See a related comment further below)
> 
> Considered it yes, but decided against it when I started to write my
> version. When Adam delivered his version of the patch, I noticed he had
> made the same design decision. Only one of them will be initialized, and
> the boolean will byte align to 4 bytes, so it's the same storage allocated
> either way.
> 
> Either we're fine with two variables, or we think file_name is poorly
> named. I have only a slight preference for the two variables, and defer to
> the group for a preference.

OK, let's defer it to the committer.

>> -     * This is because the filename is one of those options, and we don't
>> want
>> -     * non-superusers to be able to determine which file gets read.
>> +     * This is because the filename or program string are two of those
>> +     * options, and we don't want non-superusers to be able to determine
>> which
>> +     * file gets read or what command is run.
>>
>> I'm no native English speaker, but I wonder if this should read: filename
>> or program string *is* one of those options ... OR filename *and* program
>> are two of those options ... ?  Also, "are two of those options" sounds a
>> bit odd to me because I don't see that used as often as "one of those
>> whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)
> 
> Given that C comments constitute a large portion of our documentation, I
> fully support making them as clear as possible.
> 
> I don't remember if this is Adam's comment or mine. Adam - if you're out
> there, chime in.
> 
> The original paragraph was:
> 
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read.
> In principle non-superusers could be allowed to change the other options,
> but that's not supported at present.
> 
> 
> How does this paragraph sound?:
> 
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read,
> or which program is run. In principle non-superusers could be allowed to
> change the other options, but that's not supported at present.

Hmm, just a little modification would make it better for me:

... for security reasons.  For example, only superuser should be able to
determine which file read or which program is run.

Although that could be just me.

>> @@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>>                  ereport(ERROR,
>>                          (errcode(ERRCODE_SYNTAX_ERROR),
>>                           errmsg("conflicting or redundant options")));
>> +            if (program)
>> +                ereport(ERROR,
>> +                        (errcode(ERRCODE_SYNTAX_ERROR),
>> +                         errmsg("conflicting or redundant options")));
>>              filename = defGetString(def);
>>          }
>>
>> +        else if (strcmp(def->defname, "program") == 0)
>> +        {
>> +            if (filename)
>> +                ereport(ERROR,
>> +                        (errcode(ERRCODE_SYNTAX_ERROR),
>> +                         errmsg("conflicting or redundant options")));
>> +            if (program)
>> +                ereport(ERROR,
>> +                        (errcode(ERRCODE_SYNTAX_ERROR),
>> +                         errmsg("conflicting or redundant options")));
>> +            program = defGetString(def);
>> +        }
>>
>> Why does it check for filename here?  Also the other way around above.
> 
> We don't want them to specify both program and filename, nor do we allow 2
> programs, or 2 filenames.

Ah, I forgot about the mutually exclusive options part.

>> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
>> eflags)
>>       * Create CopyState from FDW options.  We always acquire all columns,
>> so
>>       * as to match the expected ScanTupleSlot signature.
>>       */
>> -    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> -                           filename,
>> -                           false,
>> -                           NIL,
>> -                           options);
>> +    if(filename)
>> +        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                               filename,
>> +                               false,
>> +                               NIL,
>> +                               options);
>> +    else
>> +        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                               program,
>> +                               true,
>> +                               NIL,
>> +                               options)
>>
>> Like I mentioned above, if there was a is_program Boolean flag instead of
>> separate filename and program, this would be just:
>>
>> +    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                           filename,
>> +                           is_program,
>> +                           NIL,
>> +                           options);
>>
>> That would get rid of if-else blocks here and a couple of other places.
> 
> It would, pushing the complexity out to the user. Which could be fine, but
> if we do that then "filename" is a misnomer.

This is an internal state so I'm not sure how this would be pushing
complexity out to the user.  Am I perhaps misreading what you said?

What a user sees is that there are two separate foreign table options -
filename and program.  That we handle them internally using a string to
identify either file or program and a Boolean flag to show which of the
two is just an internal implementation detail.

COPY does it that way internally and I just saw that psql's \copy does it
the same way too.  In the latter's case, following is the options struct
(src/bin/psql/copy.c):

struct copy_options
{   char       *before_tofrom;  /* COPY string before TO/FROM */   char       *after_tofrom;   /* COPY string after
TO/FROMfilename */   char       *file;           /* NULL = stdin/stdout */   bool        program;        /* is 'file' a
programto popen? */   bool        psql_inout;     /* true = use psql stdin/stdout */   bool        from;           /*
true= FROM, false = TO */
 
};

But as you said above, this could be deferred to the committer.

>> diff --git a/contrib/file_fdw/input/file_fdw.source
>> b/contrib/file_fdw/input/file_fdw.source
>> index 685561f..eae34a4 100644
>> --- a/contrib/file_fdw/input/file_fdw.source
>> +++ b/contrib/file_fdw/input/file_fdw.source
>>
>> You forgot to include expected output diffs.
> 
> Having regression tests for this is extremely problematic, because the
> program invoked would need to be an invokable command on any architecture
> supported by postgres. I'm pretty sure no such command exists.

Craig and Michael elsewhere suggested something about adding TAP tests. If
that helps the situation, maybe you could.

I will mark the CF entry status to "Waiting on author" till you post a new
patch.

Thanks,
Amit





Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer <span
dir="ltr"><<ahref="mailto:craig.ringer@2ndquadrant.com" target="_blank">craig.ringer@2ndquadrant.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class=""><pdir="ltr"><p dir="ltr">On 7 Sep. 2016 02:14, "Corey Huinker" <<a href="mailto:corey.huinker@gmail.com"
target="_blank">corey.huinker@gmail.com</a>>wrote:<br /> ><p dir="ltr">> Having regression tests for this is
extremelyproblematic, because the program invoked would need to be an invokable command on any architecture supported
bypostgres. I'm pretty sure no such command exists.</span><p dir="ltr">Your best bet will be using the TAP framework.
Thereyou can use Perl logic.<p dir="ltr">I'm not sure where to put such a test though. It doesn't really make sense in
src/test/recovery/.<br /></blockquote></div><br /></div><div class="gmail_extra">And the TAP test would detect the
operatingsystem and know to create an FDW that has the PROGRAM value 'cat test_data.csv' on Unix, 'type test_data.csv'
onwindows, and 'type test_data.csv;1' on VMS? </div></div> 

Re: Let file_fdw access COPY FROM PROGRAM

От
Craig Ringer
Дата:
On 7 September 2016 at 11:21, Corey Huinker <corey.huinker@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer <craig.ringer@2ndquadrant.com>

> And the TAP test would detect the operating system and know to create an FDW
> that has the PROGRAM value 'cat test_data.csv' on Unix, 'type test_data.csv'
> on windows, and 'type test_data.csv;1' on VMS?

Right. Or just "perl emit_test_data.pl" that works for all of them,
since TAP is perl so you can safely assume you have Perl.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training
&Services
 



Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
On Tue, Sep 6, 2016 at 9:46 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2016/09/07 3:12, Corey Huinker wrote:
> On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:
>> I am not familiar with win32 stuff too, so I don't have much to say about
>> that.  Maybe someone else can chime in to help with that.
>
> The regressions basically *can't* test this because we'd need a shell
> command we know works on any architecture.

OK.

Well...maybe not, depending on what Craig and other can do to educate me about the TAP tests.
 
>
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read,
> or which program is run. In principle non-superusers could be allowed to
> change the other options, but that's not supported at present.

Hmm, just a little modification would make it better for me:

... for security reasons.  For example, only superuser should be able to
determine which file read or which program is run.

Although that could be just me.

In this case "determine" is unclear whether a non-superuser can set the program to be run, or is capable of knowing which program is set to be run by the fdw.

We may want some more opinions on what is the most clear.
 

>> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
>> eflags)
>>       * Create CopyState from FDW options.  We always acquire all columns,
>> so
>>       * as to match the expected ScanTupleSlot signature.
>>       */
>> -    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> -                           filename,
>> -                           false,
>> -                           NIL,
>> -                           options);
>> +    if(filename)
>> +        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                               filename,
>> +                               false,
>> +                               NIL,
>> +                               options);
>> +    else
>> +        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                               program,
>> +                               true,
>> +                               NIL,
>> +                               options)
>>
>> Like I mentioned above, if there was a is_program Boolean flag instead of
>> separate filename and program, this would be just:
>>
>> +    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> +                           filename,
>> +                           is_program,
>> +                           NIL,
>> +                           options);
>>
>> That would get rid of if-else blocks here and a couple of other places.
>
> It would, pushing the complexity out to the user. Which could be fine, but
> if we do that then "filename" is a misnomer.

This is an internal state so I'm not sure how this would be pushing
complexity out to the user.  Am I perhaps misreading what you said?

Indeed, it is internal state. Maybe we rename the variable file_command or something.
 

What a user sees is that there are two separate foreign table options -
filename and program.  That we handle them internally using a string to
identify either file or program and a Boolean flag to show which of the
two is just an internal implementation detail.

COPY does it that way internally and I just saw that psql's \copy does it
the same way too.  In the latter's case, following is the options struct
(src/bin/psql/copy.c):

struct copy_options
{
    char       *before_tofrom;  /* COPY string before TO/FROM */
    char       *after_tofrom;   /* COPY string after TO/FROM filename */
    char       *file;           /* NULL = stdin/stdout */
    bool        program;        /* is 'file' a program to popen? */
    bool        psql_inout;     /* true = use psql stdin/stdout */
    bool        from;           /* true = FROM, false = TO */
};

But as you said above, this could be deferred to the committer.

Yeah, and that made for zero storage savings: a char pointer which is never assigned a string takes up as much space as a 4-byte-aligned boolean. And the result is that "file" really means program, which I found slightly awkward.


>> diff --git a/contrib/file_fdw/input/file_fdw.source
>> b/contrib/file_fdw/input/file_fdw.source
>> index 685561f..eae34a4 100644
>> --- a/contrib/file_fdw/input/file_fdw.source
>> +++ b/contrib/file_fdw/input/file_fdw.source
>>
>> You forgot to include expected output diffs.
>
> Having regression tests for this is extremely problematic, because the
> program invoked would need to be an invokable command on any architecture
> supported by postgres. I'm pretty sure no such command exists.

Craig and Michael elsewhere suggested something about adding TAP tests. If
that helps the situation, maybe you could.

Yeah, I need to educate myself about those.
 

I will mark the CF entry status to "Waiting on author" till you post a new
patch.


Thanks.
 


Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer <craig.ringer@2ndquadrant.com> wrote:
On 7 September 2016 at 11:21, Corey Huinker <corey.huinker@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer <craig.ringer@2ndquadrant.com>

> And the TAP test would detect the operating system and know to create an FDW
> that has the PROGRAM value 'cat test_data.csv' on Unix, 'type test_data.csv'
> on windows, and 'type test_data.csv;1' on VMS?

Right. Or just "perl emit_test_data.pl" that works for all of them,
since TAP is perl so you can safely assume you have Perl.

Thanks. I was mentally locked in more basic OS commands. Am I right in thinking perl is about the *only* OS command you can be sure is on every architecture? 

The platforms page says we support S/390 but no mention of VM/MVS/CMS. Did we do an OS/400 port yet? ::ducks::

Re: Let file_fdw access COPY FROM PROGRAM

От
Craig Ringer
Дата:
On 7 September 2016 at 11:37, Corey Huinker <corey.huinker@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer <craig.ringer@2ndquadrant.com>
> wrote:
>>
>> On 7 September 2016 at 11:21, Corey Huinker <corey.huinker@gmail.com>
>> wrote:
>> > On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer
>> > <craig.ringer@2ndquadrant.com>
>>
>> > And the TAP test would detect the operating system and know to create an
>> > FDW
>> > that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
>> > test_data.csv'
>> > on windows, and 'type test_data.csv;1' on VMS?
>>
>> Right. Or just "perl emit_test_data.pl" that works for all of them,
>> since TAP is perl so you can safely assume you have Perl.
>
>
> Thanks. I was mentally locked in more basic OS commands. Am I right in
> thinking perl is about the *only* OS command you can be sure is on every
> architecture?

Probably, there's a lot of crazy out there.

TAP tests can be conditionally run based on architecture, but
something like this is probably worth testing as widely as possible.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Remote DBA, Training
&Services
 



Re: Let file_fdw access COPY FROM PROGRAM

От
Amit Langote
Дата:
On 2016/09/07 12:29, Corey Huinker wrote:
> On Tue, Sep 6, 2016 at 9:46 PM, Amit Langote wrote:
>> OK.
> Well...maybe not, depending on what Craig and other can do to educate me
> about the TAP tests.

Sure.

>>> Changing table-level options requires superuser privileges, for security
>>> reasons: only a superuser should be able to determine which file is read,
>>> or which program is run. In principle non-superusers could be allowed to
>>> change the other options, but that's not supported at present.
>>
>> Hmm, just a little modification would make it better for me:
>>
>> ... for security reasons.  For example, only superuser should be able to
>> determine which file read or which program is run.
>>
>> Although that could be just me.
> 
> In this case "determine" is unclear whether a non-superuser can set the
> program to be run, or is capable of knowing which program is set to be run
> by the fdw.

Hmm, it is indeed unclear.

How about:

... for security reasons.  For example, only superuser should be able to
*change* which file is read or which program is run.

I just realized this is not just about a C comment.  There is a line in
documentation as well which needs an update.  Any conclusion here should
be applied there.

> We may want some more opinions on what is the most clear.

Certainly.

>> But as you said above, this could be deferred to the committer.
>>
> 
> Yeah, and that made for zero storage savings: a char pointer which is never
> assigned a string takes up as much space as a 4-byte-aligned boolean. And
> the result is that "file" really means program, which I found slightly
> awkward.

My only intent to push for that approach is to have consistency with other
code implementing a similar feature although it may not be that important.

Thanks,
Amit





Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Sep 6, 2016 at 11:44 PM, Craig Ringer <span
dir="ltr"><<ahref="mailto:craig.ringer@2ndquadrant.com" target="_blank">craig.ringer@2ndquadrant.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex"><spanclass="gmail-">On 7 September 2016 at 11:37, Corey Huinker <<a
href="mailto:corey.huinker@gmail.com">corey.huinker@gmail.com</a>>wrote:<br /> > On Tue, Sep 6, 2016 at 11:24 PM,
CraigRinger <<a href="mailto:craig.ringer@2ndquadrant.com">craig.ringer@2ndquadrant.com</a>><br /> > wrote:<br
/>>><br /> >> On 7 September 2016 at 11:21, Corey Huinker <<a
href="mailto:corey.huinker@gmail.com">corey.huinker@gmail.com</a>><br/> >> wrote:<br /> >> > On Tue,
Sep6, 2016 at 6:53 PM, Craig Ringer<br /> >> > <<a
href="mailto:craig.ringer@2ndquadrant.com">craig.ringer@2ndquadrant.com</a>><br/> >><br /> >> > And
theTAP test would detect the operating system and know to create an<br /> >> > FDW<br /> >> > that
hasthe PROGRAM value 'cat test_data.csv' on Unix, 'type<br /> >> > test_data.csv'<br /> >> > on
windows,and 'type test_data.csv;1' on VMS?<br /> >><br /> >> Right. Or just "perl <a
href="http://emit_test_data.pl"rel="noreferrer" target="_blank">emit_test_data.pl</a>" that works for all of them,<br
/>>> since TAP is perl so you can safely assume you have Perl.<br /> ><br /> ><br /> > Thanks. I was
mentallylocked in more basic OS commands. Am I right in<br /> > thinking perl is about the *only* OS command you can
besure is on every<br /> > architecture?<br /><br /></span>Probably, there's a lot of crazy out there.<br /><br />
TAPtests can be conditionally run based on architecture, but<br /> something like this is probably worth testing as
widelyas possible.<br /><div class="gmail-HOEnZb"><div class="gmail-h5"><br /> --<br />  Craig Ringer                 
 <ahref="http://www.2ndQuadrant.com/" rel="noreferrer" target="_blank">http://www.2ndQuadrant.com/</a><br />
 PostgreSQLDevelopment, 24x7 Support, Remote DBA, Training & Services<br /></div></div></blockquote></div><br
/></div><divclass="gmail_extra"><br /></div><div class="gmail_extra">Stylistically, would a separate .pl file for the
emitterbe preferable to something inline like<br /><br /></div><blockquote style="margin:0 0 0
40px;border:none;padding:0px"><divclass="gmail_extra">perl -e 'print "a\tb\tcc\t4\n"; print
"b\tc\tdd\t5\n"'</div></blockquote><divclass="gmail_extra"><br /></div><div class="gmail_extra">?<br /></div><div
class="gmail_extra"><br/></div><div class="gmail_extra">I'm inclined to go inline to cut down on the number of moving
parts,but I can see where perl's readability is a barrier to some, and either way I want to follow established
patterns.</div><divclass="gmail_extra"><br /></div><div class="gmail_extra"><br /></div><div class="gmail_extra">[*]
Forthose who don't perl, the command prints:</div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div
class="gmail_extra">a      b       cc      4</div><div class="gmail_extra"><div class="gmail_extra">b       c       dd
    5</div></div></blockquote></div> 

Re: Let file_fdw access COPY FROM PROGRAM

От
Craig Ringer
Дата:
<p dir="ltr"><p dir="ltr">On 9 Sep. 2016 03:45, "Corey Huinker" <<a
href="mailto:corey.huinker@gmail.com">corey.huinker@gmail.com</a>>wrote:<br /> ><br /> > <p dir="ltr">>
Stylistically,would a separate .pl file for the emitter be preferable to something inline like<br /> ><br />
>>perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'<p dir="ltr">I'd be fine with that and a suitable
comment.Just be careful with different platforms' shell escaping rules. 

Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer <span
dir="ltr"><<ahref="mailto:craig.ringer@2ndquadrant.com" target="_blank">craig.ringer@2ndquadrant.com</a>></span>
wrote:<br/><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class=""><pdir="ltr"><p dir="ltr">On 9 Sep. 2016 03:45, "Corey Huinker" <<a href="mailto:corey.huinker@gmail.com"
target="_blank">corey.huinker@gmail.com</a>>wrote:<br /> ><br /> > <p dir="ltr">> Stylistically, would a
separate.pl file for the emitter be preferable to something inline like<br /> ><br /> >> perl -e 'print
"a\tb\tcc\t4\n";print "b\tc\tdd\t5\n"'</span><p dir="ltr">I'd be fine with that and a suitable comment. Just be careful
withdifferent platforms' shell escaping rules.</blockquote></div>Do perl command switches on windows/VMS use /e instead
of-e?  If so, that'd be a great argument doing just "perl filename".</div><div class="gmail_extra"><br /></div></div> 

Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
V2 of this patch:

Changes:
* rebased to most recent master
* removed non-tap test that assumed the existence of Unix sed program
* added non-tap test that assumes the existence of perl 
* switched from filename/program to filename/is_program to more closely follow patterns in copy.c
* slight wording change in C comments


On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer <craig.ringer@2ndquadrant.com> wrote:

On 9 Sep. 2016 03:45, "Corey Huinker" <corey.huinker@gmail.com> wrote:
>
>

> Stylistically, would a separate .pl file for the emitter be preferable to something inline like
>
>> perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'

I'd be fine with that and a suitable comment. Just be careful with different platforms' shell escaping rules.


Вложения

Re: Let file_fdw access COPY FROM PROGRAM

От
Amit Langote
Дата:
On 2016/09/11 8:04, Corey Huinker wrote:
> V2 of this patch:
> 
> Changes:
> * rebased to most recent master
> * removed non-tap test that assumed the existence of Unix sed program
> * added non-tap test that assumes the existence of perl
> * switched from filename/program to filename/is_program to more closely
> follow patterns in copy.c
> * slight wording change in C comments

This version looks mostly good to me.  Except some whitespace noise in
some hunks:

@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
*root, RelOptInfo *rel, */static bool is_valid_option(const char *option, Oid context);static void fileGetOptions(Oid
foreigntableid,
-               char **filename, List **other_options);
+                           char **filename,
+                           bool *is_program,

Space after "is_program,"

@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
    /*     * Only superusers are allowed to set options of a file_fdw foreign
table.
-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * The reason for this is to prevent non-superusers from changing the

Space after "the"

-        if (stat(filename, &stat_buf) == 0)
+        if ((! is_program) && (stat(filename, &stat_buf) == 0)))

Space between ! and is_program.


-        if (strcmp(def->defname, "filename") == 0)
+        if ((strcmp(def->defname, "filename") == 0) ||
(strcmp(def->defname, "program") == 0))

I think the usual style would be to split the if statement into two lines
as follows to keep within 80 characters per line [1]:

+        if ((strcmp(def->defname, "filename") == 0) ||
+            (strcmp(def->defname, "program") == 0))

And likewise for:

-                   &fdw_private->filename, &fdw_private->options);
+                   &fdw_private->filename, &fdw_private->is_program,
&fdw_private->options);

By the way, doesn't the following paragraph in file-fdw.sgml need an update?
<para> Changing table-level options requires superuser privileges, for security reasons: only a superuser should be
ableto determine which file is read. In principle non-superusers could be allowed to change the other options, but
that'snot supported at present.</para>
 


I would like to mark this now as "Ready for Committer".

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/source-format.html





Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
Thanks for the review!

I agree with all the code cleanups suggested and have made then in the attached patch, to save the committer some time.

Also in this patch, I changed sgml para to
 <para>
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read
  or which program is run.  In principle non-superusers could be allowed to
  change the other options, but that's not supported at present.
 </para>

"Determine" is an odd word in this context. I understand it to mean "set/change", but I can see where a less familiar reader would take the meaning to be "has permission to see the value already set". Either way, it now mentions program as an option in addition to filename.


On Mon, Sep 12, 2016 at 1:59 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2016/09/11 8:04, Corey Huinker wrote:
> V2 of this patch:
>
> Changes:
> * rebased to most recent master
> * removed non-tap test that assumed the existence of Unix sed program
> * added non-tap test that assumes the existence of perl
> * switched from filename/program to filename/is_program to more closely
> follow patterns in copy.c
> * slight wording change in C comments

This version looks mostly good to me.  Except some whitespace noise in
some hunks:

@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
*root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-               char **filename, List **other_options);
+                           char **filename,
+                           bool *is_program,

Space after "is_program,"

@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)

     /*
      * Only superusers are allowed to set options of a file_fdw foreign
table.
-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * The reason for this is to prevent non-superusers from changing the

Space after "the"

-        if (stat(filename, &stat_buf) == 0)
+        if ((! is_program) && (stat(filename, &stat_buf) == 0)))

Space between ! and is_program.


-        if (strcmp(def->defname, "filename") == 0)
+        if ((strcmp(def->defname, "filename") == 0) ||
(strcmp(def->defname, "program") == 0))

I think the usual style would be to split the if statement into two lines
as follows to keep within 80 characters per line [1]:

+        if ((strcmp(def->defname, "filename") == 0) ||
+            (strcmp(def->defname, "program") == 0))

And likewise for:

-                   &fdw_private->filename, &fdw_private->options);
+                   &fdw_private->filename, &fdw_private->is_program,
&fdw_private->options);

By the way, doesn't the following paragraph in file-fdw.sgml need an update?

 <para>
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read.
  In principle non-superusers could be allowed to change the other options,
  but that's not supported at present.
 </para>


I would like to mark this now as "Ready for Committer".

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/source-format.html



Вложения

Re: Let file_fdw access COPY FROM PROGRAM

От
Corey Huinker
Дата:
On Mon, Sep 12, 2016 at 1:59 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
On 2016/09/11 8:04, Corey Huinker wrote:
> V2 of this patch:
>
> Changes:
> * rebased to most recent master
> * removed non-tap test that assumed the existence of Unix sed program
> * added non-tap test that assumes the existence of perl
> * switched from filename/program to filename/is_program to more closely
> follow patterns in copy.c
> * slight wording change in C comments

This version looks mostly good to me.  Except some whitespace noise in
some hunks:

@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
*root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-               char **filename, List **other_options);
+                           char **filename,
+                           bool *is_program,

Space after "is_program,"

@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)

     /*
      * Only superusers are allowed to set options of a file_fdw foreign
table.
-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * The reason for this is to prevent non-superusers from changing the

Space after "the"

-        if (stat(filename, &stat_buf) == 0)
+        if ((! is_program) && (stat(filename, &stat_buf) == 0)))

Space between ! and is_program.


-        if (strcmp(def->defname, "filename") == 0)
+        if ((strcmp(def->defname, "filename") == 0) ||
(strcmp(def->defname, "program") == 0))

I think the usual style would be to split the if statement into two lines
as follows to keep within 80 characters per line [1]:

+        if ((strcmp(def->defname, "filename") == 0) ||
+            (strcmp(def->defname, "program") == 0))

And likewise for:

-                   &fdw_private->filename, &fdw_private->options);
+                   &fdw_private->filename, &fdw_private->is_program,
&fdw_private->options);

By the way, doesn't the following paragraph in file-fdw.sgml need an update?

 <para>
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read.
  In principle non-superusers could be allowed to change the other options,
  but that's not supported at present.
 </para>


I would like to mark this now as "Ready for Committer".

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/source-format.html



(reposting non-top-posted...sorry)

Thanks for the review!

I agree with all the code cleanups suggested and have made then in the attached patch, to save the committer some time.

Also in this patch, I changed sgml para to
 <para>
  Changing table-level options requires superuser privileges, for security
  reasons: only a superuser should be able to determine which file is read
  or which program is run.  In principle non-superusers could be allowed to
  change the other options, but that's not supported at present.
 </para>

"Determine" is an odd word in this context. I understand it to mean "set/change", but I can see where a less familiar reader would take the meaning to be "has permission to see the value already set". Either way, it now mentions program as an option in addition to filename.


Вложения

Re: Let file_fdw access COPY FROM PROGRAM

От
Amit Langote
Дата:
On 2016/09/13 2:01, Corey Huinker wrote:
> Thanks for the review!
> 
> I agree with all the code cleanups suggested and have made then in the
> attached patch, to save the committer some time.

Thanks.  Have already marked the patch as ready for the committer.

> Also in this patch, I changed sgml para to
>  <para>
>   Changing table-level options requires superuser privileges, for security
>   reasons: only a superuser should be able to determine which file is read
>   or which program is run.  In principle non-superusers could be allowed to
>   change the other options, but that's not supported at present.
>  </para>
> 
> "Determine" is an odd word in this context. I understand it to mean
> "set/change", but I can see where a less familiar reader would take the
> meaning to be "has permission to see the value already set". Either way,
> it now mentions program as an option in addition to filename.

Agreed.

Thanks,
Amit





Re: Let file_fdw access COPY FROM PROGRAM

От
Tom Lane
Дата:
Corey Huinker <corey.huinker@gmail.com> writes:
> [ file_fdw_program_v3.diff ]

Pushed with cosmetic adjustments, mostly more work on the comments and
documentation.

I did not push the proposed test case; it's unportable.  The upthread
suggestion to add a TAP test would have been all right, because
--enable-tap-tests requires Perl, but the basic regression tests must
not.  I'm a bit dubious that it'd be worth the work to create such a
test anyway, when COPY FROM PROGRAM itself hasn't got one.

What *would* be worth some effort is allowing ANALYZE on a file_fdw
table reading from a program.  I concur that that probably can't be
the default behavior, but always falling back to the 10-block default
with no pg_stats stats is a really horrid prospect.

One idea is to invent another table-level FDW option "analyze".
If we could make that default to true for files and false for programs,
it'd preserve the desired default behavior, but it would add a feature
for plain files too: if they're too unstable to be worth analyzing,
you could turn it off.

Another thought is that maybe manual ANALYZE should go through in any
case, and the FDW option would only be needed to control auto-analyze.
Although I'm not sure what to think about scripted cases like
vacuumdb --analyze.  Maybe we'd need two flags, one permitting explicit
ANALYZE and one for autoanalyze, which could have different defaults.

Another thing that felt a little unfinished was the cost estimation
behavior.  Again, it's not clear how to do any better by default,
but is it worth the trouble to provide an FDW option to let the user
set the cost estimate for reading the table?  I'm not sure honestly.
Since there's only one path for the FDW itself, the cost estimate
doesn't matter in simple cases, and I'm not sure how much it matters
even in more complicated ones.  It definitely sucks that we don't
have a rows estimate that has anything to do with reality, but allowing
ANALYZE would be enough to handle that.
        regards, tom lane