Re: Let file_fdw access COPY FROM PROGRAM

Поиск
Список
Период
Сортировка
От Corey Huinker
Тема Re: Let file_fdw access COPY FROM PROGRAM
Дата
Msg-id CADkLM=dx9Q5difswB7AMRcmfk3AaDd9bVYRxA1iAOiy3qF5+gA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Let file_fdw access COPY FROM PROGRAM  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Ответы Re: Let file_fdw access COPY FROM PROGRAM  (Craig Ringer <craig.ringer@2ndquadrant.com>)
Re: Let file_fdw access COPY FROM PROGRAM  (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Список pgsql-hackers
On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

I am not familiar with win32 stuff too, so I don't have much to say about
that.  Maybe someone else can chime in to help with that.

The regressions basically *can't* test this because we'd need a shell command we know works on any architecture.
 

About the patch:

* applies cleanly, compiles fine
* basic functionality seems to work (have not done any extensive tests though)


-     Specifies the file to be read.  Required.  Must be an absolute path
name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal> must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>

The note about filename and program being mutually exclusive could be
placed at the end of list of options.  Or perhaps mention this note as
part of the description of program option, because filename is already
defined by that point.

+   <listitem>
+    <para>
+     Specifies the command to executed.

s/to executed/to be executed/g

Correct. I will fix that when other issues below are resolved.
 

+     Either <literal>program</literal> or <literal>filename</literal> must be
+     specified.  They are mutually exclusive.
     </para>

Oh I see this has already been mentioned in program option description.
Did you intend to specify the same in both filename and program descriptions?

It's been a while since I repackaged Adam's code, but generally I'm in favor of some redundancy if the two mutually exclusive things are documented far enough apart.

 

@@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
 typedef struct FileFdwExecutionState
 {
     char       *filename;       /* file to read */
-    List       *options;        /* merged COPY options, excluding filename */
-    CopyState   cstate;         /* state of reading file */
+    char       *program;        /* program to read output from */
+    List       *options;        /* merged COPY options, excluding
filename and program */
+    CopyState   cstate;         /* state of reading file or program */

Have you considered a Boolean flag is_program instead of char **program
similar to how copy.c does it?  (See a related comment further below)

Considered it yes, but decided against it when I started to write my version. When Adam delivered his version of the patch, I noticed he had made the same design decision. Only one of them will be initialized, and the boolean will byte align to 4 bytes, so it's the same storage allocated either way.

Either we're fine with two variables, or we think file_name is poorly named. I have only a slight preference for the two variables, and defer to the group for a preference.
 

-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * This is because the filename or program string are two of those
+     * options, and we don't want non-superusers to be able to determine
which
+     * file gets read or what command is run.

I'm no native English speaker, but I wonder if this should read: filename
or program string *is* one of those options ... OR filename *and* program
are two of those options ... ?  Also, "are two of those options" sounds a
bit odd to me because I don't see that used as often as "one of those
whatever".  I guess that's quite a bit of nitpicking about a C comment, ;)

Given that C comments constitute a large portion of our documentation, I fully support making them as clear as possible.

I don't remember if this is Adam's comment or mine. Adam - if you're out there, chime in.

The original paragraph was:

Changing table-level options requires superuser privileges, for security reasons: only a superuser should be able to determine which file is read. In principle non-superusers could be allowed to change the other options, but that's not supported at present.

How does this paragraph sound?:

Changing table-level options requires superuser privileges, for security reasons: only a superuser should be able to determine which file is read, or which program is run. In principle non-superusers could be allowed to change the other options, but that's not supported at present.



@@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
                 ereport(ERROR,
                         (errcode(ERRCODE_SYNTAX_ERROR),
                          errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
             filename = defGetString(def);
         }

+        else if (strcmp(def->defname, "program") == 0)
+        {
+            if (filename)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            program = defGetString(def);
+        }

Why does it check for filename here?  Also the other way around above.

We don't want them to specify both program and filename, nor do we allow 2 programs, or 2 filenames.


@@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
      * Create CopyState from FDW options.  We always acquire all columns, so
      * as to match the expected ScanTupleSlot signature.
      */
-    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-                           filename,
-                           false,
-                           NIL,
-                           options);
+    if(filename)
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               filename,
+                               false,
+                               NIL,
+                               options);
+    else
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               program,
+                               true,
+                               NIL,
+                               options)

Like I mentioned above, if there was a is_program Boolean flag instead of
separate filename and program, this would be just:

+    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                           filename,
+                           is_program,
+                           NIL,
+                           options);

That would get rid of if-else blocks here and a couple of other places.

It would, pushing the complexity out to the user. Which could be fine, but if we do that then "filename" is a misnomer.
 

diff --git a/contrib/file_fdw/input/file_fdw.source
b/contrib/file_fdw/input/file_fdw.source
index 685561f..eae34a4 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source

You forgot to include expected output diffs.

Having regression tests for this is extremely problematic, because the program invoked would need to be an invokable command on any architecture supported by postgres. I'm pretty sure no such command exists.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Vacuum: allow usage of more than 1GB of work mem
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Logical Replication WIP