Обсуждение: Allow COPY's 'text' format to output a header

Поиск

Список

Период

Сортировка

Allow COPY's 'text' format to output a header

От

Simon Muller

Дата:

14 мая 2018 г., 04:18:38

This patch adds the capability to use the HEADER feature with the "text" format of the COPY command. The patch includes the related update to documentation and an additional regression test for this feature.

Currently you can only add a header line (which lists the column names) when exporting with COPY to the CSV format, but I much prefer using the default "text" format. This feature is also currently listed on the to-do list (https://wiki.postgresql.org/wiki/Todo#COPY) where it seems to have been requested some years ago.

Hopefully I've done everything correctly and the patch is acceptable enough to be considered for application.

Simon Muller

Вложения

0001-Allow-COPY-s-text-format-to-output-a-header.patch

Re: Allow COPY's 'text' format to output a header

От

David Steele

Дата:

14 мая 2018 г., 05:01:00

Hi Simon,

On 5/13/18 6:18 PM, Simon Muller wrote:
> This patch adds the capability to use the HEADER feature with the "text"
> format of the COPY command. The patch includes the related update to
> documentation and an additional regression test for this feature.
> 
> Currently you can only add a header line (which lists the column names)
> when exporting with COPY to the CSV format, but I much prefer using the
> default "text" format. This feature is also currently listed on the
> to-do list (https://wiki.postgresql.org/wiki/Todo#COPY) where it seems
> to have been requested some years ago.
> 
> Hopefully I've done everything correctly and the patch is acceptable
> enough to be considered for application.

This patch makes sense to me and looks reasonable.

We're in the middle of a feature freeze that will last most of the
summer, so be sure to enter your patch into the next commitfest so it
can be considered when the freeze is over.

https://commitfest.postgresql.org/18/

Regards,
-- 
-David
david@pgmasters.net

Re: Allow COPY's 'text' format to output a header

От

Michael Paquier

Дата:

14 мая 2018 г., 06:37:07

On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote:
> This patch makes sense to me and looks reasonable.

One "potential" problem is if a relation has a full set of column which
allows the input of text-like data: if the header has been added with
COPY TO, and that the user forgets to add again the header option with
COPY FROM, then an extra row will be generated but there is the same
problem with CSV format :)

One comment I have about the patch is that there is no test for
COPY FROM with an output file which has a header.  In this case if
HEADER is true then the file can be loaded.  If HEADER is wrong, an
error should normally be raised because of the format (well, let's
discard the case of the relation with text-only columns).  So the tests
could be extended a bit even for CSV.

> We're in the middle of a feature freeze that will last most of the
> summer, so be sure to enter your patch into the next commitfest so it
> can be considered when the freeze is over.
>
> https://commitfest.postgresql.org/18/

Yes, you will need to be patient a couple of months here.
--
Michael

Вложения

signature.asc

Re: Allow COPY's 'text' format to output a header

От

Simon Muller

Дата:

14 мая 2018 г., 12:35:34

Okay, I've added this to the next commitfest at https://commitfest.postgresql.org/18/1629/.

Thanks both Michael and David for the feedback so far.

On 14 May 2018 at 02:37, Michael Paquier <michael@paquier.xyz> wrote:

On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote:
> This patch makes sense to me and looks reasonable.

One "potential" problem is if a relation has a full set of column which
allows the input of text-like data: if the header has been added with
COPY TO, and that the user forgets to add again the header option with
COPY FROM, then an extra row will be generated but there is the same
problem with CSV format :)

One comment I have about the patch is that there is no test for
COPY FROM with an output file which has a header. In this case if
HEADER is true then the file can be loaded. If HEADER is wrong, an
error should normally be raised because of the format (well, let's
discard the case of the relation with text-only columns). So the tests
could be extended a bit even for CSV.

> We're in the middle of a feature freeze that will last most of the
> summer, so be sure to enter your patch into the next commitfest so it
> can be considered when the freeze is over.
>
> https://commitfest.postgresql.org/18/

Yes, you will need to be patient a couple of months here.
--
Michael

Re: Allow COPY's 'text' format to output a header

От

Andrew Dunstan

Дата:

14 мая 2018 г., 17:36:06

On 05/14/2018 02:35 AM, Simon Muller wrote:
> Okay, I've added this to the next commitfest at
> https://commitfest.postgresql.org/18/1629/.
>
> Thanks both Michael and David for the feedback so far.
>
>

(Please don't top-post on PostgreSQL lists.)

I'm not necessarily opposed to this, but I'm not certain about the use
case either. The original request seemed to stem from a false impression
that CSV mode can't produce or consume tab-delimited files. But it can,
and in fact it's saner for almost all uses than text format. Postgres'
text format is really intended for Postgres' use. CSV format is more
appropriate for dealing with external programs, whether the delimiter be
a tab or a comma.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Allow COPY's 'text' format to output a header

От

Garick Hamlin

Дата:

15 мая 2018 г., 00:44:59

On Mon, May 14, 2018 at 09:37:07AM +0900, Michael Paquier wrote:
> On Sun, May 13, 2018 at 07:01:00PM -0400, David Steele wrote:
> > This patch makes sense to me and looks reasonable.
> 
> One "potential" problem is if a relation has a full set of column which
> allows the input of text-like data: if the header has been added with
> COPY TO, and that the user forgets to add again the header option with
> COPY FROM, then an extra row will be generated but there is the same
> problem with CSV format :)

Yeah, I wonder if that can be addressed.  

I wonder if there was a way to let COPY FROM detect or ignore headers
as appropriate and rather than cause silently result in headers being
added as data.

Maybe a blank line after the header line could prevent this confusion?

Garick

Re: Allow COPY's 'text' format to output a header

От

"David G. Johnston"

Дата:

15 мая 2018 г., 00:55:45

On Mon, May 14, 2018 at 11:44 AM, Garick Hamlin <ghamlin@isc.upenn.edu> wrote:

I wonder if there was a way to let COPY FROM detect or ignore headers

as appropriate and rather than cause silently result in headers being
added as data.

Not reliably

Maybe a blank line after the header line could prevent this confusion

+1 for allowing HEADER with FORMAT text. It doesn't interfere with COPY and even if I were to agree that CSV format is the better one this seems like an unnecessary area to impose preferences. If TSV with Header meets someone's need providing a minimal (and consistent with expectations) syntax to accomplish that goal seems reasonable, as does the patch.

David J.

Re: Allow COPY's 'text' format to output a header

От

Isaac Morland

Дата:

15 мая 2018 г., 02:08:47

While we're discussing COPY options, what do people think of an option for COPY FROM with header to require that the headers match the target column names? This would help to ensure that the file is actually the right one.

On 14 May 2018 at 14:55, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Mon, May 14, 2018 at 11:44 AM, Garick Hamlin <ghamlin@isc.upenn.edu> wrote:
I wonder if there was a way to let COPY FROM detect or ignore headers
as appropriate and rather than cause silently result in headers being
added as data.

Not reliably

Maybe a blank line after the header line could prevent this confusion

No

+1 for allowing HEADER with FORMAT text. It doesn't interfere with COPY and even if I were to agree that CSV format is the better one this seems like an unnecessary area to impose preferences. If TSV with Header meets someone's need providing a minimal (and consistent with expectations) syntax to accomplish that goal seems reasonable, as does the patch.

David J.

Re: Allow COPY's 'text' format to output a header

От

Michael Paquier

Дата:

15 мая 2018 г., 06:23:31

On Mon, May 14, 2018 at 04:08:47PM -0400, Isaac Morland wrote:
> While we're discussing COPY options, what do people think of an option for
> COPY FROM with header to require that the headers match the target column
> names? This would help to ensure that the file is actually the right one.

I am personally not much into such sanity check logics in COPY FWIW if
we can live without.
--
Michael

Вложения

signature.asc

Re: Allow COPY's 'text' format to output a header

От

"Daniel Verite"

Дата:

15 мая 2018 г., 20:26:09

    Andrew Dunstan wrote:

> I'm not necessarily opposed to this, but I'm not certain about the use
> case either.

+1.
The downside is that it would create the need, when using COPY TO,
to know whether an input file was generated with or without header,
and a hazard on mistakes.
If you say it was and it wasn't, you quietly loose the first row of data.
If you say it wasn't and in fact it was, either there's a
datatype mismatch or you quietly get a spurious row of data.

This complication should be balanced by some advantage.
What can we do with the header?
If you already have the table ready to COPY in, you don't
need that information. The only reason why COPY TO
needs to know about the header is to throw it away.
And if you don't have the table created yet, a header
with just the column names is hardly sufficient to create it,
isn't it?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: Allow COPY's 'text' format to output a header

От

Isaac Morland

Дата:

15 мая 2018 г., 21:52:54

On 15 May 2018 at 10:26, Daniel Verite <daniel@manitou-mail.org> wrote:

Andrew Dunstan wrote:

> I'm not necessarily opposed to this, but I'm not certain about the use
> case either.

+1.
The downside is that it would create the need, when using COPY TO,
to know whether an input file was generated with or without header,
and a hazard on mistakes.
If you say it was and it wasn't, you quietly loose the first row of data.
If you say it wasn't and in fact it was, either there's a
datatype mismatch or you quietly get a spurious row of data.

Just to be clear, we're talking about my "header match" feature, not the basic idea of allowing a header in text format?

You already need to know whether or not there is a header, no matter what: there is no way to avoid needing to know the format of the data to be imported. And certainly if "header" is an option, one has to know whether or not to set it in any given situation.

The "header match" helps ensure the file is the right one by requiring the header contents to match the field names, rather than just being thrown away.

I don't view it as a way to avoid pre-defining the table. It just increases the chance that the wrong file won't load but will instead trigger an error condition immediately.

Note that this advantage includes what happens if you specify header but the file has no header: as long as you actually specified header match, the error will be caught unless the first row of actual data happens to match the field names, which is almost always highly unlikely and frequently impossible (e.g., a person with firstname "firstname", surname "surname", birthday "birthday" and so on).

One can imagine extensions of the idea: for example, the header could actually be used to identify the columns, so the column order in the file doesn't matter. There could also be an "AS" syntax to allow the target field names to be different from the field names in the header. I have occasionally found myself wanting to ignore certain columns of the file. But these are all significantly more complicated than just looking at the header and requiring it to match the target field names.

If one specifies no header but there actually is a header in the file, then loading will fail in many cases but it depends on what the header in the file looks like. This part is unaffected by my idea.

This complication should be balanced by some advantage.
What can we do with the header?
If you already have the table ready to COPY in, you don't
need that information. The only reason why COPY TO
needs to know about the header is to throw it away.
And if you don't have the table created yet, a header
with just the column names is hardly sufficient to create it,
isn't it?

Re: Allow COPY's 'text' format to output a header

От

Tom Lane

Дата:

15 мая 2018 г., 22:06:12

Isaac Morland <isaac.morland@gmail.com> writes:
> On 15 May 2018 at 10:26, Daniel Verite <daniel@manitou-mail.org> wrote:
>> Andrew Dunstan wrote:
>>> I'm not necessarily opposed to this, but I'm not certain about the use
>>> case either.

>> The downside is that it would create the need, when using COPY TO,
>> to know whether an input file was generated with or without header,
>> and a hazard on mistakes.
>> If you say it was and it wasn't, you quietly loose the first row of data.
>> If you say it wasn't and in fact it was, either there's a
>> datatype mismatch or you quietly get a spurious row of data.

> Just to be clear, we're talking about my "header match" feature, not the
> basic idea of allowing a header in text format?

AFAICS, Daniel's just reacting to the basic idea of a header line.
I agree that by itself that's not worth much.  However, if we added
your proposed option to insist that the column names match during COPY
IN, I think that that could have some value.  It would allow
forestalling one common type of pilot error, ie copying the wrong file
entirely.  (It'd also prevent copying in data that has the wrong column
order, but I think that's a less common scenario.  I might be wrong
about that.)

> One can imagine extensions of the idea: for example, the header could
> actually be used to identify the columns, so the column order in the file
> doesn't matter. There could also be an "AS" syntax to allow the target
> field names to be different from the field names in the header. I have
> occasionally found myself wanting to ignore certain columns of the file.
> But these are all significantly more complicated than just looking at the
> header and requiring it to match the target field names.

Yeah, and every bit of flexibility you add raises the chance of an
undetected error.  COPY isn't intended as a general ETL facility,
so I'd mostly be -1 on adding such things.  But I can see the value
of confirming that you're copying the right file, and a header match
check would go a long way towards doing that.

            regards, tom lane

Re: Allow COPY's 'text' format to output a header

От

"David G. Johnston"

Дата:

15 мая 2018 г., 22:12:41

On Tuesday, May 15, 2018, Tom Lane <tgl@sss.pgh.pa.us> wrote:

AFAICS, Daniel's just reacting to the basic idea of a header line.
I agree that by itself that's not worth much. However, if we added
your proposed option to insist that the column names match during COPY
IN, I think that that could have some value.

I'm fine for adding it without the added matching behavior, though turning the boolean into an enum is appealing.

HEADER { true | false | match }

Though we'd need to accept all variants of Boolean for compatability...

I'm of the opinion that text and csv should be the same excepting their defaults for some of the options.

David J.

Re: Allow COPY's 'text' format to output a header

От

"Daniel Verite"

Дата:

15 мая 2018 г., 23:45:27

    Isaac Morland wrote:

> Just to be clear, we're talking about my "header match" feature, not the
> basic idea of allowing a header in text format?

For my reply it was on merely allowing it, as does the current
patch at https://commitfest.postgresql.org/18/1629

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: Allow COPY's 'text' format to output a header

От

Robert Haas

Дата:

16 мая 2018 г., 21:11:17

On Tue, May 15, 2018 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> One can imagine extensions of the idea: for example, the header could
>> actually be used to identify the columns, so the column order in the file
>> doesn't matter. There could also be an "AS" syntax to allow the target
>> field names to be different from the field names in the header. I have
>> occasionally found myself wanting to ignore certain columns of the file.
>> But these are all significantly more complicated than just looking at the
>> header and requiring it to match the target field names.
>
> Yeah, and every bit of flexibility you add raises the chance of an
> undetected error.  COPY isn't intended as a general ETL facility,
> so I'd mostly be -1 on adding such things.  But I can see the value
> of confirming that you're copying the right file, and a header match
> check would go a long way towards doing that.

True.

FWIW, I'm +1 on this idea.  I think a header line is a pretty common
need, and if you're exporting a large amount of data, it could be
pretty annoying to have to first run COPY, and then do

(echo blah,blah1,blah2; cat copyoutput.txt)>whatireallywant.txt

There's a lot of value in being able to export from program A
*exactly* what program B wants to import, rather than something that
is close but has to be massaged.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Allow COPY's 'text' format to output a header

От

Simon Muller

Дата:

05 июля 2018 г., 02:44:32

On 14 May 2018 at 08:35, Simon Muller <samullers@gmail.com> wrote:

Okay, I've added this to the next commitfest at https://commitfest.postgresql.org/18/1629/.

Thanks both Michael and David for the feedback so far.

I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world").

This v2 patch should fix that.

Вложения

text_header_v2.patch

Re: Allow COPY's 'text' format to output a header

От

Simon Muller

Дата:

11 июля 2018 г., 00:51:00

On 4 July 2018 at 22:44, Simon Muller <samullers@gmail.com> wrote:

I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world").

This v2 patch should fix that.

This patch just fixes a newline issue introduced in my previous patch.

Вложения

text_header_v3.patch

Re: Allow COPY's 'text' format to output a header

От

Cynthia Shang

Дата:

25 июля 2018 г., 23:24:08

On 4 July 2018 at 22:44, Simon Muller <samullers@gmail.com> wrote:
I noticed through the patch tester link at http://commitfest.cputube.org/ that my patch caused a file_fdw test to fail (since I previously tested only with "make check" and not with "make check-world").

This v2 patch should fix that.

This patch just fixes a newline issue introduced in my previous patch.

I've reviewed this patch and feel this patch addresses the original ask. I tested it manually trying to break it and, as mentioned previously, it's behavior is the same as the CSV copy with regards to it's shortcomings. However, I feel

1) a "copy from" test is needed and

2) the current "copy to" test is (along with a few others) in the wrong file.

With regards to #2, the copy.source tests are for things requiring replacement when running the tests. Given that these copy tests do not, I have moved the current last set of copy tests to the copy2.sql file and have provided an attached patch.

With regards to #1, the patch I have provided can then be used and the following added as the COPY TO/FROM tests (perhaps after line 426 of the attached copy2.sql file). Note that I moved the FROM test before the TO test and omitted the "(format text, header true)" in the FROM test since it is another way the command can be invoked.

copy copytest3 from stdin header;
this is just a line full of junk that would error out if parsed
11 a 1
22 b 2
\.

copy copytest3 to stdout with (format text, header true);

As for the matching check of the header in the discussion of this patch, I feel that is a separate patch that can be added later since it would affect the general functionality of the copy command, not just the ability to have a text header.

Best,

- Cynthia Shang

Вложения

move-copy-tests-v1.patch

Re: Allow COPY's 'text' format to output a header

От

Cynthia Shang

Дата:

25 июля 2018 г., 23:36:54

On Wed, Jul 25, 2018 at 1:24 PM, Cynthia Shang
<cynthia.shang@crunchydata.com> wrote:

> With regards to #2, the copy.source tests are for things requiring
> replacement when running the tests. Given that these copy tests do not, I
> have moved the current last set of copy tests to the copy2.sql file and have
> provided an attached patch.

The patch appears in the RAW and in your email (hopefully) but it
doesn't appear in the thread archive so I am reattaching from a
different email client.

Вложения

move-copy-tests-v1.patch

Re: Allow COPY's 'text' format to output a header

От

Simon Muller

Дата:

26 июля 2018 г., 04:09:43

On 25 July 2018 at 19:24, Cynthia Shang <cynthia.shang@crunchydata.com> wrote:

I've reviewed this patch and feel this patch addresses the original ask. I tested it manually trying to break it and, as mentioned previously, it's behavior is the same as the CSV copy with regards to it's shortcomings. However, I feel
1) a "copy from" test is needed and
2) the current "copy to" test is (along with a few others) in the wrong file.

With regards to #2, the copy.source tests are for things requiring replacement when running the tests. Given that these copy tests do not, I have moved the current last set of copy tests to the copy2.sql file and have provided an attached patch.

Thanks for reviewing the patch.

I agree that moving those previous and these new tests out of the .source files seems to make more sense as they don't make use of the preprocessing/replacement feature.

With regards to #1, the patch I have provided can then be used and the following added as the COPY TO/FROM tests (perhaps after line 426 of the attached copy2.sql file). Note that I moved the FROM test before the TO test and omitted the "(format text, header true)" in the FROM test since it is another way the command can be invoked.

copy copytest3 from stdin header;
this is just a line full of junk that would error out if parsed
11 a 1
22 b 2
\.

copy copytest3 to stdout with (format text, header true);

I've incorporated both your suggestions and included the patch you provided in the attached patch. Hope it's as expected.

As for the matching check of the header in the discussion of this patch, I feel that is a separate patch that can be added later since it would affect the general functionality of the copy command, not just the ability to have a text header.

Best,
- Cynthia Shang

P.S. I did receive the first attached patch, but on my Ubuntu I had to apply it using "git apply --ignore-space-change --ignore-whitespace", probably due to line ending differences.

Simon Muller

Вложения

text_header_v4.patch

Re: Allow COPY's 'text' format to output a header

От

Cynthia Shang

Дата:

26 июля 2018 г., 22:45:09

> On Jul 25, 2018, at 6:09 PM, Simon Muller <samullers@gmail.com> wrote:
>
> I've incorporated both your suggestions and included the patch you provided in the attached patch. Hope it's as
expected.
> --
> Simon Muller
>
> <text_header_v4.patch>

Reviewed and retested. Changing status to Ready for Committer.

Re: Allow COPY's 'text' format to output a header

От

"Daniel Verite"

Дата:

01 августа 2018 г., 20:20:02

  Simon Muller wrote:

> I've incorporated both your suggestions and included the patch you provided
> in the attached patch. Hope it's as expected.

Still unconvinced about the use case, since COPY's text format is only
meant to be consumed by Postgres, and the only way that Postgres will
consume this header is to discard it (at least as of the current
patch). But anyway...

   /* Check header */
-  if (!cstate->csv_mode && cstate->header_line)
+  if (cstate->binary && cstate->header_line)
     ereport(ERROR,
-  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-    errmsg("COPY HEADER available only in CSV mode")));
+     (errcode(ERRCODE_SYNTAX_ERROR),
+      errmsg("cannot specify HEADER in BINARY mode")));

Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: Allow COPY's 'text' format to output a header

От

Cynthia Shang

Дата:

01 августа 2018 г., 21:18:04

> On Aug 1, 2018, at 10:20 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
>
>   /* Check header */
> -  if (!cstate->csv_mode && cstate->header_line)
> +  if (cstate->binary && cstate->header_line)
>     ereport(ERROR,
> -  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> -    errmsg("COPY HEADER available only in CSV mode")));
> +     (errcode(ERRCODE_SYNTAX_ERROR),
> +      errmsg("cannot specify HEADER in BINARY mode")));
>
> Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR?
>

I agree; it should remain ERRCODE_FEATURE_NOT_SUPPORTED and I might also suggest the message read "COPY HEADER not
availablein BINARY mode", although I'm pretty agnostic on the latter. 

Regards,
-Cynthia Shang

Re: Allow COPY's 'text' format to output a header

От

Simon Muller

Дата:

02 августа 2018 г., 00:36:47

On 1 August 2018 at 17:18, Cynthia Shang <cynthia.shang@crunchydata.com> wrote:

> On Aug 1, 2018, at 10:20 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
>
> /* Check header */
> - if (!cstate->csv_mode && cstate->header_line)
> + if (cstate->binary && cstate->header_line)
> ereport(ERROR,
> - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> - errmsg("COPY HEADER available only in CSV mode")));
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("cannot specify HEADER in BINARY mode")));
>
> Why should ERRCODE_FEATURE_NOT_SUPPORTED become ERRCODE_SYNTAX_ERROR?
>

I agree; it should remain ERRCODE_FEATURE_NOT_SUPPORTED and I might also suggest the message read "COPY HEADER not available in BINARY mode", although I'm pretty agnostic on the latter.

Regards,
-Cynthia Shang

I changed the error type and message for consistency with other similar errors in that file. Whenever options are combined that are incompatible, it looks like the convention is for a ERRCODE_SYNTAX_ERROR to be thrown.

For instance, in case you both specify a specific DELIMITER but also declare the format as BINARY, then there is this code in that same file:

    if (cstate->binary && cstate->delim)
        ereport(ERROR,
                (errcode(ERRCODE_SYNTAX_ERROR),
                errmsg("cannot specify DELIMITER in BINARY mode")));

HEADER seems very similar to me since, like DELIMITER, it makes sense for the textual formats such as CSV and TEXT, but doesn't make sense with the BINARY format.

ERRCODE_FEATURE_NOT_SUPPORTED previously made sense since the only reason TEXT and HEADER weren't compatible options was because the feature was not yet implemented, but now ERRCODE_SYNTAX_ERROR seems to make sense to me since I can't foresee a use case where BINARY and HEADER would ever be compatible options.

Simon Muller

Re: Allow COPY's 'text' format to output a header

От

"Daniel Verite"

Дата:

02 августа 2018 г., 18:11:44

    Simon Muller wrote:

> I changed the error type and message for consistency with other similar
> errors in that file. Whenever options are combined that are incompatible,
> it looks like the convention is for a ERRCODE_SYNTAX_ERROR to be thrown.

That makes sense, thanks for elaborating, although there are also
a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c
that are raised on forbidden/nonsensical combination of features,
so the consistency argument could work both ways.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: Allow COPY's 'text' format to output a header

От

Cynthia Shang

Дата:

02 августа 2018 г., 21:07:00

> On Aug 2, 2018, at 8:11 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
>
> That makes sense, thanks for elaborating, although there are also
> a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c
> that are raised on forbidden/nonsensical combination of features,
> so the consistency argument could work both ways.
>

If there is not a strong reason to change the error code, then I believe we should not. The error is the same as it was
before,just narrower in scope. 

Best,
-Cynthia

Re: Allow COPY's 'text' format to output a header

От

Simon Muller

Дата:

03 августа 2018 г., 01:30:26

On 2 August 2018 at 17:07, Cynthia Shang <cynthia.shang@crunchydata.com> wrote:

> On Aug 2, 2018, at 8:11 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
>
> That makes sense, thanks for elaborating, although there are also
> a fair number of ERRCODE_FEATURE_NOT_SUPPORTED in copy.c
> that are raised on forbidden/nonsensical combination of features,
> so the consistency argument could work both ways.
>

If there is not a strong reason to change the error code, then I believe we should not. The error is the same as it was before, just narrower in scope.

Best,
-Cynthia

Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED.

Simon Muller

Вложения

text_header_v5.patch

Re: Allow COPY's 'text' format to output a header

От

Cynthia Shang

Дата:

03 августа 2018 г., 22:30:39

> On Aug 2, 2018, at 3:30 PM, Simon Muller <samullers@gmail.com> wrote:
>
> Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED.
>

I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All
looksgood. 

-Cynthia

Re: Allow COPY's 'text' format to output a header

От

Stephen Frost

Дата:

06 августа 2018 г., 20:34:27

Greetings,

* Cynthia Shang (cynthia.shang@crunchydata.com) wrote:
> > On Aug 2, 2018, at 3:30 PM, Simon Muller <samullers@gmail.com> wrote:
> >
> > Sure, thanks both for the feedback. Attached is a patch with the error kept as ERRCODE_FEATURE_NOT_SUPPORTED.
>
> I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All
looksgood. 

If there's a merge conflict against master, then it'd be good for an
updated patch to be posted.

Thanks!

Stephen

Вложения

signature.asc

Re: Allow COPY's 'text' format to output a header

От

Simon Muller

Дата:

09 августа 2018 г., 00:57:33

On 6 August 2018 at 16:34, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Cynthia Shang (cynthia.shang@crunchydata.com) wrote:
> I was able to apply the patch (after resolving a merge conflict which was expected given an update in master). All looks good.

If there's a merge conflict against master, then it'd be good for an
updated patch to be posted.

Thanks!

Stephen

Attached is an updated patch that should directly apply against current master.

Simon Muller

Вложения

text_header_v6.patch

Re: Allow COPY's 'text' format to output a header

От

Cynthia Shang

Дата:

09 августа 2018 г., 20:37:28

On Aug 8, 2018, at 2:57 PM, Simon Muller <samullers@gmail.com> wrote:

If there's a merge conflict against master, then it'd be good for an
updated patch to be posted.

Thanks!

Stephen

Attached is an updated patch that should directly apply against current master.

--
Simon Muller

<text_header_v6.patch>

This patch looks good. I realized I should have changed the status back while we were discussing all this. It is now (and still is) ready for committer.

Thanks,

-Cynthia

Re: Allow COPY's 'text' format to output a header

От

Michael Paquier

Дата:

17 августа 2018 г., 10:39:11

On Thu, Aug 09, 2018 at 10:37:28AM -0400, Cynthia Shang wrote:
> This patch looks good. I realized I should have changed the status
> back while we were discussing all this. It is now (and still is) ready
> for committer.

I have some comments.

-ERROR:  COPY HEADER available only in CSV mode
+ERROR:  cannot specify HEADER in BINARY mode
This should read "COPY HEADER not available in BINARY mode" perhaps?

+copy copytest3 from stdin csv header;
+copy copytest3 to stdout csv header;
It would be more interesting to first export the data into the file with
a header, truncate the relation, and import it back with again header
specified.  The data of the original should match the new, for both text
and csv format.

CopyStateData defines header_line, which still assumes that only CSV is
supported.

Why are there no additional tests for file_fdw?

The point about the header matching mentioned upthread is quite
interesting as it could make the proposed feature way more useful, and
it has not really been discussed.  As far as I can see this adds more
sanity checks in NextCopyFromRawFields().  I'd like to think that this
should be a completely different option, say CHECK_HEADER, as CSV simply
skips the header in COPY FROM if specified on HEAD.
--
Michael

Вложения

signature.asc

Re: Allow COPY's 'text' format to output a header

От

Michael Paquier

Дата:

02 октября 2018 г., 11:06:54

On Fri, Aug 17, 2018 at 01:39:11PM +0900, Michael Paquier wrote:
> The point about the header matching mentioned upthread is quite
> interesting as it could make the proposed feature way more useful, and
> it has not really been discussed.  As far as I can see this adds more
> sanity checks in NextCopyFromRawFields().  I'd like to think that this
> should be a completely different option, say CHECK_HEADER, as CSV simply
> skips the header in COPY FROM if specified on HEAD.

It has been a couple of weeks since the last review, which has not been
addressed, so I am marking the patch as returned with feedback.
--
Michael

Вложения

signature.asc

[PATCH v1] Allow COPY "test" to output a header and add header matching mode to COPY FROM

От

"Rémi Lapeyre"

Дата:

26 февраля 2020 г., 00:38:32

Hi, here's a new version of the patch with the header matching feature.
I should apply cleanly on master, let me know if anything's wrong.

---
 contrib/file_fdw/input/file_fdw.source  |  7 +-
 contrib/file_fdw/output/file_fdw.source | 13 ++--
 doc/src/sgml/ref/copy.sgml              |  9 ++-
 src/backend/commands/copy.c             | 93 ++++++++++++++++++++++---
 src/test/regress/input/copy.source      | 71 ++++++++++++++-----
 src/test/regress/output/copy.source     | 58 ++++++++++-----
 6 files changed, 202 insertions(+), 49 deletions(-)

Вложения

v1-0001-Allow-COPY-test-to-output-a-header-and-add-header.patch

Re: [PATCH v1] Allow COPY "text" to output a header and add headermatching mode to COPY FROM

От

Rémi Lapeyre

Дата:

02 марта 2020 г., 02:45:05

I created an entry for this patch in the new CommiFest but it seems that it is not finding it. Is there anything that I
needto do?

Re: [PATCH v1] Allow COPY "text" to output a header and add headermatching mode to COPY FROM

От

Surafel Temesgen

Дата:

02 марта 2020 г., 08:39:00

On Mon, Mar 2, 2020 at 2:45 AM Rémi Lapeyre <remi.lapeyre@henki.fr> wrote:

I created an entry for this patch in the new CommiFest but it seems that it is not finding it. Is there anything that I need to do?

Is is added on next open commit fest which is https://commitfest.postgresql.org/28/ now

regards

Surafel

[PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

От

"Rémi Lapeyre"

Дата:

09 марта 2020 г., 20:13:38

Here's a rebased version that should apply cleanly on HEAD.

---
 contrib/file_fdw/input/file_fdw.source  |  7 +-
 contrib/file_fdw/output/file_fdw.source | 13 ++--
 doc/src/sgml/ref/copy.sgml              |  9 ++-
 src/backend/commands/copy.c             | 93 ++++++++++++++++++++++---
 src/test/regress/input/copy.source      | 71 ++++++++++++++-----
 src/test/regress/output/copy.source     | 58 ++++++++++-----
 6 files changed, 202 insertions(+), 49 deletions(-)

Вложения

v2-0001-Allow-COPY-text-to-output-a-header-and-add-header.patch

Re: [PATCH v1] Allow COPY "text" to output a header and add header matching mode to COPY FROM

От

Daniel Gustafsson

Дата:

01 июля 2020 г., 12:04:21

> On 2 Mar 2020, at 00:45, Rémi Lapeyre <remi.lapeyre@henki.fr> wrote:

> I created an entry for this patch in the new CommiFest but it seems that it is not finding it. Is there anything that
Ineed to do? 

This patch no longer applies cleanly on HEAD, due to changes in the regress
tests.  Please submit a rebased version, I've marked this entry as Waiting on
Author for now.

cheers ./daniel

[PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

От

Rémi Lapeyre

Дата:

08 июля 2020 г., 14:45:01

Hi, here's a new version of the patch that should apply cleanly. I'll monitor
the status on http://cfbot.cputube.org/

Rémi

---
 contrib/file_fdw/input/file_fdw.source  |  7 +-
 contrib/file_fdw/output/file_fdw.source | 13 ++--
 doc/src/sgml/ref/copy.sgml              |  9 ++-
 src/backend/commands/copy.c             | 93 ++++++++++++++++++++++---
 src/test/regress/input/copy.source      | 71 ++++++++++++++-----
 src/test/regress/output/copy.source     | 58 ++++++++++-----
 6 files changed, 202 insertions(+), 49 deletions(-)

Вложения

v2-0001-Allow-COPY-text-to-output-a-header-and-add-header.patch

Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

От

Daniel Gustafsson

Дата:

08 июля 2020 г., 16:21:48

> On 8 Jul 2020, at 13:45, Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:
>
> Hi, here's a new version of the patch that should apply cleanly. I'll monitor
> the status on http://cfbot.cputube.org/

Please reply to the old thread about this, as that's the one connected to the
Commitfest entry and thats where all the discussion has happened.  While
additional threads can be attached to a CF entry, it's for when multiple
discussions are relevant to a patch, a single discussion should not be broken
into multiple threads.

cheers ./daniel

Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

От

Rémi Lapeyre

Дата:

08 июля 2020 г., 16:51:28

>
> Please reply to the old thread about this, as that's the one connected to the
> Commitfest entry and thats where all the discussion has happened.  While
> additional threads can be attached to a CF entry, it's for when multiple
> discussions are relevant to a patch, a single discussion should not be broken
> into multiple threads.

Sorry about this, I thought setting the In-Reply-To like so:

git send-email -v2 --to=pgsql-hackers@postgresql.org --in-reply-to=4E31E7AA-BFC6-47ED-90E1-3838E4D1F4FF@yesql.se HEAD^

There is some nice informations about how to write a good commit but I could not find exactly how to send it so I
probablydid something wrong. 

>
> cheers ./daniel

Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM

От

Peter Eisentraut

Дата:

10 июля 2020 г., 14:34:16

On 2020-07-08 13:45, Rémi Lapeyre wrote:
> Hi, here's a new version of the patch that should apply cleanly. I'll monitor
> the status on http://cfbot.cputube.org/

It's hard to find an explanation what this patch actually does.  I don't 
want to have to go through threads dating back 4 months to determine 
what was discussed and what was actually implemented.  Since you're 
already using git format-patch, just add something to the commit message.

It appears that these are really two separate features, so perhaps they 
should be two patches.

Also, the new header matching mode could probably use more than one line 
of documentation.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

17 июля 2020 г., 17:54:08


> It's hard to find an explanation what this patch actually does.  I don't
> want to have to go through threads dating back 4 months to determine
> what was discussed and what was actually implemented.  Since you're
> already using git format-patch, just add something to the commit message.
>
>
> It appears that these are really two separate features, so perhaps they
> should be two patches.

Thanks for the feedback, I've split cleanly the two patches, simplified the
tests and tried to explain the changes in the commit message.

> Also, the new header matching mode could probably use more than one line
> of documentation.

I've improved the documentation, let's me know if it's better.

It seems like cfbot is not happy with the way I'm sending my patches. The wiki
has some good advices on how to write a patch but I couldn't find anything on
how to send it. I've used

   git send-email -v3 --compose --to=... --in-reply-to=... HEAD^^

here but I'm not sure if it's correct. I will see if it works and will try to fix
it if it's not but since it runs once a day it may take some time.

[PATCH v3 1/2] Add header support to "COPY TO" text format

От

Rémi Lapeyre

Дата:

17 июля 2020 г., 17:54:09

CSV format supports the HEADER option to output a header in the output,
it is convenient when other programs need to consume the output. This
patch adds the same option to the default text format.

Discussion:
https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  1 -
 contrib/file_fdw/output/file_fdw.source |  4 +---
 doc/src/sgml/ref/copy.sgml              |  3 ++-
 src/backend/commands/copy.c             | 11 +++++++----
 src/test/regress/input/copy.source      | 12 ++++++++++++
 src/test/regress/output/copy.source     |  8 ++++++++
 6 files changed, 30 insertions(+), 9 deletions(-)

Вложения

v3-0001-Add-header-support-to-COPY-TO-text-format.patch

[PATCH v3 2/2] Add header matching mode to "COPY FROM"

От

Rémi Lapeyre

Дата:

17 июля 2020 г., 17:54:10

COPY FROM supports the HEADER option to silently discard the header from
a CSV or text file. It is possible to load by mistake a file that
matches the expected format, for example if two text columns have been
swapped, resulting in garbage in the database.

This option adds the possibility to actually check the header to make
sure it matches what is expected and exit immediatly if it does not.

Discussion:
https://www.postgresql.org/message-id/flat/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
---
 contrib/file_fdw/input/file_fdw.source  |  6 ++
 contrib/file_fdw/output/file_fdw.source |  9 ++-
 doc/src/sgml/ref/copy.sgml              |  8 ++-
 src/backend/commands/copy.c             | 84 +++++++++++++++++++++++--
 src/test/regress/input/copy.source      | 25 ++++++++
 src/test/regress/output/copy.source     | 17 +++++
 6 files changed, 140 insertions(+), 9 deletions(-)

Вложения

v3-0002-Add-header-matching-mode-to-COPY-FROM.patch

Re: Add header support to text format and matching feature

От

Magnus Hagander

Дата:

17 июля 2020 г., 18:27:31

On Fri, Jul 17, 2020 at 5:11 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:

> It's hard to find an explanation what this patch actually does. I don't
> want to have to go through threads dating back 4 months to determine
> what was discussed and what was actually implemented. Since you're
> already using git format-patch, just add something to the commit message.
>
>
> It appears that these are really two separate features, so perhaps they
> should be two patches.

Thanks for the feedback, I've split cleanly the two patches, simplified the
tests and tried to explain the changes in the commit message.

> Also, the new header matching mode could probably use more than one line
> of documentation.

I've improved the documentation, let's me know if it's better.

It seems like cfbot is not happy with the way I'm sending my patches. The wiki
has some good advices on how to write a patch but I couldn't find anything on
how to send it. I've used

git send-email -v3 --compose --to=... --in-reply-to=... HEAD^^

here but I'm not sure if it's correct. I will see if it works and will try to fix
it if it's not but since it runs once a day it may take some time.

If you have two patches that depend on each other, you should send them as two attachment to the same email. You now sent them as two separate emails, and cfbot will then pick up the latest one of them which is only patch 0002 (at least I'm fairly sure that's how it works).

I don't know how to do that with git-send-email, but you can certainly do it easy with git-format-patch and just attach them using your regular MUA.

(and while the cfbot and the archives have no problems dealing with the change in subject, it does break threading in some other MUAs, so I would recommend not doing that and sticking to the existing subject of the thread)

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

17 июля 2020 г., 19:48:21

>
> I don't know how to do that with git-send-email, but you can certainly do it easy with git-format-patch and just
attachthem using your regular MUA. 
>
> (and while the cfbot and the archives have no problems dealing with the change in subject, it does break threading in
someother MUAs, so I would recommend not doing that and sticking to the existing subject of the thread) 
>

Thanks, here are both patches attached so cfbot can read them.

Вложения

Re: Add header support to text format and matching feature

От

vignesh C

Дата:

18 июля 2020 г., 14:23:19

On Fri, Jul 17, 2020 at 10:18 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:
>
> >
> > I don't know how to do that with git-send-email, but you can certainly do it easy with git-format-patch and just
attachthem using your regular MUA. 
> >
> > (and while the cfbot and the archives have no problems dealing with the change in subject, it does break threading
insome other MUAs, so I would recommend not doing that and sticking to the existing subject of the thread) 
> >
>
> Thanks, here are both patches attached so cfbot can read them.

Few comments:
Few tests are failing because of hardcoded path:
+-- test header matching
+CREATE FOREIGN TABLE header_match ("1" int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename
'/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv',
delimiter ',', header 'match');
+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
+OPTIONS (format 'csv', filename
'/Users/remi/src/postgresql/contrib/file_fdw/data/list1.csv',
delimiter ',', header 'match');  -- ERROR
Generally path is not specified like this. file_fdw test of make
check-world is failing because of this.

There is one warning present in the changes:
copy.c: In function ‘ProcessCopyOptions’:
copy.c:1208:5: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
     char    *sval = defGetString(defel);

There is space before tab in indent in the below code, check for git
diff --check.
+                       if (fldct < list_length(cstate->attnumlist))
+                               ereport(ERROR,
+
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                                        errmsg("missing header")));

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

18 июля 2020 г., 18:25:55

Thanks for the feedback,

> There is one warning present in the changes:
> copy.c: In function ‘ProcessCopyOptions’:
> copy.c:1208:5: warning: ISO C90 forbids mixed declarations and code
> [-Wdeclaration-after-statement]
>     char    *sval = defGetString(defel);
>

Weirdly this is not caught by clang, even with -Wdeclaration-after-statement.

Here’s a new version that fix all the issues.

Rémi

Вложения

Re: Add header support to text format and matching feature

От

"Daniel Verite"

Дата:

25 июля 2020 г., 16:34:22

    Rémi Lapeyre wrote:

> Here’s a new version that fix all the issues.

Here's a review of v4.

The patch improves COPY in two ways:

- COPY TO and COPY FROM now accept "HEADER ON" for the TEXT format
(previously it was only for CSV)

- COPY FROM also accepts "HEADER match" to tell that there's a header
and that its contents must match the columns of the destination table.
This works for both the CSV and TEXT formats. The syntax for the
columns is the same as for the data and the match is case-sensitive.

The first improvement when submitted alone (in 2018 by Simon Muller)
has been judged not useful enough or even hazardous without any
"match" feature. It was returned with feedback in 2018-10 and
resubmitted by Rémi in 2020-02 with the match feature.

The patches  apply cleanly, "make check" and "make check-world" pass.

In my tests it works fine except for one crash that I can reproduce
on a fresh build and default configuration with:

$ cat >file.txt
i
1

$ psql
postgres=# create table x(i int);
CREATE TABLE
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
COPY 1
postgres=# \copy x(i) from file.txt with (header match)
PANIC:    ERRORDATA_STACK_SIZE exceeded
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I suspect the reason is the way that PG_TRY/PG_CATCH is used, see below.

Code comments:


+/*
+ * Represents whether the header must be absent, present or present and
match.
+ */
+typedef enum CopyHeader
+{
+    COPY_HEADER_ABSENT,
+    COPY_HEADER_PRESENT,
+    COPY_HEADER_MATCH
+} CopyHeader;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of
COPY,
@@ -136,7 +146,7 @@ typedef struct CopyStateData
    bool        binary;         /* binary format? */
    bool        freeze;         /* freeze rows on loading? */
    bool        csv_mode;        /* Comma Separated Value
format? */
-    bool        header_line;    /* CSV or text header line? */
+    CopyHeader  header_line;    /* CSV or text header line? */


After the redefinition into this enum type, there are still a
bunch of references to header_line that treat it like a boolean:

1190:            if (cstate->header_line)
1398:    if (cstate->binary && cstate->header_line)
2119:        if (cstate->header_line)
3635:    if (cstate->cur_lineno == 0 && cstate->header_line)

It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
but maybe it's not good style to count on that.



+            PG_TRY();
+            {
+                if (defGetBoolean(defel))
+                    cstate->header_line =
COPY_HEADER_PRESENT;
+                else
+                    cstate->header_line =
COPY_HEADER_ABSENT;
+            }
+            PG_CATCH();
+            {
+                char    *sval = defGetString(defel);
+
+                if (!cstate->is_copy_from)
+                    PG_RE_THROW();
+
+                if (pg_strcasecmp(sval, "match") == 0)
+                    cstate->header_line =
COPY_HEADER_MATCH;
+                else
+                    ereport(ERROR,
+
(errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("header requires a
boolean or \"match\"")));
+            }
+            PG_END_TRY();

It seems wrong to use a PG_CATCH block for this. I understand that
it's because defGetBoolean() calls ereport() on non-booleans, but then
it should be split into an error-throwing function and a
non-error-throwing lexical analysis of the boolean, the above code
calling the latter.
Besides the comments in elog.h above PG_TRY say that
 "the error recovery code
  can either do PG_RE_THROW to propagate the error outwards, or do a
  (sub)transaction abort. Failure to do so may leave the system in an
  inconsistent state for further processing."
Maybe this is what happens with the repeated uses of "match"
eventually failing with ERRORDATA_STACK_SIZE exceeded.


-    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+    HEADER { <literal>match</literal> | <literal>true</literal> |
<literal>false</literal> }

This should be enclosed in square brackets because HEADER
with no argument is still accepted.




+      names from the table. On input, the first line is discarded when set
+      to <literal>true</literal> or required to match the column names if
set

The elision of "header" as the subject might be misinterpreted as if
it's the first line that is true.  I'd suggest
"when <literal>header>/literal> is set to ..."    to avoid any confusion.



Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Re: Add header support to text format and matching feature

От

vignesh C

Дата:

17 августа 2020 г., 15:49:09

Thanks for your comments, Please find my thoughts inline.

> In my tests it works fine except for one crash that I can reproduce
> on a fresh build and default configuration with:
>
> $ cat >file.txt
> i
> 1
>
> $ psql
> postgres=# create table x(i int);
> CREATE TABLE
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> COPY 1
> postgres=# \copy x(i) from file.txt with (header match)
> PANIC:  ERRORDATA_STACK_SIZE exceeded
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>

Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.

>
> Code comments:
>
>
> +/*
> + * Represents whether the header must be absent, present or present and
> match.
> + */
> +typedef enum CopyHeader
> +{
> +       COPY_HEADER_ABSENT,
> +       COPY_HEADER_PRESENT,
> +       COPY_HEADER_MATCH
> +} CopyHeader;
> +
>  /*
>   * This struct contains all the state variables used throughout a COPY
>   * operation. For simplicity, we use the same struct for all variants of
> COPY,
> @@ -136,7 +146,7 @@ typedef struct CopyStateData
>         bool            binary;                 /* binary format? */
>         bool            freeze;                 /* freeze rows on loading? */
>         bool            csv_mode;               /* Comma Separated Value
> format? */
> -       bool            header_line;    /* CSV or text header line? */
> +       CopyHeader  header_line;        /* CSV or text header line? */
>
>
> After the redefinition into this enum type, there are still a
> bunch of references to header_line that treat it like a boolean:
>
> 1190:                   if (cstate->header_line)
> 1398:   if (cstate->binary && cstate->header_line)
> 2119:           if (cstate->header_line)
> 3635:   if (cstate->cur_lineno == 0 && cstate->header_line)
>
> It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
> but maybe it's not good style to count on that.

Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT.

>
>
>
> +                       PG_TRY();
> +                       {
> +                               if (defGetBoolean(defel))
> +                                       cstate->header_line =
> COPY_HEADER_PRESENT;
> +                               else
> +                                       cstate->header_line =
> COPY_HEADER_ABSENT;
> +                       }
> +                       PG_CATCH();
> +                       {
> +                               char    *sval = defGetString(defel);
> +
> +                               if (!cstate->is_copy_from)
> +                                       PG_RE_THROW();
> +
> +                               if (pg_strcasecmp(sval, "match") == 0)
> +                                       cstate->header_line =
> COPY_HEADER_MATCH;
> +                               else
> +                                       ereport(ERROR,
> +
> (errcode(ERRCODE_SYNTAX_ERROR),
> +                                                errmsg("header requires a
> boolean or \"match\"")));
> +                       }
> +                       PG_END_TRY();
>
> It seems wrong to use a PG_CATCH block for this. I understand that
> it's because defGetBoolean() calls ereport() on non-booleans, but then
> it should be split into an error-throwing function and a
> non-error-throwing lexical analysis of the boolean, the above code
> calling the latter.
> Besides the comments in elog.h above PG_TRY say that
>  "the error recovery code
>   can either do PG_RE_THROW to propagate the error outwards, or do a
>   (sub)transaction abort. Failure to do so may leave the system in an
>   inconsistent state for further processing."
> Maybe this is what happens with the repeated uses of "match"
> eventually failing with ERRORDATA_STACK_SIZE exceeded.
>

Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.

>
> -    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
> +    HEADER { <literal>match</literal> | <literal>true</literal> |
> <literal>false</literal> }
>
> This should be enclosed in square brackets because HEADER
> with no argument is still accepted.
>

Fixed.

>
>
>
> +      names from the table. On input, the first line is discarded when set
> +      to <literal>true</literal> or required to match the column names if
> set
>
> The elision of "header" as the subject might be misinterpreted as if
> it's the first line that is true.  I'd suggest
> "when <literal>header>/literal> is set to ..."  to avoid any confusion.
>

Fixed.

Attached v5 patch with the fixes of above comments.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

27 августа 2020 г., 17:53:11

Thanks Daniel for the review and Vignesh for addressing the comments.

I have two remarks with the state of the current patches:
- DefGetCopyHeader() duplicates a lot of code from defGetBoolean(), should we refactor this so that they can share more
oftheir internals? In the current implementation any change to defGetBoolean() should be made to DefGetCopyHeader() too
ortheir behaviour will subtly differ. 
- It is possible to set the header option multiple time:
     \copy x(i) from file.txt with (format csv, header off, header on);
  In which case the last one is the one kept. I think this is a bug and it should be fixed, but this is already the
behaviourin the current implementation so fixing it would not be backward compatible. Do you think users should not do
thisand I can fix it or that keeping the current behaviour is better for backward compatibility? 

Regards,
Rémi

> Le 17 août 2020 à 14:49, vignesh C <vignesh21@gmail.com> a écrit :
>
> Thanks for your comments, Please find my thoughts inline.
>
>> In my tests it works fine except for one crash that I can reproduce
>> on a fresh build and default configuration with:
>>
>> $ cat >file.txt
>> i
>> 1
>>
>> $ psql
>> postgres=# create table x(i int);
>> CREATE TABLE
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> COPY 1
>> postgres=# \copy x(i) from file.txt with (header match)
>> PANIC:  ERRORDATA_STACK_SIZE exceeded
>> server closed the connection unexpectedly
>>        This probably means the server terminated abnormally
>>        before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>>
>
> Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
>
>>
>> Code comments:
>>
>>
>> +/*
>> + * Represents whether the header must be absent, present or present and
>> match.
>> + */
>> +typedef enum CopyHeader
>> +{
>> +       COPY_HEADER_ABSENT,
>> +       COPY_HEADER_PRESENT,
>> +       COPY_HEADER_MATCH
>> +} CopyHeader;
>> +
>> /*
>>  * This struct contains all the state variables used throughout a COPY
>>  * operation. For simplicity, we use the same struct for all variants of
>> COPY,
>> @@ -136,7 +146,7 @@ typedef struct CopyStateData
>>        bool            binary;                 /* binary format? */
>>        bool            freeze;                 /* freeze rows on loading? */
>>        bool            csv_mode;               /* Comma Separated Value
>> format? */
>> -       bool            header_line;    /* CSV or text header line? */
>> +       CopyHeader  header_line;        /* CSV or text header line? */
>>
>>
>> After the redefinition into this enum type, there are still a
>> bunch of references to header_line that treat it like a boolean:
>>
>> 1190:                   if (cstate->header_line)
>> 1398:   if (cstate->binary && cstate->header_line)
>> 2119:           if (cstate->header_line)
>> 3635:   if (cstate->cur_lineno == 0 && cstate->header_line)
>>
>> It works fine since COPY_HEADER_ABSENT is 0 as the first value of the enum,
>> but maybe it's not good style to count on that.
>
> Fixed. Changed it to cstate->header_line != COPY_HEADER_ABSENT.
>
>>
>>
>>
>> +                       PG_TRY();
>> +                       {
>> +                               if (defGetBoolean(defel))
>> +                                       cstate->header_line =
>> COPY_HEADER_PRESENT;
>> +                               else
>> +                                       cstate->header_line =
>> COPY_HEADER_ABSENT;
>> +                       }
>> +                       PG_CATCH();
>> +                       {
>> +                               char    *sval = defGetString(defel);
>> +
>> +                               if (!cstate->is_copy_from)
>> +                                       PG_RE_THROW();
>> +
>> +                               if (pg_strcasecmp(sval, "match") == 0)
>> +                                       cstate->header_line =
>> COPY_HEADER_MATCH;
>> +                               else
>> +                                       ereport(ERROR,
>> +
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> +                                                errmsg("header requires a
>> boolean or \"match\"")));
>> +                       }
>> +                       PG_END_TRY();
>>
>> It seems wrong to use a PG_CATCH block for this. I understand that
>> it's because defGetBoolean() calls ereport() on non-booleans, but then
>> it should be split into an error-throwing function and a
>> non-error-throwing lexical analysis of the boolean, the above code
>> calling the latter.
>> Besides the comments in elog.h above PG_TRY say that
>> "the error recovery code
>>  can either do PG_RE_THROW to propagate the error outwards, or do a
>>  (sub)transaction abort. Failure to do so may leave the system in an
>>  inconsistent state for further processing."
>> Maybe this is what happens with the repeated uses of "match"
>> eventually failing with ERRORDATA_STACK_SIZE exceeded.
>>
>
> Fixed, replaced PG_TRY/PG_CATCH with strcmp logic to get the header option.
>
>>
>> -    HEADER [ <replaceable class="parameter">boolean</replaceable> ]
>> +    HEADER { <literal>match</literal> | <literal>true</literal> |
>> <literal>false</literal> }
>>
>> This should be enclosed in square brackets because HEADER
>> with no argument is still accepted.
>>
>
> Fixed.
>
>>
>>
>>
>> +      names from the table. On input, the first line is discarded when set
>> +      to <literal>true</literal> or required to match the column names if
>> set
>>
>> The elision of "header" as the subject might be misinterpreted as if
>> it's the first line that is true.  I'd suggest
>> "when <literal>header>/literal> is set to ..."  to avoid any confusion.
>>
>
> Fixed.
>
> Attached v5 patch with the fixes of above comments.
> Thoughts?
>
> Regards,
> Vignesh
> EnterpriseDB: http://www.enterprisedb.com
> <v5-0001-Add-header-support-to-COPY-TO-text-format.patch><v5-0002-Add-header-matching-mode-to-COPY-FROM.patch>

Re: Add header support to text format and matching feature

От

Michael Paquier

Дата:

29 сентября 2020 г., 09:36:24

On Thu, Aug 27, 2020 at 04:53:11PM +0200, Rémi Lapeyre wrote:
> I have two remarks with the state of the current patches:
> - DefGetCopyHeader() duplicates a lot of code from defGetBoolean(),
> should we refactor this so that they can share more of their
> internals? In the current implementation any change to
> defGetBoolean() should be made to DefGetCopyHeader() too or their
> behaviour will subtly differ.

The difference comes from the use of "match", and my take would be
here that it is wrong to assume that header can be a boolean-like
parameter with only one exception.  It seems to me that we may
actually be looking at having this stuff as an option different than
"header" at the end to have clear semantics.

> - It is possible to set the header option multiple time:
>      \copy x(i) from file.txt with (format csv, header off, header on);
>   In which case the last one is the one kept. I think this is a bug
> and it should be fixed, but this is already the behaviour in the
> current implementation so fixing it would not be backward
> compatible. Do you think users should not do this and I can fix it
> or that keeping the current behaviour is better for backward
> compatibility?

I would agree that this is a bug because we are failing to detect
what's actually a redundant option here as the first option still
causes the flag to be set to false, but that's not something worth a
back-patch IMO.  What we are looking here is something similar
to what is done with "format", where we track if the option has been
specified with format_specified.  The same is actually true with the
"freeze" option here, and it is true that we tend to prefer error-ing
in such cases while there are exceptions like EXPLAIN.  I think that
it would be nicer to be at least consistent with the behavior that
each command has chosen, and COPY is now a mixed bag.

I have marked the patch as returned with feedback for now.
--
Michael

Вложения

signature.asc

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

04 октября 2020 г., 00:42:52

> I would agree that this is a bug because we are failing to detect
> what's actually a redundant option here as the first option still
> causes the flag to be set to false, but that's not something worth a
> back-patch IMO.  What we are looking here is something similar
> to what is done with "format", where we track if the option has been
> specified with format_specified.  The same is actually true with the
> "freeze" option here, and it is true that we tend to prefer error-ing
> in such cases while there are exceptions like EXPLAIN.  I think that
> it would be nicer to be at least consistent with the behavior that
> each command has chosen, and COPY is now a mixed bag.

Here’s a new version of the patches that report an error when the options are set multiple time.

Regards,
Rémi

Вложения

Re: Add header support to text format and matching feature

От

Michael Paquier

Дата:

05 октября 2020 г., 04:05:54

On Sat, Oct 03, 2020 at 11:42:52PM +0200, Rémi Lapeyre wrote:
> Here’s a new version of the patches that report an error when the options are set multiple time.

Please note that I have applied a fix for the redundant option
handling as of 10c5291, though I have missed that you sent a patch.
Sorry about that.  Looking at it, we have done the same thing
byte-by-byte except that I have added tests for all option
combinations.
--
Michael

Вложения

signature.asc

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

13 октября 2020 г., 15:49:07

Thanks Michael for taking care of that!

Here’s the rebased patches with the last one dropped.

Regards,
Rémi




> Le 5 oct. 2020 à 03:05, Michael Paquier <michael@paquier.xyz> a écrit :
>
> On Sat, Oct 03, 2020 at 11:42:52PM +0200, Rémi Lapeyre wrote:
>> Here’s a new version of the patches that report an error when the options are set multiple time.
>
> Please note that I have applied a fix for the redundant option
> handling as of 10c5291, though I have missed that you sent a patch.
> Sorry about that.  Looking at it, we have done the same thing
> byte-by-byte except that I have added tests for all option
> combinations.
> --
> Michael

Hi, here’s a rebased version of the patch.

Best regards,
Rémi




> On 21 Oct 2020, at 19:49, Daniel Verite <daniel@manitou-mail.org> wrote:
>
>     Rémi Lapeyre wrote:
>
>> It looks like this is not in the current commitfest and that Cabot does not
>> find it. I’m not yet accustomed to the PostgreSQL workflow, should I just
>> create a new entry in the current commitfest?
>
> Yes. Because in the last CommitFest it was marked
> as "Returned with feedback"
> https://commitfest.postgresql.org/29/2504/
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: https://www.manitou-mail.org
> Twitter: @DanielVerite
>
>
>
>

Вложения

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

11 апреля 2021 г., 02:17:03

>
> Michael, since the issue of duplicated options has been fixed do either of these patches look like they are ready for
commit?
>

Here’s a rebased version of the patch.


Cheers,
Rémi


> Regards,
> --
> -David
> david@pgmasters.net

Вложения

Re: Add header support to text format and matching feature

От

Zhihong Yu

Дата:

11 апреля 2021 г., 05:22:39

On Sat, Apr 10, 2021 at 4:17 PM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:

>
> Michael, since the issue of duplicated options has been fixed do either of these patches look like they are ready for commit?
>

Here’s a rebased version of the patch.

Cheers,
Rémi

> Regards,
> --
> -David
> david@pgmasters.net

Hi,

>> sure it matches what is expected and exit immediatly if it does not.

Typo: immediately

+CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server

nit: since header is singular, you can name the table header_doesnt_match

+ from the one expected, or the name or case do not match, the copy will

For 'the name or case do not match', either use plural for the subjects or change 'do' to doesn't

- opts_out->header_line = defGetBoolean(defel);
+ opts_out->header_line = DefGetCopyHeader(defel);

Existing method starts with lower case d, I wonder why the new method starts with upper case D.

+ if (fldct < list_length(cstate->attnumlist))
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("missing header")));

The message seems to be inaccurate: the header may be there - it just misses some fields.

+ * Represents whether the header must be absent, present or present and match.

present and match: it seems present is redundant - if header is absent, how can it match ?

Cheers

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

11 апреля 2021 г., 14:01:22

>
> Hi,
> >> sure it matches what is expected and exit immediatly if it does not.
>
> Typo: immediately
>
> +CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
>
> nit: since header is singular, you can name the table header_doesnt_match
>
> +      from the one expected, or the name or case do not match, the copy will
>
> For 'the name or case do not match', either use plural for the subjects or change 'do' to doesn't
>

Thanks, I fixed both typos.

> -           opts_out->header_line = defGetBoolean(defel);
> +           opts_out->header_line = DefGetCopyHeader(defel);
>
> Existing method starts with lower case d, I wonder why the new method starts with upper case D.
>

I don’t remember why I used DefGetCopyHeader, should I change it?

> +           if (fldct < list_length(cstate->attnumlist))
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> +                        errmsg("missing header")));
>
> The message seems to be inaccurate: the header may be there - it just misses some fields.

I changed the error messages, they now are:
    ERROR:  incomplete header, expected 3 columns but got 2
    ERROR:  extra data after last expected header, expected 3 columns but got 4

>
> + * Represents whether the header must be absent, present or present and match.
>
> present and match: it seems present is redundant - if header is absent, how can it match ?

This now reads "Represents whether the header must be absent, present or match.”.

Cheers,
Rémi

>
> Cheers

Вложения

Re: Add header support to text format and matching feature

От

Zhihong Yu

Дата:

11 апреля 2021 г., 16:28:21

On Sun, Apr 11, 2021 at 4:01 AM Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:

>
> Hi,
> >> sure it matches what is expected and exit immediatly if it does not.
>
> Typo: immediately
>
> +CREATE FOREIGN TABLE header_dont_match (a int, foo text) SERVER file_server
>
> nit: since header is singular, you can name the table header_doesnt_match
>
> + from the one expected, or the name or case do not match, the copy will
>
> For 'the name or case do not match', either use plural for the subjects or change 'do' to doesn't
>

Thanks, I fixed both typos.

> - opts_out->header_line = defGetBoolean(defel);
> + opts_out->header_line = DefGetCopyHeader(defel);
>
> Existing method starts with lower case d, I wonder why the new method starts with upper case D.
>

I don’t remember why I used DefGetCopyHeader, should I change it?

> + if (fldct < list_length(cstate->attnumlist))
> + ereport(ERROR,
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
> + errmsg("missing header")));
>
> The message seems to be inaccurate: the header may be there - it just misses some fields.

I changed the error messages, they now are:
ERROR: incomplete header, expected 3 columns but got 2
ERROR: extra data after last expected header, expected 3 columns but got 4

>
> + * Represents whether the header must be absent, present or present and match.
>
> present and match: it seems present is redundant - if header is absent, how can it match ?

This now reads "Represents whether the header must be absent, present or match.”.

Cheers,
Rémi

>
> Cheers

>> This now reads "Represents whether the header must be absent, present or match.”.

Since match shouldn't be preceded with be, I think we can say:

Represents whether the header must match, be absent or be present.

Cheers

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

11 апреля 2021 г., 17:35:12

>
> >> This now reads "Represents whether the header must be absent, present or match.”.
>
> Since match shouldn't be preceded with be, I think we can say:
>
> Represents whether the header must match, be absent or be present.

Thanks, here’s a v10 version of the patch that fixes this.

Вложения

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

31 декабря 2021 г., 20:36:33

Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the same
asv10 which was already marked as ready for committer. 




> On 11 Apr 2021, at 16:35, Rémi Lapeyre <remi.lapeyre@lenstra.fr> wrote:
>
>>
>>>> This now reads "Represents whether the header must be absent, present or match.”.
>>
>> Since match shouldn't be preceded with be, I think we can say:
>>
>> Represents whether the header must match, be absent or be present.
>
> Thanks, here’s a v10 version of the patch that fixes this.
>
> <v10-0001-Add-header-support-to-COPY-TO-text-format.patch><v10-0002-Add-header-matching-mode-to-COPY-FROM.patch>

Вложения

Re: Add header support to text format and matching feature

От

Peter Eisentraut

Дата:

28 января 2022 г., 11:57:46

On 31.12.21 18:36, Rémi Lapeyre wrote:
> Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the same
asv10 which was already marked as ready for committer.
 

I have committed the 0001 patch.  I will work on the 0002 patch next.

I notice in the 0002 patch that there is no test case for the error 
"wrong header for column \"%s\": got \"%s\"", which I think is really 
the core functionality of this patch.  So please add that.

I wonder whether the header matching should be a separate option from 
the HEADER option.  The option parsing in this patch is quite 
complicated and could be simpler if there were two separate options.  It 
appears this has been mentioned in the thread but not fully discussed.

Re: Add header support to text format and matching feature

От

Rémi Lapeyre

Дата:

31 января 2022 г., 01:56:57

> On 28 Jan 2022, at 09:57, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 31.12.21 18:36, Rémi Lapeyre wrote:
>> Here’s an updated version of the patch that takes into account the changes in d1029bb5a2. The actual code is the
sameas v10 which was already marked as ready for committer. 
>
> I have committed the 0001 patch.  I will work on the 0002 patch next.
>

Thanks!

> I notice in the 0002 patch that there is no test case for the error "wrong header for column \"%s\": got \"%s\"",
whichI think is really the core functionality of this patch.  So please add that. 
>

I added a test for it in this new version of the patch.

> I wonder whether the header matching should be a separate option from the HEADER option.  The option parsing in this
patchis quite complicated and could be simpler if there were two separate options.  It appears this has been mentioned
inthe thread but not fully discussed. 

I suppose a new option could be added but I’m not sure it would simplify things much with regard to the code and in my
opinionit would be a bit weirder for users, right now it is just: 

    copy my_table from stdin with (header match);

with an additional option it could be:

    copy my_table from stdin with (header true,  match);

with potentially “header true” being implicit when “match” is given:

    copy my_table from stdin with (match);

But I think we would still have to check for and return an error if the user inputs:

    copy my_table from stdin with (header off, match);


Rather than complicating things, the current implementation seemed to be the best but I will update the patch if you
thinkI should change it. 

Best regards,
Rémi

Вложения

v12-0001-Add-header-matching-mode-to-COPY-FROM.patch

Re: Add header support to text format and matching feature

От

Peter Eisentraut

Дата:

31 января 2022 г., 09:54:01

On 30.01.22 23:56, Rémi Lapeyre wrote:
>> I notice in the 0002 patch that there is no test case for the error "wrong header for column \"%s\": got \"%s\"",
whichI think is really the core functionality of this patch.  So please add that.
 
>>
> 
> I added a test for it in this new version of the patch.

The file_fdw.sql tests contain this

+CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER 
file_server
+OPTIONS (format 'csv', filename :'filename', delimiter ',', header 
'match');   -- ERROR

but no actual error is generated.  Please review the additions on the 
file_fdw tests to see that they make sense.

Re: Add header support to text format and matching feature

От

Peter Eisentraut

Дата:

03 февраля 2022 г., 09:37:44

On 31.01.22 07:54, Peter Eisentraut wrote:
> On 30.01.22 23:56, Rémi Lapeyre wrote:
>>> I notice in the 0002 patch that there is no test case for the error 
>>> "wrong header for column \"%s\": got \"%s\"", which I think is really 
>>> the core functionality of this patch.  So please add that.
>>>
>>
>> I added a test for it in this new version of the patch.
> 
> The file_fdw.sql tests contain this
> 
> +CREATE FOREIGN TABLE header_doesnt_match (a int, foo text) SERVER 
> file_server
> +OPTIONS (format 'csv', filename :'filename', delimiter ',', header 
> 'match');   -- ERROR
> 
> but no actual error is generated.  Please review the additions on the 
> file_fdw tests to see that they make sense.

A few more comments on your latest patch:

- The DefGetCopyHeader() function seems very bulky and might not be 
necessary.  I think you can just check for the string "match" first and 
then use defGetBoolean() as before if it didn't match.

- If you define COPY_HEADER_ABSENT = 0 in the enum, then most of the 
existing truth checks don't need to be changed.

- In NextCopyFromRawFields(), it looks like you have code that replaces 
the null_print value if the supplied column name is null.  I don't think 
we should allow null column values.  Someone, this should be an error. 
(Do we use null_print on output and make the column name null if it 
matches?)

Re: Add header support to text format and matching feature

От

Greg Stark

Дата:

25 марта 2022 г., 07:21:07

Great to see the first of the two patches committed.

It looks like the second patch got some feedback from Peter in
February and has been marked "Waiting on author" ever since.

Remi, will you have a chance to look at this this month?

Peter, are these comments blocking if Remi doesn't have a chance to
work on it should I move it to the next release or could it be fixed
up and committed?

Re: Add header support to text format and matching feature

От

"Daniel Verite"

Дата:

25 марта 2022 г., 15:02:33

    Peter Eisentraut wrote:

> - The DefGetCopyHeader() function seems very bulky and might not be
> necessary.  I think you can just check for the string "match" first and
> then use defGetBoolean() as before if it didn't match.

The problem is that defGetBoolean() ends like this in the non-matching
case:

    ereport(ERROR,
        (errcode(ERRCODE_SYNTAX_ERROR),
         errmsg("%s requires a Boolean value",
        def->defname)));

We don't want this error message when the string does not match
since it's really a tri-state option, not a boolean.

To avoid duplicating the code that recognizes true/false/on/off/0/1,
probably defGetBoolean()'s guts should be moved into another function
that does the matching and report to the caller instead of throwing an
error. Then DefGetCopyHeader() could call that non-throwing function.

> - If you define COPY_HEADER_ABSENT = 0 in the enum, then most of the
> existing truth checks don't need to be changed.

It's already 0 by default. Not changing some truth checks does work, but
then we get some code that treat CopyFromState.header_line like
a boolean and some other code like an enum, which I found not
ideal wrt clarity in an earlier round of review [1]

It's not a major issue though, as it  concerns only 3 lines of code in the
v12
patch:
-    if (opts_out->binary && opts_out->header_line)
+    if (opts_out->binary && opts_out->header_line != COPY_HEADER_ABSENT)

+    /* on input check that the header line is correct if needed */
+    if (cstate->cur_lineno == 0 && cstate->opts.header_line !=
COPY_HEADER_ABSENT)

-        if (cstate->opts.header_line)
+        if (cstate->opts.header_line != COPY_HEADER_ABSENT)

> - In NextCopyFromRawFields(), it looks like you have code that replaces
> the null_print value if the supplied column name is null.  I don't think
> we should allow null column values.  Someone, this should be an error.

+1

[1]
https://www.postgresql.org/message-id/80a9b594-01d6-4ee4-a612-9ae9fd3c1194%40manitou-mail.org

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

Re: Add header support to text format and matching feature

От

Peter Eisentraut

Дата:

29 марта 2022 г., 18:02:04

On 25.03.22 05:21, Greg Stark wrote:
> Great to see the first of the two patches committed.
> 
> It looks like the second patch got some feedback from Peter in
> February and has been marked "Waiting on author" ever since.
> 
> Remi, will you have a chance to look at this this month?
> 
> Peter, are these comments blocking if Remi doesn't have a chance to
> work on it should I move it to the next release or could it be fixed
> up and committed?

I will try to finish up this patch.

Re: Add header support to text format and matching feature

От

Peter Eisentraut

Дата:

30 марта 2022 г., 10:11:09

On 29.03.22 17:02, Peter Eisentraut wrote:
> On 25.03.22 05:21, Greg Stark wrote:
>> Great to see the first of the two patches committed.
>>
>> It looks like the second patch got some feedback from Peter in
>> February and has been marked "Waiting on author" ever since.
>>
>> Remi, will you have a chance to look at this this month?
>>
>> Peter, are these comments blocking if Remi doesn't have a chance to
>> work on it should I move it to the next release or could it be fixed
>> up and committed?
> 
> I will try to finish up this patch.

Committed, after some further refinements as discussed.

Re: Add header support to text format and matching feature

От

Julien Rouhaud

Дата:

07 июня 2022 г., 18:47:44

Hi,

On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote:
>
> Committed, after some further refinements as discussed.

While working on nearby code, I found some problems with this feature.

First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that
expected?  The documentation isn't really explicit about it, but there's
nothing to match when exporting data it's a bit surprising.  I'm not opposed to
have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse
the commands history, but maybe it should be clearly documented?

Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
column list.  This one looks like a clear oversight, as something like that
should be entirely valid IMHO:

CREATE TABLE tbl(col1 int, col2 int);
COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);

but right now it errors out with:

ERROR:  column name mismatch in header line field 1: got "col1", expected "col2"

Note that the error message is bogus if you specify attributes in a
different order from the relation, as the code is mixing access to the tuple
desc and access to the raw fields with the same offset.

This also means that it will actually fail to detect a mismatch in the provided
column list and let you import data in the wrong position as long as the
datatypes are compatible and the column header in the file are in the correct
order.  For instance:

CREATE TABLE abc (a text, b text, c text);
INSERT INTO abc SELECT 'a', 'b', 'c';
COPY abc TO '/path/to/file' WITH (HEADER MATCH);

You can then import the data with any of those:
COPY abc(c, b, a) TO '/path/to/file' WITH (HEADER MATCH);
COPY abc(c, a, b) TO '/path/to/file' WITH (HEADER MATCH);
[...]
SELECT * FROM abc;

Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table
that has some dropped attribute(s).  The current code will access random memory
as there's no exact attnum / raw field mapping anymore.

I can work on a fix if needed (with some additional regression test to cover
those cases), but I'm still not sure that having a user provided column list is
supposed to be accepted or not for the HEADER MATCH.  In the meantime I will
add an open item.

Re: Add header support to text format and matching feature

От

Andrew Dunstan

Дата:

12 июня 2022 г., 16:36:13

On 2022-06-07 Tu 11:47, Julien Rouhaud wrote:
> Hi,
>
> On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote:
>> Committed, after some further refinements as discussed.
> While working on nearby code, I found some problems with this feature.
>
> First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that
> expected?  The documentation isn't really explicit about it, but there's
> nothing to match when exporting data it's a bit surprising.  I'm not opposed to
> have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse
> the commands history, but maybe it should be clearly documented?


I think it makes more sense to have a sanity check to prevent HEADER
MATCH with COPY TO.


>
> Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
> column list.  This one looks like a clear oversight, as something like that
> should be entirely valid IMHO:
>
> CREATE TABLE tbl(col1 int, col2 int);
> COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
> COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);
>
> but right now it errors out with:
>
> ERROR:  column name mismatch in header line field 1: got "col1", expected "col2"
>
> Note that the error message is bogus if you specify attributes in a
> different order from the relation, as the code is mixing access to the tuple
> desc and access to the raw fields with the same offset.
>
> This also means that it will actually fail to detect a mismatch in the provided
> column list and let you import data in the wrong position as long as the
> datatypes are compatible and the column header in the file are in the correct
> order.  For instance:
>
> CREATE TABLE abc (a text, b text, c text);
> INSERT INTO abc SELECT 'a', 'b', 'c';
> COPY abc TO '/path/to/file' WITH (HEADER MATCH);
>
> You can then import the data with any of those:
> COPY abc(c, b, a) TO '/path/to/file' WITH (HEADER MATCH);
> COPY abc(c, a, b) TO '/path/to/file' WITH (HEADER MATCH);
> [...]
> SELECT * FROM abc;
>
> Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table
> that has some dropped attribute(s).  The current code will access random memory
> as there's no exact attnum / raw field mapping anymore.


Ouch! That certainly needs to be fixed.


>
> I can work on a fix if needed (with some additional regression test to cover
> those cases), but I'm still not sure that having a user provided column list is
> supposed to be accepted or not for the HEADER MATCH.  In the meantime I will
> add an open item.
>
>


I think it should, but a temporary alternative would be to forbid HEADER
MATCH with explicit column lists until we can make it work right.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: Add header support to text format and matching feature

От

Julien Rouhaud

Дата:

13 июня 2022 г., 05:32:13

Hi,

On Sun, Jun 12, 2022 at 09:36:13AM -0400, Andrew Dunstan wrote:
> 
> On 2022-06-07 Tu 11:47, Julien Rouhaud wrote:
> >
> > First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that
> > expected?  The documentation isn't really explicit about it, but there's
> > nothing to match when exporting data it's a bit surprising.  I'm not opposed to
> > have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse
> > the commands history, but maybe it should be clearly documented?
> 
> 
> I think it makes more sense to have a sanity check to prevent HEADER
> MATCH with COPY TO.

I'm fine with it.  I added such a check and mentioned it in the documentation.

> > Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
> > column list.  This one looks like a clear oversight, as something like that
> > should be entirely valid IMHO:
> >
> > CREATE TABLE tbl(col1 int, col2 int);
> > COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
> > COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);
> >
> > but right now it errors out with:
> >
> > ERROR:  column name mismatch in header line field 1: got "col1", expected "col2"
> >
> > Note that the error message is bogus if you specify attributes in a
> > different order from the relation, as the code is mixing access to the tuple
> > desc and access to the raw fields with the same offset.
> > [...]
> I think it should, but a temporary alternative would be to forbid HEADER
> MATCH with explicit column lists until we can make it work right.

I think it would still be problematic if the target table has dropped columns.
Fortunately, as I initially thought the problem is only due to a thinko in the
original commit which used a wrong variable for the raw_fields offset.  Once
fixed (attached v1) I didn't see any other problem in the rest of the logic and
all the added regression tests work as expected.

Вложения

v1-0001-Fix-processing-of-header-match-option-for-COPY.patch

Re: Add header support to text format and matching feature

От

Peter Eisentraut

Дата:

13 июня 2022 г., 10:35:16

On 13.06.22 04:32, Julien Rouhaud wrote:
>> I think it makes more sense to have a sanity check to prevent HEADER
>> MATCH with COPY TO.
> 
> I'm fine with it.  I added such a check and mentioned it in the documentation.

> I think it would still be problematic if the target table has dropped columns.
> Fortunately, as I initially thought the problem is only due to a thinko in the
> original commit which used a wrong variable for the raw_fields offset.  Once
> fixed (attached v1) I didn't see any other problem in the rest of the logic and
> all the added regression tests work as expected.

Thanks for this patch.  I'll check it in detail in a bit.  It looks good 
to me at first glance.

Re: Add header support to text format and matching feature

От

Michael Paquier

Дата:

13 июня 2022 г., 10:46:46

On Mon, Jun 13, 2022 at 10:32:13AM +0800, Julien Rouhaud wrote:
> On Sun, Jun 12, 2022 at 09:36:13AM -0400, Andrew Dunstan wrote:
> I'm fine with it.  I added such a check and mentioned it in the documentation.

An error looks like the right call at this stage of the game.  I am
not sure what the combination of MATCH with COPY TO would mean,
actually.  And with the concept of SELECT queries on top of it, the
whole idea gets blurrier.

> I think it would still be problematic if the target table has dropped columns.
> Fortunately, as I initially thought the problem is only due to a thinko in the
> original commit which used a wrong variable for the raw_fields offset.  Once
> fixed (attached v1) I didn't see any other problem in the rest of the logic and
> all the added regression tests work as expected.

Interesting catch.  One thing that I've always found useful when it
comes to tests that stress dropped columns is to have tests where we
reduce the number of total columns that still exist.  An extra thing
is to look after ........pg.dropped.N........ a bit, and I would put
something in one of the headers.

>      if (pg_strcasecmp(sval, "match") == 0)
> +    {
> +        /* match is only valid for COPY FROM */
> +        if (!is_from)
> +            ereport(ERROR,
> +                (errcode(ERRCODE_SYNTAX_ERROR),
> +             errmsg("%s match is only valid for COPY FROM",
> +                    def->defname)));

Some nits.  I would suggest to reword this error message, like "cannot
use \"match\" with HEADER in COPY TO".  There is no need for the extra
comment, and the error code should be ERRCODE_FEATURE_NOT_SUPPORTED.
--
Michael

On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:
> I don't feel very strongly about this.  It makes sense to stay consistent
> with the existing COPY code.

Yes, my previous argument is based on consistency with the
surroundings.  I am not saying that this could not be made better, it
surely can, but I would recommend to tackle that in a separate patch,
and apply that to more areas than this specific one.
--
Michael

Вложения

signature.asc

Re: Add header support to text format and matching feature

От

Michael Paquier

Дата:

22 июня 2022 г., 02:34:34

On Mon, Jun 20, 2022 at 09:03:23AM +0900, Michael Paquier wrote:
> On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:
>> I don't feel very strongly about this.  It makes sense to stay consistent
>> with the existing COPY code.
>
> Yes, my previous argument is based on consistency with the
> surroundings.  I am not saying that this could not be made better, it
> surely can, but I would recommend to tackle that in a separate patch,
> and apply that to more areas than this specific one.

Peter, beta2 is planned for next week.  Do you think that you would be
able to address this open item by the end of this week?  If not, and I
have already looked at the proposed patch, I can jump in and help.
--
Michael

Вложения

signature.asc

Re: Add header support to text format and matching feature

От

Peter Eisentraut

Дата:

22 июня 2022 г., 13:22:01

On 22.06.22 01:34, Michael Paquier wrote:
> On Mon, Jun 20, 2022 at 09:03:23AM +0900, Michael Paquier wrote:
>> On Thu, Jun 16, 2022 at 09:24:56AM +0200, Peter Eisentraut wrote:
>>> I don't feel very strongly about this.  It makes sense to stay consistent
>>> with the existing COPY code.
>>
>> Yes, my previous argument is based on consistency with the
>> surroundings.  I am not saying that this could not be made better, it
>> surely can, but I would recommend to tackle that in a separate patch,
>> and apply that to more areas than this specific one.
> 
> Peter, beta2 is planned for next week.  Do you think that you would be
> able to address this open item by the end of this week?  If not, and I
> have already looked at the proposed patch, I can jump in and help.

The latest patch was posted by you, so I was deferring to you to commit 
it.  Would you like me to do it?

Re: Add header support to text format and matching feature

От

Michael Paquier

Дата:

22 июня 2022 г., 14:00:15

On Wed, Jun 22, 2022 at 12:22:01PM +0200, Peter Eisentraut wrote:
> The latest patch was posted by you, so I was deferring to you to commit it.
> Would you like me to do it?

OK.  As this is originally a feature you have committed, I originally
thought that you would take care of it, even if I sent a patch.  I'll
handle that tomorrow then, if that's fine for you, of course.  Happy
to help.
--
Michael

Вложения

signature.asc

Re: Add header support to text format and matching feature

От

Michael Paquier

Дата:

23 июня 2022 г., 07:26:29

On Wed, Jun 22, 2022 at 08:00:15PM +0900, Michael Paquier wrote:
> OK.  As this is originally a feature you have committed, I originally
> thought that you would take care of it, even if I sent a patch.  I'll
> handle that tomorrow then, if that's fine for you, of course.  Happy
> to help.

And done.  Thanks.
--
Michael

Вложения

signature.asc

Re: Add header support to text format and matching feature

От

Julien Rouhaud

Дата:

23 июня 2022 г., 07:46:46

On Thu, Jun 23, 2022 at 01:26:29PM +0900, Michael Paquier wrote:
> On Wed, Jun 22, 2022 at 08:00:15PM +0900, Michael Paquier wrote:
> > OK.  As this is originally a feature you have committed, I originally
> > thought that you would take care of it, even if I sent a patch.  I'll
> > handle that tomorrow then, if that's fine for you, of course.  Happy
> > to help.
> 
> And done.  Thanks.

Thanks!

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Allow COPY's 'text' format to output a header

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения