Re: Fixing backslash dot for COPY FROM...CSV

Поиск
Список
Период
Сортировка
От Daniel Verite
Тема Re: Fixing backslash dot for COPY FROM...CSV
Дата
Msg-id 90523252-5831-4541-b3f6-0b3e03d428d2@manitou-mail.org
обсуждение исходный текст
Ответ на Re: Fixing backslash dot for COPY FROM...CSV  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Fixing backslash dot for COPY FROM...CSV  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
    Robert Haas wrote:

> Those links unfortunately seem not to be entirely specific to this
> issue. Other, related things seem to be discussed there, and it's not
> obvious that everyone agrees on what to do, or really that anyone
> agrees on what to do. The best link that I found for this exact issue
> is
> https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
> but the thread isn't very conclusive and is so old that any
> conclusions reached then might no longer be considered valid today.

To refresh the problem statement, 4 cases that need fixing as
of HEAD can be distinguished:

#1. copy csv from file, single column, no quoting involved.
COPY will stop at \. and ignore the rest of the file without
any error or warning.

$ cat >/tmp/file.csv <<EOF
line1
\.
line2
EOF

$ psql <<EOF
create table contents(t text);
copy contents from '/tmp/file.csv' (format csv);
table contents;
EOF

Results in
   t
-------
 line1
(1 row)

The bug is that a single row is imported instead of the 3 rows of the file.


#2. Same as the previous case, but with file_fdw

$ psql <<EOF
CREATE EXTENSION file_fdw;

CREATE FOREIGN TABLE csv_data(line text) SERVER myserver
  OPTIONS (filename '/tmp/file.csv', format 'csv');

TABLE csv_data;
EOF

Results in:

line
-------
 line1
(1 row)

The bug is that rows 2 and 3 are missing, as in case #1.

#3. \copy csv from file with  \. inside a quoted multi-line section

This is the case that the above linked report mentioned,
resulting in a failure to import.
In addition to being legal CSV, these contents can be produced by
Postgres itself exporting in CSV.

$ cat >/tmp/file-quoted.csv <<EOF
line1
"
\.
"
line2
EOF

$ psql <<EOF
create table contents(t text);
\copy contents from '/tmp/file-quoted.csv' csv;
EOF

Results in an error:

ERROR:    unterminated CSV quoted field
CONTEXT:  COPY contents, line 4: ""
\.
"

The expected result is that it imports 3 rows without error.


#4. \copy csv from file, single column, no quoting involved
This is the same case as #1 except it uses the client-server protocol.

$ cat >/tmp/file.csv <<EOF
line1
\.
line2
EOF

$ psql <<EOF
create table contents(t text);
\copy contents from '/tmp/file.csv' (format csv);
TABLE contents;
EOF

Results in
   t
-------
 line1
(1 row)


As in case #1, a single row is imported instead of 3 rows.


The proposed  patch addresses these cases by making the sequence
\. non-special in CSV (in fact backslash itself is a normal character in
CSV).
It does not address the cases when the data is embedded after
the COPY command or typed interactively in psql, since these cases
require an explicit end-of-data marker, and CSV does not have
the concept of an end-of-data marker.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jelte Fennema-Nio
Дата:
Сообщение: Re: SSL tests fail on OpenSSL v3.2.0
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Support TZ format code in to_timestamp()