Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV
Дата
Msg-id Yvu3N9Dn4iA5+8Up@momjian.us
обсуждение исходный текст
Ответ на Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV  (Noah Misch <noah@leadboat.com>)
Re: COPY TO CSV produces data that is incompatible/unsafe for \COPY FROM CSV  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-bugs
On Sun, Aug 14, 2022 at 10:08:55AM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > The psql documentation says [\copy ... from stdin] looks for "\.".  It says
> > nothing about doing that for [\copy ... from filename].  I wonder if psql
> > should change the filename case to send the whole file to the server, ignoring
> > "\." inside.  (That is to say, change to match the psql documentation.)
> 
> This seems like a sensible solution.  The need to use an in-band EOF
> marker when reading from the command source file is clear, and there's
> no non-kluge way there.  But that doesn't mean we should extend it
> to separate files.  (I'm surprised to realize we do, actually.)

So, look what I found in psql/copy.c:

        /* check for EOF marker, but not on a partial line */
        if (at_line_begin)
        {
        /*
-->         * This code erroneously assumes '\.' on a line alone
-->         * inside a quoted CSV string terminates the \copy.
-->         * https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
         */
        if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
            (linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
        {
            copydone = true;
        }
        }

I wondered who would reference an email thread in the source code, so I
looked at the whole 2012 thread:

    https://www.postgresql.org/message-id/flat/E1TdNVQ-0001ju-GO%40wrigleys.postgresql.org

and it was, of course, me.  :-(

Tom's analysis was similar ten years ago:

> Ugh.  This seems like a rather fundamental oversight in the CSV feature.
> The problem is that psql has no idea whether the copy is being done in
> CSV mode or not --- and even if it did, it doesn't parse the data fully
> enough to realize whether a \. line is inside quotes or not.
> 
> In the case of out-of-line data files, it might be reasonable to just
> dispense with the check for \. altogether and always ship the whole file
> to the backend; I think there's a \. check on the backend side.  (Not
> sure this is safe in V2 protocol, but I doubt anyone cares anymore
> about that.)
> 
> In the case of in-line data in a script file, CSV mode seems a bit
> broken in any case; there's no concept of a terminator in CSV, AFAIK.
> So maybe we don't have to worry about that.

I think at this point we should either fix this or document it.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson




В списке pgsql-bugs по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Excessive number of replication slots for 12->14 logical replication
Следующее
От: Francisco Olarte
Дата:
Сообщение: Re: [PATCH] Fix segfault calling PQflush on invalid connection