Re: Should CSV parsing be stricter about mid-field quotes?

From: Pavel Stehule
Subject: Re: Should CSV parsing be stricter about mid-field quotes?
Msg-id: CAFj8pRBPPfmL+xhBmZha+OAyJO2zXj+28RFPJdd2wS2+pfZc_Q@mail.gmail.com
In reply to: Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
Responses: Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
List: pgsql-hackers


On Thu, May 18, 2023 at 8:01, Joel Jacobson <joel@compiler.org> wrote:
On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote:
> Here you go. Not horrible handling.  (I use DataGrip so I saved it from there
> directly as TSV, just for an extra datapoint).
>
> FWIW, if you copy/paste in windows, the data, the field with the tab gets
> split into another column in Excel. But saving it as a file, and opening it.
> Saving it as XLSX, and then having Excel save it as a TSV (versus opening a
> text file, and saving it back)

Very useful, thanks.

Interesting: unlike Excel, DataGrip doesn't quote fields containing commas in TSV.
All the DataGrip/Excel TSV variants use quoting when necessary,
unlike Google Sheets's TSV format, which doesn't quote fields at all.

There may be yet a third implementation in LibreOffice.

In general, TSV is not well specified, so the implementations are not consistent.

 

DataGrip and Excel also terminate the last record with a newline,
while Google Sheets omits the newline after the last record
(which is bad, since a streaming reader then can't tell
whether the last record is complete or not).
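
To illustrate the streaming problem, here is a minimal Python sketch. The `feed` helper is hypothetical and written only for this example; it assumes records are separated by `\n` and ignores quoting entirely.

```python
def feed(buffer: str, chunk: str) -> tuple[list[str], str]:
    """Append chunk to buffer; return (complete records, unterminated rest)."""
    data = buffer + chunk
    # Everything before the final "\n" is a complete record; whatever
    # follows it has no terminator yet and may still be growing.
    *records, rest = data.split("\n")
    return records, rest


# DataGrip/Excel style: the last record ends with a newline.
terminated = feed("", "a\tb\nc\td\n")

# Google Sheets style: no newline after the last record.
unterminated = feed("", "a\tb\nc\td")
```

In the first case both records come back as complete and the remainder is empty. In the second, `c\td` stays in the remainder, and without an explicit end-of-stream signal the reader has no way to know whether more bytes of that record are still on the way.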

This makes me think we probably shouldn't add a new TSV format,
since there is no consistency between vendors.
It's impossible to deduce with certainty if a TSV-field that
begins with a double quotation mark is quoted or unquoted.
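
A small Python sketch of that ambiguity, using the standard `csv` module as a stand-in for the two vendor conventions (this is an illustration only, not how COPY parses input): the same bytes give different field values depending on whether the reader assumes quoting is in use.

```python
import csv
import io

# One TSV record whose first field begins with a double quotation mark.
raw = '"hello"\tworld\n'


def parse(text: str, quoting: int) -> list[str]:
    """Parse the first record of tab-separated text with the given quoting mode."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting)
    return next(reader)


# Reader that assumes quoting is in use (as in the DataGrip/Excel variants).
quoted_reading = parse(raw, csv.QUOTE_MINIMAL)

# Reader that assumes fields are never quoted (as in the Google Sheets variant).
literal_reading = parse(raw, csv.QUOTE_NONE)
```

The first reading yields `hello` as the field value, the second yields `"hello"` with the quotation marks kept as data. Nothing in the file itself says which one the producer meant.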

Two alternative ideas:

1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
with `COPY ... WITH CSV`?

Internally, it would just set

    quotec = '\0';

so it wouldn't affect performance at all.

2. How about adding a note on the complexities of dealing with TSV files in the
COPY documentation?
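
For idea 1, the effect of a quote character that never matches can be sketched with a toy field splitter in Python. This is not PostgreSQL's parser; `split_record` and its parameters are hypothetical names used only for illustration, and the sketch assumes a single-line record. The point is that the quote branches are ordinary comparisons, so passing `'\0'` simply means they are never taken for normal text.

```python
def split_record(line: str, delim: str = ",", quotec: str = '"') -> list[str]:
    """Split one record into fields using a tiny quote-aware state machine."""
    fields: list[str] = []
    buf: list[str] = []
    in_quote = False
    i = 0
    while i < len(line):
        c = line[i]
        if in_quote:
            if c == quotec:
                if i + 1 < len(line) and line[i + 1] == quotec:
                    buf.append(quotec)  # doubled quote is a literal quote
                    i += 1
                else:
                    in_quote = False  # closing quote
            else:
                buf.append(c)  # delimiters inside quotes are plain data
        elif c == quotec:
            in_quote = True  # opening quote
        elif c == delim:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(c)
        i += 1
    fields.append("".join(buf))
    return fields


with_quotes = split_record('"a,b",c')
# With quotec set to NUL, no ordinary character ever matches the quote
# character, so '"' is treated as data and every delimiter splits.
without_quotes = split_record('"a,b",c', quotec="\0")
```

Here `with_quotes` is `['a,b', 'c']`, while `without_quotes` is `['"a', 'b"', 'c']`: the same code path, with the quote handling effectively switched off by the choice of character.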

/Joel
