Re: Should CSV parsing be stricter about mid-field quotes?

From Joel Jacobson
Subject Re: Should CSV parsing be stricter about mid-field quotes?
Date
Msg-id 777be2db-f201-49d2-961b-0779f0f0d5ac@app.fastmail.com
In reply to Re: Should CSV parsing be stricter about mid-field quotes?  (Kirk Wolak <wolakk@gmail.com>)
Responses Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
Re: Should CSV parsing be stricter about mid-field quotes?  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
On Thu, May 18, 2023, at 00:18, Kirk Wolak wrote:
> Here you go. Not horrible handling.  (I use DataGrip so I saved it from there
> directly as TSV, just for an extra datapoint).
>
> FWIW, if you copy/paste in windows, the data, the field with the tab gets
> split into another column in Excel. But saving it as a file, and opening it.
> Saving it as XLSX, and then having Excel save it as a TSV (versus opening a
> text file, and saving it back)

Very useful, thanks.

Interesting: unlike Excel, DataGrip doesn't quote fields containing commas in TSV.
All the DataGrip/Excel TSV variants use quoting when necessary,
unlike Google Sheets' TSV format, which doesn't quote fields at all.

DataGrip/Excel also terminate the last record with a newline,
while Google Sheets omits the newline for the last record
(which is bad, since a streaming reader then can't tell
whether the last record is complete).
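
To illustrate the streaming problem, here is a minimal Python sketch. The helper name `complete_records` is made up for this example, and it deliberately ignores quoted newlines; it only shows that without a trailing newline the final record can't be distinguished from a partially received one.

```python
from typing import List, Tuple


def complete_records(buffer: str) -> Tuple[List[str], str]:
    """Split a buffer into newline-terminated records plus any unterminated tail.

    Hypothetical helper for illustration only; quoted newlines are not handled.
    """
    # Everything before the last "\n" is a finished record; whatever follows
    # it (possibly the empty string) has not been terminated yet.
    *records, tail = buffer.split("\n")
    return records, tail


# DataGrip/Excel style: the last record ends with a newline, so nothing is pending.
terminated = complete_records("a\tb\nc\td\n")

# Google Sheets style: no final newline, so "c\td" stays pending until EOF.
unterminated = complete_records("a\tb\nc\td")

print(terminated)
print(unterminated)
```

With the unterminated input, the reader has to wait for end-of-file before it can treat the tail as a record.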

This makes me think we probably shouldn't add a new TSV format,
since there is no consistency between vendors.
It's impossible to deduce with certainty if a TSV-field that
begins with a double quotation mark is quoted or unquoted.
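
A small sketch of the ambiguity, using Python's standard `csv` module rather than anything in PostgreSQL. The sample line is invented; the same bytes yield different field values depending on whether the reader assumes the producer quotes fields.

```python
import csv
import io

# Invented sample: one TSV line whose first field begins with a double quote.
line = '"foo"\tbar\n'


def parse(text, **kwargs):
    """Parse the first record of tab-separated text with the given csv options."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", **kwargs)
    return next(reader)


# Reader that treats '"' as a quote character (the csv module's default).
as_quoted = parse(line)

# Reader that performs no quote processing at all.
as_unquoted = parse(line, quoting=csv.QUOTE_NONE)

print(as_quoted)    # ['foo', 'bar']
print(as_unquoted)  # ['"foo"', 'bar']
```

Nothing in the file itself says which of the two readings the producer intended.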

Two alternative ideas:

1. How about adding a `WITHOUT QUOTE` or `QUOTE NONE` option in conjunction
with `COPY ... WITH CSV`?

Internally, it would just set

    quotec = '\0';

so it wouldn't affect performance at all.

2. How about adding a note on the complexities of dealing with TSV files in the
COPY documentation?

/Joel
