Re: Should CSV parsing be stricter about mid-field quotes?

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Re: Should CSV parsing be stricter about mid-field quotes?
Дата
Msg-id 5258a2fb-2590-49ec-a482-815c4e159675@app.fastmail.com
обсуждение исходный текст
Ответ на Re: Should CSV parsing be stricter about mid-field quotes?  (Andrew Dunstan <andrew@dunslane.net>)
Ответы Re: Should CSV parsing be stricter about mid-field quotes?
Re: Should CSV parsing be stricter about mid-field quotes?
Список pgsql-hackers
On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote:
> You can use CSV mode pretty reliably for TSV files. The trick is to use a
> quoting char that shouldn't appear, such as E'\x01' as well as setting the
> delimiter to E'\t'. Yes, it's far from obvious.

I've been using that trick myself many times in the past, but thanks to this
deep-dive into this topic, it looks to me like TEXT would be a better format
fit when dealing with unquoted TSV files, or?

OTOH, one would then need to inspect the TSV file doesn't contain \. on an empty
line...

I was about to suggest we perhaps should consider adding a TSV format, that
is like TEXT excluding the PostgreSQL specific things like \. and \N,
but then I tested exporting TSV from Numbers on Mac and Google Sheets,
and I can see there are incompatible differences. Numbers quote fields
that contain double-quote marks, while Google Sheets doesn't.
None of them (unsurpringly) uses midfield quoting though.

Anyone using Excel that could try exporting the following example as CSV/TSV?

CREATE TABLE t (a text, b text, c text, d text);
INSERT INTO t (a, b, c, d)
VALUES ('unquoted','a "quoted" string', 'field, with a comma', E'field\t with a tab');

I agree with you that it's unwise to change something that's been working
great for such a long time, and I agree CSV-files are probably not a problem
per se, but I think you will agree with me TSV-files is a different story,
from a user-friendliness and correctness perspective. Sure, we could just say
"Don't use TSV! Use CSV instead!" in the docs, that would be an improvement
I think, but there is currently nothing on "TSV" in the docs, so users will
google and find all these broken dangerous suggestions on work-arounds.

Thoughts?

/Joel

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kirk Wolak
Дата:
Сообщение: Re: psql: Could we get "-- " prefixing on the **** QUERY **** outputs? (ECHO_HIDDEN)
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: issue with meson builds on msys2