Re: Should CSV parsing be stricter about mid-field quotes?

Поиск
Список
Период
Сортировка
От Kirk Wolak
Тема Re: Should CSV parsing be stricter about mid-field quotes?
Дата
Msg-id CACLU5mSL=YSWnN787FFph-1QT3wqK9x7qcX=gvg4mqWD-4DoGA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
Ответы Re: Should CSV parsing be stricter about mid-field quotes?  ("Joel Jacobson" <joel@compiler.org>)
Список pgsql-hackers
On Wed, May 17, 2023 at 5:47 PM Joel Jacobson <joel@compiler.org> wrote:
On Wed, May 17, 2023, at 19:42, Andrew Dunstan wrote:
> You can use CSV mode pretty reliably for TSV files. The trick is to use a
> quoting char that shouldn't appear, such as E'\x01' as well as setting the
> delimiter to E'\t'. Yes, it's far from obvious.

I've been using that trick myself many times in the past, but thanks to this
deep-dive into this topic, it looks to me like TEXT would be a better format
fit when dealing with unquoted TSV files, or?

OTOH, one would then need to inspect the TSV file doesn't contain \. on an empty
line...

I was about to suggest we perhaps should consider adding a TSV format, that
is like TEXT excluding the PostgreSQL specific things like \. and \N,
but then I tested exporting TSV from Numbers on Mac and Google Sheets,
and I can see there are incompatible differences. Numbers quote fields
that contain double-quote marks, while Google Sheets doesn't.
None of them (unsurpringly) uses midfield quoting though.

Anyone using Excel that could try exporting the following example as CSV/TSV?

CREATE TABLE t (a text, b text, c text, d text);
INSERT INTO t (a, b, c, d)
VALUES ('unquoted','a "quoted" string', 'field, with a comma', E'field\t with a tab');


Here you go. Not horrible handling.  (I use DataGrip so I saved it from there directly as TSV,
just for an extra datapoint).

FWIW, if you copy/paste in windows, the data, the field with the tab gets split into another column in Excel.
But saving it as a file, and opening it.
Saving it as XLSX, and then having Excel save it as a TSV (versus opening a text file, and saving it back)

Kirk...

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: No buildfarm animals are running both typedefs and --with-llvm
Следующее
От: Jehan-Guillaume de Rorthais
Дата:
Сообщение: Re: Memory leak from ExecutorState context?