Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes

From: Andrew Dunstan
Subject: Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes
Date:
Msg-id: 54AD8CEF.3080904@dunslane.net
In reply to: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes  (Aaron Botsis <aaron@bt-r.com>)
Responses: Re: Patch: [BUGS] BUG #12320: json parsing with embedded double quotes  (Aaron Botsis <aaron@bt-r.com>)
List: pgsql-hackers

On 01/07/2015 08:25 AM, Aaron Botsis wrote:
> Hi folks, I was having a problem importing json data with COPY. Lots
> of things export data nicely as one json blob per line. This is
> excellent for directly importing into a JSON/JSONB column for analysis.
>
> ...Except when there’s an embedded doublequote. Or anything that’s
> escaped. COPY handles this, but by the time the escaped char hit the
> JSON parser, it's not escaped anymore. This breaks the JSON parsing.
> This means I need to manipulate the input data to double-escape it.
> See bug #12320 for an example. Yuck.
>
> I propose this small patch that simply allows specifying COPY … ESCAPE
> without requiring the CSV parser. It will make it much easier to
> directly use json formatted export data for folks going forward. This
> seemed like the simplest route.
>
> Usage is simply:
>
> postgres=# copy t1 from '/Users/nok/Desktop/queries.json';
> ERROR:  invalid input syntax for type json
> DETAIL:  Token "root" is invalid.
> CONTEXT:  JSON data, line 1: ...1418066241619 AND <=1418671041621) AND
> user:"root...
> COPY t1, line 3, column bleh:
> "{"timestamp":"2014-12-15T19:17:32.505Z","duration":7.947,"query":{"query":{"filtered":{"filter":{"qu..."
> postgres=# copy t1 from '/Users/nok/Desktop/queries.json' escape '';
> COPY 1966
>
>


This isn't a bug. Neither CSV format nor TEXT format is particularly
suitable for json. I'm quite certain I could compose legal json that
will break your proposal (for example, with an embedded newline in the
whitespace).
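
For instance, the following is a single valid json value, but because it
spans two physical lines a line-oriented COPY with an empty escape would
try to parse each line as a separate document and fail (the field values
here are made up purely for illustration):

   {"user": "root",
    "duration": 7.947}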

It's also unnecessary. CSV format, while not designed for this, is
nevertheless sufficiently flexible to allow successful import of json
data meeting certain criteria (essentially no newlines), like this:
   copy the_table(jsonfield) from '/path/to/jsondata'
       csv quote e'\x01' delimiter e'\x02';
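
Spelled out end to end it looks something like this (just a sketch; the
table name, column name and file path are placeholders, and it assumes
one json document per physical line):

   -- placeholder names; one json document per line, no embedded newlines
   create table the_table (jsonfield json);

   copy the_table(jsonfield) from '/path/to/jsondata'
       csv quote e'\x01' delimiter e'\x02';

   -- the data is then directly queryable
   select jsonfield->>'timestamp' from the_table limit 5;

The control characters work as quote and delimiter precisely because they
can't appear unescaped in valid json text, so CSV mode never splits or
unquotes anything and each whole line arrives as a single field.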


You aren't the first person to encounter this problem. See
<http://adpgtech.blogspot.com/2014/09/importing-json-data.html>

Maybe we need to add something like this to the docs, or to the wiki.

Note too my comment in that blog post:
   Now this solution is a bit of a hack. I wonder if there's a case for
   a COPY mode that simply treats each line as a single datum. I also
   wonder if we need some more specialized tools for importing JSON,
   possibly one or more Foreign Data Wrappers. Such things could
   handle, say, embedded newline punctuation.
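
In the meantime, the nearest FDW-flavoured equivalent is probably just
pointing file_fdw at the file with the same quote/delimiter trick,
roughly like this (an untested sketch; the server and foreign table
names are invented for illustration):

   -- file_fdw ships in contrib; names below are invented
   create extension file_fdw;
   create server json_files foreign data wrapper file_fdw;
   create foreign table jsonlines (j json) server json_files
       options (filename '/path/to/jsondata', format 'csv',
                quote e'\x01', delimiter e'\x02');

   select j->>'timestamp' from jsonlines limit 5;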


cheers

andrew





