Обсуждение: Emitting JSON to file using COPY TO

Поиск
Список
Период
Сортировка

Emitting JSON to file using COPY TO

От
Davin Shearer
Дата:
Hello!

I'm trying to emit a JSON aggregation of JSON rows to a file using COPY TO, but I'm running into problems with COPY TO double quoting the output.   Here is a minimal example that demonstrates the problem I'm having:

create table public.tbl_json_test (id int, t_test text);

-- insert text that includes double quotes
insert into public.tbl_json_test (id, t_test) values (1, 'here''s a "string"');

-- select a JSON aggregation of JSON rows
select json_agg(row_to_json(t)) from (select * from public.tbl_json_test) t;
-- this yields the correct result in proper JSON format:
-- [{"id":1,"t_test":"here's a \"string\""}]
copy (select json_agg(row_to_json(t)) from (select * from public.tbl_json_test) t) to '/tmp/tbl_json_test.json';
-- once the JSON results are copied to file, the JSON is broken due to double quoting:
-- [{"id":1,"t_test":"here's a \\"string\\""}]
-- this fails to be parsed using jq on the command line:
-- cat /tmp/tbl_json_test.json | jq .
-- jq: parse error: Invalid numeric literal at line 1, column 40

We populate a text field in a table with text containing at least one double-quote (").  We then select from that table, formating the result as a JSON aggregation of JSON rows.  At this point the JSON syntax is correct, with the double quotes being properly quoted.  The problem is that once we use COPY TO to emit the results to a file, the output gets quoted again with a second escape character (\), breaking the JSON and causing a syntax error (as we can see above using the `jq` command line tool).

I have tried to get COPY TO to copy the results to file "as-is" by setting the escape and the quote characters to the empty string (''), but they only apply to the CSV format.

Is there a way to emit JSON results to file from within postgres?  Effectively, nn "as-is" option to COPY TO would work well for this JSON use case.

Any assistance would be appreciated.

Thanks,
Davin

Re: Emitting JSON to file using COPY TO

От
"David G. Johnston"
Дата:
On Sat, Nov 25, 2023 at 12:22 PM Davin Shearer <scholarsmate@gmail.com> wrote:

Is there a way to emit JSON results to file from within postgres?

Use psql to directly output query results to a file instead of using COPY to output structured output in a format you don't want.

David J.

Re: Emitting JSON to file using COPY TO

От
Adrian Klaver
Дата:
On 11/25/23 11:21, Davin Shearer wrote:
> Hello!
> 
> I'm trying to emit a JSON aggregation of JSON rows to a file using COPY 
> TO, but I'm running into problems with COPY TO double quoting the 
> output.   Here is a minimal example that demonstrates the problem I'm 
> having:
> 

> I have tried to get COPY TO to copy the results to file "as-is" by 
> setting the escape and the quote characters to the empty string (''), 
> but they only apply to the CSV format.
> 
> Is there a way to emit JSON results to file from within postgres?  
> Effectively, nn "as-is" option to COPY TO would work well for this JSON 
> use case.
> 

Not using COPY.

See David Johnson's post for one way using the client psql.

Otherwise you will need to use any of the many ETL programs out there 
that are designed for this sort of thing.

> Any assistance would be appreciated.
> 
> Thanks,
> Davin

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Emitting JSON to file using COPY TO

От
Dominique Devienne
Дата:
On Sat, Nov 25, 2023 at 10:00 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/25/23 11:21, Davin Shearer wrote:
> Hello!
>
> I'm trying to emit a JSON aggregation of JSON rows to a file using COPY
> TO, but I'm running into problems with COPY TO double quoting the
> output.   Here is a minimal example that demonstrates the problem I'm
> having:
>

> I have tried to get COPY TO to copy the results to file "as-is" by
> setting the escape and the quote characters to the empty string (''),
> but they only apply to the CSV format.
>
> Is there a way to emit JSON results to file from within postgres? 
> Effectively, nn "as-is" option to COPY TO would work well for this JSON
> use case.
>

Not using COPY.

See David Johnson's post for one way using the client psql.

Otherwise you will need to use any of the many ETL programs out there
that are designed for this sort of thing.

Guys, I don't get answers like that. The JSON spec is clear:

>  

Re: Emitting JSON to file using COPY TO

От
Dominique Devienne
Дата:
On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <ddevienne@gmail.com> wrote:
On Sat, Nov 25, 2023 at 10:00 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/25/23 11:21, Davin Shearer wrote:
> Hello!
>
> I'm trying to emit a JSON aggregation of JSON rows to a file using COPY
> TO, but I'm running into problems with COPY TO double quoting the
> output.   Here is a minimal example that demonstrates the problem I'm
> having:
>

> I have tried to get COPY TO to copy the results to file "as-is" by
> setting the escape and the quote characters to the empty string (''),
> but they only apply to the CSV format.
>
> Is there a way to emit JSON results to file from within postgres? 
> Effectively, nn "as-is" option to COPY TO would work well for this JSON
> use case.
>

Not using COPY.

See David Johnson's post for one way using the client psql.

Otherwise you will need to use any of the many ETL programs out there
that are designed for this sort of thing.

Guys, I don't get answers like that. The JSON spec is clear:

Oops, sorry, user error. --DD

PS: The JSON spec is a bit ambiguous. First it says

> Any codepoint except " or \ or control characters

And then is clearly shows \" as a valid sequence...
Sounds like JQ is too restrictive?

Or that's the double-escape that's the culprit?
i.e. \\ is in the final text, so that's just a backslash,
and then the double-quote is no longer escaped.

I've recently noticed json_agg(row_to_json(t))
is equivalent to json_agg(t)

Maybe use that instead? Does that make a difference?

I haven't noticed wrong escaping of double-quotes yet,
but then I'm using the binary mode of queries. Perhaps that matters.

On second thought, I guess that's COPY in its text modes doing the escaping?
Interesting. The text-based modes of COPY are configurable. There's even a JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and avoid the escaping?


Re: Emitting JSON to file using COPY TO

От
"David G. Johnston"
Дата:
On Monday, November 27, 2023, Dominique Devienne <ddevienne@gmail.com> wrote:
There's even a JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and avoid the escaping?

I agree there should be a copy option for “not formatted” so if you dump a single column result in that format you get the raw unescaped contents of the column. As soon as you ask for a format your json is now embedded so it is a value within another format and any structural aspects of the wrapper present in the json text representation need to be escaped.

David J.

Re: Emitting JSON to file using COPY TO

От
Pavel Stehule
Дата:
Hi

po 27. 11. 2023 v 14:27 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Monday, November 27, 2023, Dominique Devienne <ddevienne@gmail.com> wrote:
There's even a JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and avoid the escaping?

I agree there should be a copy option for “not formatted” so if you dump a single column result in that format you get the raw unescaped contents of the column. As soon as you ask for a format your json is now embedded so it is a value within another format and any structural aspects of the wrapper present in the json text representation need to be escaped.

Is it better to use the LO API for this purpose?  It is native for not formatted data.

Regards

Pavel


David J.

Re: Emitting JSON to file using COPY TO

От
"David G. Johnston"
Дата:
On Monday, November 27, 2023, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

po 27. 11. 2023 v 14:27 odesílatel David G. Johnston <david.g.johnston@gmail.com> napsal:
On Monday, November 27, 2023, Dominique Devienne <ddevienne@gmail.com> wrote:
There's even a JSON mode.
By miracle, would the JSON output mode recognize JSON[B] values, and avoid the escaping?

I agree there should be a copy option for “not formatted” so if you dump a single column result in that format you get the raw unescaped contents of the column. As soon as you ask for a format your json is now embedded so it is a value within another format and any structural aspects of the wrapper present in the json text representation need to be escaped.

Is it better to use the LO API for this purpose?  It is native for not formatted data. 

Using LO is, IMO, never the answer.  But if you are using a driver API anyway just handle the normal select query result.

David J.

Re: Emitting JSON to file using COPY TO

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> I agree there should be a copy option for “not formatted” so if you dump a
> single column result in that format you get the raw unescaped contents of
> the column.

I'm not sure I even buy that.  JSON data in particular is typically
multi-line, so how will you know where the row boundaries are?
That is, is a newline a row separator or part of the data?

You can debate the intelligence of any particular quoting/escaping
scheme, but imagining that you can get away without having one at
all will just create its own problems.

            regards, tom lane



Re: Emitting JSON to file using COPY TO

От
Dominique Devienne
Дата:
On Mon, Nov 27, 2023 at 3:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> I agree there should be a copy option for “not formatted” so if you dump a
> single column result in that format you get the raw unescaped contents of
> the column.

I'm not sure I even buy that.  JSON data in particular is typically
multi-line, so how will you know where the row boundaries are?
That is, is a newline a row separator or part of the data?

You can debate the intelligence of any particular quoting/escaping
scheme, but imagining that you can get away without having one at
all will just create its own problems.

What I was suggesting is not about a "not formatted" option.
But rather than JSON values (i.e. typed `json` or `jsonb`) in a
JSON-formatted COPY operator, the JSON values should not be
serialized to text that is simply output as a JSON-text-value by COPY,
but "inlined" as a "real" JSON value without the JSON document output by COPY.

This is a special case, where the inner and outer "values" (for lack of a better terminology)
are *both* JSON documents, and given that JSON is hierarchical, the inner JSON value can
either by 1) serializing to text first, which must thus be escaped using the JSON escaping rules,
2) NOT serialized, but "inline" or "spliced-in" the outer COPY JSON document.

I guess COPY in JSON mode supports only #1 now? While #2 makes more sense to me.
But both options are valid. Is that clearer?

BTW, JSON is not multi-line, except for insignificant whitespace.
So even COPY in JSON mode is not supposed to be line based I guess?
Unless COPY in JSON mode is more like NDJSON (https://ndjson.org/)? --DD

Re: Emitting JSON to file using COPY TO

От
Adrian Klaver
Дата:
On 11/27/23 01:44, Dominique Devienne wrote:
> On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <ddevienne@gmail.com 
> <mailto:ddevienne@gmail.com>> wrote:
> 

> On second thought, I guess that's COPY in its text modes doing the escaping?
> Interesting. The text-based modes of COPY are configurable. There's even 
> a JSON mode.

Where are you seeing the JSON mode for COPY? AFAIK there is only text 
and CSV formats.

> By miracle, would the JSON output mode recognize JSON[B] values, and 
> avoid the escaping?
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Emitting JSON to file using COPY TO

От
Dominique Devienne
Дата:
On Mon, Nov 27, 2023 at 5:04 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/27/23 01:44, Dominique Devienne wrote:
> On Mon, Nov 27, 2023 at 10:33 AM Dominique Devienne <ddevienne@gmail.com
> <mailto:ddevienne@gmail.com>> wrote:
> On second thought, I guess that's COPY in its text modes doing the escaping?
> Interesting. The text-based modes of COPY are configurable. There's even
> a JSON mode.

Where are you seeing the JSON mode for COPY? AFAIK there is only text
and CSV formats.

Indeed. Somehow I thought there was...
I've used the TEXT and BINARY modes, and remembered a wishful thinking JSON mode!
OK then, if there was, then what I wrote would apply :). --DD

Re: Emitting JSON to file using COPY TO

От
Filip Sedlák
Дата:
This would be a very special case for COPY. It applies only to a single 
column of JSON values. The original problem can be solved with psql 
--tuples-only as David wrote earlier.


$ psql -tc 'select json_agg(row_to_json(t))
              from (select * from public.tbl_json_test) t;'

  [{"id":1,"t_test":"here's a \"string\""}]


Special-casing any encoding/escaping scheme leads to bugs and harder 
parsing.

Just my 2c.

--
Filip Sedlák



Re: Emitting JSON to file using COPY TO

От
Davin Shearer
Дата:
Thanks for the responses everyone.

I worked around the issue using the `psql -tc` method as Filip described.

I think it would be great to support writing JSON using COPY TO at some point so I can emit JSON to files using a PostgreSQL function directly.

-Davin

On Tue, Nov 28, 2023 at 2:36 AM Filip Sedlák <filip@sedlakovi.org> wrote:
This would be a very special case for COPY. It applies only to a single
column of JSON values. The original problem can be solved with psql
--tuples-only as David wrote earlier.


$ psql -tc 'select json_agg(row_to_json(t))
              from (select * from public.tbl_json_test) t;'

  [{"id":1,"t_test":"here's a \"string\""}]


Special-casing any encoding/escaping scheme leads to bugs and harder
parsing.

Just my 2c.

--
Filip Sedlák