Re: Emitting JSON to file using COPY TO

Поиск
Список
Период
Сортировка
От Davin Shearer
Тема Re: Emitting JSON to file using COPY TO
Дата
Msg-id CALvfUkByf6eou6J36-2Hw7Kdro80JD_Ch1AeUDAjsX8RpcA==A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Emitting JSON to file using COPY TO  (Nathan Bossart <nathandbossart@gmail.com>)
Ответы Re: Emitting JSON to file using COPY TO  (Joe Conway <mail@joeconway.com>)
Список pgsql-hackers
I'm really glad to see this taken up as a possible new feature and will definitely use it if it gets released.  I'm impressed with how clean, understandable, and approachable the postgres codebase is in general and how easy it is to read and understand this patch.

I reviewed the patch (though I didn't build and test the code) and have a concern with adding the '[' at the beginning and ']' at the end of the json output.  Those are already added by `json_agg` (https://www.postgresql.org/docs/current/functions-aggregate.html) as you can see in my initial email.  Adding them in the COPY TO may be redundant (e.g., [[{"key":"value"...}....]]).

I think COPY TO makes good sense to support, though COPY FROM maybe not so much as JSON isn't necessarily flat and rectangular like CSV.

For my use-case, I'm emitting JSON files to Apache NiFi for processing, and NiFi has superior handling of JSON (via JOLT parsers) versus CSV where parsing is generally done with regex.  I want to be able to emit JSON using a postgres function and thus COPY TO.

Definitely +1 for COPY TO.

I don't think COPY FROM will work out well unless the JSON is required to be flat and rectangular.  I would vote -1 to leave it out due to the necessary restrictions making it not generally useful.

Hope it helps,
Davin

On Fri, Dec 1, 2023 at 6:10 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Fri, Dec 01, 2023 at 02:28:55PM -0500, Joe Conway wrote:
> I did a quick PoC patch (attached) -- if there interest and no hard
> objections I would like to get it up to speed for the January commitfest.

Cool.  I would expect there to be interest, given all the other JSON
support that has been added thus far.

I noticed that, with the PoC patch, "json" is the only format that must be
quoted.  Without quotes, I see a syntax error.  I'm assuming there's a
conflict with another json-related rule somewhere in gram.y, but I haven't
tracked down exactly which one is causing it.

> 1. Is supporting JSON array format sufficient, or does it need to support
> some other options? How flexible does the support scheme need to be?

I don't presently have a strong opinion on this one.  My instinct would be
start with something simple, though.  I don't think we offer any special
options for log_destination...

> 2. This only supports COPY TO and we would undoubtedly want to support COPY
> FROM for JSON as well, but is that required from the start?

I would vote for including COPY FROM support from the start.

> !     if (!cstate->opts.json_mode)

I think it's unfortunate that this further complicates the branching in
CopyOneRowTo(), but after some quick glances at the code, I'm not sure it's
worth refactoring a bunch of stuff to make this nicer.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Lakhin
Дата:
Сообщение: Re: Refactoring backend fork+exec code
Следующее
От: Lev Kokotov
Дата:
Сообщение: Re: Bug in pgbench prepared statements