Re: Emitting JSON to file using COPY TO
| От | Andrew Dunstan |
|---|---|
| Тема | Re: Emitting JSON to file using COPY TO |
| Дата | |
| Msg-id | 74d3b4e2-d201-489d-9b9a-1a6d0eb492a6@dunslane.net обсуждение |
| Ответ на | Re: Emitting JSON to file using COPY TO (jian he <jian.universality@gmail.com>) |
| Ответы |
Re: Emitting JSON to file using COPY TO
|
| Список | pgsql-hackers |
On Sat, Feb 7, 2026 at 9:27 PM Junwang Zhao <zhjwpku@gmail.com> wrote:Here are some comments on v23: 0001: The refactor looks straightforward to me. Introducing a format field should make future extensions easier. One suggestion is that we could add some helper macros around format, for example: #define IS_FORMAT_CSV(format) (format == COPY_FORMAT_CSV) #define IS_FORMAT_TEXT_LIKE(format) \ (format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV) I think this would improve readability.Personally, I don't like marcos....0002: Since you have moved the `CopyFormat enum` into 0001, the following commit msg should be rephrased. The CopyFormat enum was originally contributed by Joel Jacobson joel@compiler.org, later refactored by Jian He to address various issues, and further adapted by Junwang Zhao to support the newly introduced CopyToRoutine struct (commit 2e4127b6d2). - if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim) - ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ - errmsg("cannot specify %s in BINARY mode", "DELIMITER"))); + if (opts_out->delim) + { + if (opts_out->format == COPY_FORMAT_BINARY) + ereport(ERROR, + errcode(ERRCODE_SYNTAX_ERROR), + /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */ + errmsg("cannot specify %s in BINARY mode", "DELIMITER")); + else if (opts_out->format == COPY_FORMAT_JSON) + ereport(ERROR, + errcode(ERRCODE_SYNTAX_ERROR), + errmsg("cannot specify %s in JSON mode", "DELIMITER")); + } Can we add a function that converts CopyFormat to a string? Treating CopyFormat as %s in error messages would make the code shorter. However, I'm not sure whether this aligns with translation conventions, correct me if I'm wrong.I don’t think this is worth the added complexity. That said, I tried to simplify the code and changed it to: if (opts_out->delim && (opts_out->format == COPY_FORMAT_BINARY || opts_out->format == COPY_FORMAT_JSON)) ereport(ERROR, errcode(ERRCODE_SYNTAX_ERROR), opts_out->format == COPY_FORMAT_BINARY ? errmsg("cannot specify %s in BINARY mode", "DELIMITER") : errmsg("cannot specify %s in JSON mode", "DELIMITER"));- * CSV and text formats share the same TextLike routines except for the + * CSV and text, json formats share the same TextLike routines except for the I'd suggest rewording to `CSV, text and json ...`. The same applied to other parts in this patch.sure.0003: The commit message includes some changes(adapt the newly introduced CopyToRoutine) that actually belong to 0002; it would be better to remove them from this commit.0002 commit message: """ This introduces the JSON format option for the COPY TO command, allowing users to export query results or table data directly as a single JSON object or a stream of JSON objects. The JSON format is currently supported only for COPY TO operations; it is not available for COPY FROM. JSON format is incompatible with some standard text/CSV parsing or formatting options, including: - HEADER - DEFAULT - NULL - DELIMITER - FORCE QUOTE / FORCE NOT NULL Regression tests covering valid JSON exports and error handling for incompatible options have been added to src/test/regress/sql/copy.sql. """ 0003 commit message: """ Add option force_array for COPY JSON FORMAT This adds the force_array option, which is available exclusively when using COPY TO with the JSON format. When enabled, this option wraps the output in a top-level JSON array (enclosed in square brackets with comma-separated elements), making the entire result a valid single JSON value. Without this option, the default behavior is to output a stream of independent JSON objects. Attempting to use this option with COPY FROM or with formats other than JSON will raise an error. """+ if (cstate->json_row_delim_needed && cstate->opts.force_array) + CopySendChar(cstate, ','); + else if (cstate->opts.force_array) + { + /* first row needs no delimiter */ + CopySendChar(cstate, ' '); + cstate->json_row_delim_needed = true; + } can we do this: if (cstate->opts.force_array) { if (cstate->json_row_delim_needed) CopySendChar(cstate, ','); else { /* first row needs no delimiter */ CopySendChar(cstate, ' '); cstate->json_row_delim_needed = true; } }good suggestion. If more people think WRAP_ARRAY is better than FORCE_ARRAY, we can switch to it accordingly. The change itself is quite straightforward.
I have reworked these. First I cleaned up a number of things in patches 2 and 3 (Thanks, Claude for the summary):
Patch 2: json format for COPY TO
copy.c:
- "json" → "JSON" in the COPY FROM rejection error message.
copyto.c:
1. TupleDesc setup runs once, not every row — Added json_tupledesc_ready flag; the memcpy/populate_compact_attribute/BlessTupleDesc block is now guarded by if (!cstate->json_tupledesc_ready).
2. Comment rewritten — Old: "the slot's TupleDesc may change during query execution". New: explains BlessTupleDesc registers the RECORDOID descriptor so lookup_rowtype_tupdesc inside composite_to_json can
find it.
3. Eliminated per-row makeStringInfo() — Added StringInfoData json_buf to the struct, initialized once in CopyToTextLikeStart in copycontext. Each row does resetStringInfo instead of allocating a new
StringInfo.
4. Column list rejection added — Error: "column selection is not supported in JSON mode" when attnamelist != NIL.
5. Improved SendCopyBegin comment — Old: "JSON format is always one non-binary column". New: explains each CopyData message contains one complete JSON object.
Tests:
6. Added copy copytest (style) to stdout (format json) to the error-case block.
7. Added copyjsontype table test with json, jsonb columns — verifies values are embedded directly, not double-encoded. Covers json objects, scalars, arrays, nested objects, and nulls.
Patch 3: Add option force_array
copy.c:
1. "json" → "JSON" in COPY FROM error (carried from patch 2).
2. "can only used" → "can only be used" — grammar fix in FORCE_ARRAY error.
copyto.c:
3. Struct fields reorganized — JSON fields grouped under /* JSON format state */ comment with inline descriptions, instead of a standalone json_row_delim_needed with a vague comment.
4. Block comment updated — Was "CSV, text and json formats share the same TextLike routines except for the one-row callback". Now correctly notes JSON has its own one-row and end callbacks.
5. CopyToTextLikeEnd comment fixed — Was "text, CSV, and json", now "text and CSV" (JSON uses CopyToJsonEnd).
6. Cleaner start callback — FORCE_ARRAY bracket emission moved inside the if (format == JSON) block after json_buf init, instead of a separate top-level conditional.
7. Inherits all patch 2 fixes — TupleDesc guard, reusable json_buf, improved comments, column list rejection.
Tests:
8. --Error → -- should fail: force_array requires json format; --ok → -- force_array variants.
9. copyjsontype test carried through from patch 2.
Then I reworked the way this works. In order to support column lists with JSON output, we need to deal with individual columns instead of whole records. This involved quite a number of changes, as can be seen in patch 4. This involved exporting a new small function from json.c.
The result is a lot cleaner, I believe, and in my benchmarking is faster by a factor of almost 2.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: