Re: New "single" COPY format
From | Andrew Dunstan |
---|---|
Subject | Re: New "single" COPY format |
Date | |
Msg-id | 0b70a518-f6cc-483b-8e1c-51a8585f0f72@dunslane.net |
In reply to | Re: New "single" COPY format ("Joel Jacobson" <joel@compiler.org>) |
Replies | Re: New "single" COPY format |
List | pgsql-hackers |
On 2024-12-16 Mo 10:09 AM, Joel Jacobson wrote:
> Hi hackers,
>
> After further consideration, I'm withdrawing the patch.
> Some fundamental questions remain unresolved:
>
> - Should round-trip fidelity be a strict goal? By "round-trip fidelity",
>   I mean that data exported and then re-imported should yield exactly
>   the original values, including the distinction between NULL and empty strings.
> - If round-trip fidelity is a requirement, how do we distinguish NULL from empty
>   strings without delimiters or escapes?
> - Is automatic newline detection (as in "csv" and "text") more valuable than
>   the ability to embed \r (CR) characters?
> - Would it be better to extend the existing COPY options rather than introducing
>   a new format?
> - Or should we consider a JSONL format instead, one that avoids the NULL/empty
>   string problem entirely?
>
> No clear solution or consensus has emerged. For now, I'll step back from the
> proposal. If someone wants to revisit this later, I'd be happy to contribute.
>
> Thanks again for all the feedback and consideration.
>

We seem to have got seriously into the weeds, here. I'd be sorry to see this dropped. After all, it's not something new, and while we have a sort of workaround for "one json doc per line" it's far from obvious, and except in a few blog posts undocumented. I think we're trying to be far too general here but in the absence of more general use cases. The ones I recall having encountered in the wild are:

. one json datum per line
. one json document per file
. a sequence of json documents per file

The last one is hard to deal with, and I think I've only seen it once or twice, so I suggest leaving it aside for now.

Notice these are all JSON. I could imagine XML might have similar requirements, but I encounter it extremely rarely.

Regarding NULL, an empty string is not a valid JSON literal, so there should be no confusion there. It is valid for XML, though.
Given all that I think restricting ourselves to just the JSON cases, and possibly just to JSONL, would be perfectly reasonable.

Regarding CR, it's not a valid character in a JSON string item, although it is valid in JSON whitespace. I would not treat it as magical unless it immediately precedes an NL. That gives rise to a very slight ambiguity, but I think it's one we could live with.

As for what the format is called, I don't like the "LIST" proposal much, even for the general case. Seems too close to an array.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
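
[Editor's illustration, not part of the original message.] The CR rule described above can be sketched outside the server. This is a minimal Python sketch under stated assumptions, not a proposed COPY implementation: `split_jsonl` is a hypothetical helper name, and the reading of the "slight ambiguity" (a CR that was trailing JSON whitespace before the NL cannot be told apart from a CRLF line ending) is an interpretation, not something the message spells out.

```python
import json


def split_jsonl(data: str) -> list[str]:
    """Split JSONL text into records (hypothetical helper).

    NL ends a record. A CR is dropped only when it immediately
    precedes that NL; any other CR is left in the record untouched.
    """
    pieces = data.split("\n")
    # Text after the final NL; empty when the input ends with NL.
    last = pieces.pop()
    # Every remaining piece was followed by an NL, so a trailing CR
    # here is part of a CRLF line ending.
    records = [p[:-1] if p.endswith("\r") else p for p in pieces]
    if last:
        records.append(last)
    return records


# A CRLF-terminated record, a record with a bare CR used as JSON
# whitespace, and a final record with no line terminator.
sample = '{"a": 1}\r\n{"b":\r2}\n{"c": "x"}'
records = split_jsonl(sample)
values = [json.loads(r) for r in records]
```

The bare CR in the second record survives the split and is then accepted by the JSON parser as whitespace. A raw CR inside a string literal is rejected by `json.loads`, which matches the point that CR is not valid in a JSON string item.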