Re: Should CSV parsing be stricter about mid-field quotes?
From | Andrew Dunstan |
---|---|
Subject | Re: Should CSV parsing be stricter about mid-field quotes? |
Date | |
Msg-id | 67dc3a37-8853-46bd-883e-df8a8c934368@dunslane.net |
In response to | Re: Should CSV parsing be stricter about mid-field quotes? ("Joel Jacobson" <joel@compiler.org>) |
Responses | Re: Should CSV parsing be stricter about mid-field quotes? |
List | pgsql-hackers |
On 2024-10-08 Tu 3:25 AM, Joel Jacobson wrote:
> On Sun, Oct 6, 2024, at 15:12, Andrew Dunstan wrote:
>> On 2024-10-04 Fr 12:19 PM, Joel Jacobson wrote:
>>> 2. Avoid needing hacks like using E'\x01' as quoting char.
>>>
>>> Introduce QUOTE NONE and DELIMITER NONE,
>>> to allow raw lines to be imported "as is" into a single text column.
>> As I think I previously indicated, I'm perfectly happy about 2, because
>> it replaces a far from obvious hack, but I am at best dubious about 1.
> I've looked at how to implement this, and there is quite a lot of complexity
> having to do with quoting and escaping.
>
> Need guidance on what you think would be best to do:
>
> 2a) Should we aim to support all NONE combinations, at the cost of increasing the
> complexity at all code having to do with quoting, escaping and delimiters?
>
> 2b) Should we aim to only support the QUOTE NONE DELIMITER NONE ESCAPE NONE case,
> useful to the real-life scenario we've identified, that is, importing raw log
> lines into a single column, which could then be handled by a much simpler and
> probably faster version of CopyReadAttributesCSV(),
> e.g. named CopyReadAttributesUnquotedUnDelimited() or
> maybe CopyReadAttributesRaw()?
> (We also need to modify CopyReadLineText(), but seems we only need a
> quote_none bool, to skip over the quoting code there, so don't think a
> separate function is warranted there.)
>
> I think ESCAPE NONE should be implied from QUOTE NONE, since the default escape
> character is the same as the quote character, so if there isn't any
> quote character, then I think that would imply no escape character either.
>
> Can we think of any other valid, useful, realistic, and safe combinations of
> QUOTE NONE, DELIMITER NONE and ESCAPE NONE, that would be interesting
> to support?
>
> If not, then I think 2b looks more interesting, to reduce risk of accidental
> misuse, simpler implementation, and since it also should allow importing
> raw log files faster, thanks to the reduced complexity.
>

Off hand I can't think of a case other than 2b that would apply in the real world, although others might like to chime in here.

If we're going to do that, let's find a shorter way to spell it. In fact, we should do that even if we go with 2a.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
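
Editorial illustration of option 2b from the message above: a minimal sketch, in Python rather than PostgreSQL's C, of why a reader with no quote, delimiter, or escape character is much simpler than a CSV reader. The function names `read_csv_rows` and `read_raw_rows` and the sample text are hypothetical; this is not the proposed COPY implementation, and it ignores COPY-specific concerns such as NULL handling and end-of-data markers.

```python
import csv
import io


def read_csv_rows(text):
    """Parse with ordinary CSV rules: commas split fields, quotes are special."""
    return list(csv.reader(io.StringIO(text)))


def read_raw_rows(text):
    """Hypothetical 'raw' mode: every input line becomes one single-field row.

    No character acts as a quote, delimiter, or escape, so the only work
    left is finding the end of each line.
    """
    return [[line.rstrip("\r\n")] for line in io.StringIO(text)]


# A log-like line containing both a comma and double quotes.
sample = 'GET /a,b "quoted, text" 200\nplain line\n'

csv_rows = read_csv_rows(sample)  # first line is split into several fields
raw_rows = read_raw_rows(sample)  # first line is kept intact as one field
```

With the raw reader, the comma and the double quotes in the first line are ordinary data, which is the behavior wanted when loading log lines into a single text column.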