Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows
From | Tom Lane |
---|---|
Subject | Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows |
Date | |
Msg-id | 2572359.1733424649@sss.pgh.pa.us |
In reply to | BUG #18735: Specific multibyte character in psql file path command parameter for Windows (PG Bug reporting form <noreply@postgresql.org>) |
Replies | Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows; Re: BUG #18735: Specific multibyte character in psql file path command parameter for Windows |
List | pgsql-bugs |
PG Bug reporting form <noreply@postgresql.org> writes:
> Analysis:
> * The latter byte value of the character in question is the same as '\' (backslash).
> It looks like this byte value is handled as an escape character. This
> happens with SHIFT JIS client encoding.
> * The issue happens in \i, \ir and \copy but does not happen in \cd, \o and
> \! commands.

I imagine what is happening here is that canonicalize_path() interprets the backslash bytes as directory separators. The only thing I can think of to improve that is to make canonicalize_path() encoding-aware and have it skip over multibyte characters. Unfortunately, I fear that would introduce as many misbehaviors as it would remove, because we don't always know the relevant encoding. We might be able to limit the hazard by confining the encoding-awareness to the initial Windows-only conversion of '\' to '/', but it'd still be pretty squishy.

> * A similar issue may happen if the latter byte value of a multibyte
> character is the same as '/' (directory delimiter).

I don't believe Shift-JIS uses '/' as part of multibyte characters, so it should be sufficient to consider '\'.

BTW, according to wikipedia[1], backslash is not even part of the Shift-JIS character set:

The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign (U+00A5) at 0x5C and an overline (U+203E) at 0x7E in place of the ASCII character set's backslash and tilde respectively (these deviations from ASCII align with JIS X 0201). The single-byte characters from 0xA1 to 0xDF map to the half-width katakana characters found in JIS X 0201. For double-byte characters, the first byte is always in the range 0x81 to 0x9F or the range 0xE0 to 0xEF (these ranges are unassigned in JIS X 0201). If the first byte is odd, the second byte must be in the range 0x40 to 0x9E (but cannot be 0x7F); if the first byte is even, the second byte must be in the range 0x9F to 0xFC.
This might mean that it'd be okay to just skip the backslash-to-slash conversion loops altogether if we think the encoding is Shift-JIS.

There's still the question of how we determine the relevant encoding. I don't think client_encoding is what to use (and we won't have that at hand anyway, in programs other than psql). What we want to know is what fopen and related system calls will do with the path: they must have different behavior for Shift-JIS than other encodings, else none of your examples could work at all. I assume there's a way to find out what they think the relevant encoding is.

make_native_path() adds even more fun: when should we convert '/' back to '\'? From the comments, this function is concerned with producing something that will be accepted as a command-line argument by other programs, so I wonder if we can even know what to do with any certainty.

(In case it's not clear, I'm not volunteering to write or test any of this.)

regards, tom lane

[1] https://en.wikipedia.org/wiki/Shift_JIS