Re: BUG #16236: Invalid escape encoding
From | Tom Lane |
---|---|
Subject | Re: BUG #16236: Invalid escape encoding |
Date | |
Msg-id | 10420.1580398179@sss.pgh.pa.us |
In response to | BUG #16236: Invalid escape encoding (PG Bug reporting form <noreply@postgresql.org>) |
List | pgsql-bugs |
[ please keep the list cc'd ]

Stéphane Campinas <stephane.campinas@gmail.com> writes:
> myDatabaseName=# select encode('\x00017F80', 'escape');
>       encode
> ------------------
>  \000\x01\x7F\200

> If I understand correctly, with the input "\x00017F80", I get the
> outputted value above because:
> - "00" is converted to "\000"
> - "01" and "7F" get converted to "\x01" and "\x7F" respectively as they
> are not 0 or a high-bit-set value
> - "80" is converted to "\200" since it is a high-bit-set value

The point here is that the encode function is only doing the first and
last of those things. It lets the 01 and 7F bytes through as-is, because
the text data type can store and transport those just fine. It's psql's
table-printing code that is deciding that those bytes are nonprintable
and then choosing to render them in the \x01 style. (The large distance
between those bits of code helps to explain the inconsistency of style.)

> Do you know why there is this distinction between high-bit-set values
> and other non-printable characters ?

Probably, whoever wrote the encode-as-escape code didn't see a need to
escape anything that type text could store without it. That code's old
enough that it might predate psql's decision to render control characters
this way, too.

(Type text won't store zero bytes, and it will only accept high-bit-set
bytes if they form part of a validly encoded character, which limits the
allowed sequences if the database encoding is, say, UTF8. So those cases
*have* to be escaped in order to turn any valid bytea into a valid text
object.)

There's certainly an argument to be made that it'd be more friendly
for encode() to escape these other byte values as well. But the code
is operating as designed.
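The split between what encode() produces and what psql displays can be sketched with a small Python model. This is only an illustration of the behavior described above, not PostgreSQL's actual code: `escape_encode` and `render_like_psql` are hypothetical names, and the display step is a rough stand-in that covers only the control bytes discussed here.

```python
def escape_encode(data: bytes) -> str:
    """Model of encode(bytea, 'escape') as described above.

    Only zero bytes and high-bit-set bytes become \\ooo octal escapes,
    a backslash is doubled, and every other byte passes through as-is.
    """
    out = []
    for b in data:
        if b == 0 or b >= 0x80:
            out.append("\\%03o" % b)   # e.g. 0x80 -> \200
        elif b == 0x5C:
            out.append("\\\\")         # backslash is doubled
        else:
            out.append(chr(b))         # 0x01 and 0x7F are stored raw
    return "".join(out)


def render_like_psql(text: str) -> str:
    """Hypothetical stand-in for the display step: control characters
    are shown as \\xNN, everything else is printed unchanged."""
    return "".join(
        "\\x%02X" % ord(ch) if ord(ch) < 0x20 or ord(ch) == 0x7F else ch
        for ch in text
    )


encoded = escape_encode(bytes.fromhex("00017F80"))
print(len(encoded))               # length of the real text value
print(render_like_psql(encoded))  # what a terminal user would see
```

In this model the encoded value is ten characters long (two four-character octal escapes plus two raw control characters), while the rendered form is longer because the display step adds its own \x01 and \x7F notation.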
> First, the following is strange: I cannot decode what the encode method
> returned

> myDatabaseName=# select encode('\x00017F80', 'escape');
>       encode
> ------------------
>  \000\x01\x7F\200
> (1 row)

> myDatabaseName=# select decode('\000\x01\x7F\200', 'escape');
> ERROR:  invalid input syntax for type bytea

That's because that's *not* what encode() returned, it's just how psql
chose to print it. One way to write what encode() really returned is

    regression=# select octet_length(E'\\000\x01\x7F\\200'::text);
     octet_length
    --------------
               10
    (1 row)

    regression=# select decode(E'\\000\x01\x7F\\200'::text, 'escape');
       decode
    ------------
     \x00017f80
    (1 row)

> Second, as I was poking around the code, I found out about the
> "bytea_output". If I set it to "escape", I still get hexadecimals. Is
> that expected ?

Yes, because encode()'s output is type text and hence not subject to
that setting. If you were looking for a way to control what psql does
with these bytes, you'd have to look into its commands, probably \pset.
(I don't think there is a way to control it, but if there was, that's
where we'd put it.)

			regards, tom lane
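The round-trip failure discussed in the message can be reproduced with a small Python model of escape decoding. This is a sketch under assumptions, not PostgreSQL's implementation: `escape_decode` is a hypothetical name, and it assumes the decoder accepts only \ooo octal escapes (first digit 0 to 3) and a doubled backslash, rejecting any other backslash sequence.

```python
def escape_decode(text: str) -> bytes:
    """Model of decode(text, 'escape'): accepts \\ooo and \\\\ only."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out += ch.encode("utf-8")      # ordinary characters pass through
            i += 1
        elif (
            i + 3 < len(text)
            and text[i + 1] in "0123"
            and text[i + 2] in "01234567"
            and text[i + 3] in "01234567"
        ):
            out.append(int(text[i + 1:i + 4], 8))  # three-digit octal escape
            i += 4
        elif text[i + 1:i + 2] == "\\":
            out.append(0x5C)               # doubled backslash
            i += 2
        else:
            raise ValueError("invalid input syntax for type bytea")
    return bytes(out)


# What encode() really returned: raw 0x01 and 0x7F between the octal escapes.
real_value = "\\000\x01\x7f\\200"
print(escape_decode(real_value).hex())

# What was pasted from the psql display: a literal backslash followed by "x".
displayed = r"\000\x01\x7F\200"
try:
    escape_decode(displayed)
    pasted_ok = True
except ValueError:
    pasted_ok = False
print(pasted_ok)
```

The first call recovers the original four bytes, while the second fails at the literal `\x`, which is display notation rather than part of the stored text.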