Обсуждение: Mention invalid null byte sequence
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/13/datatype-character.html Description: I discovered accidentally that PostgreSQL doesn't accept null byte in text type. It seems that Oracle does (see https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com) and SQLite does it too. So it should written in the character type that null byte are not accepted, it would make like easier to migrate to PostgreSQL :)
On Sat, 2020-12-05 at 21:58 +0000, PG Doc comments form wrote: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/13/datatype-character.html > Description: > > I discovered accidentally that PostgreSQL doesn't accept null byte in text > type. It seems that Oracle does (see > https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com) > and SQLite does it too. > > So it should written in the character type that null byte are not accepted, > it would make like easier to migrate to PostgreSQL :) +1; how about the attached patch? Yours, Laurenz Albe
Вложения
Le 07/12/2020 à 10:02, Laurenz Albe a écrit :
On Sat, 2020-12-05 at 21:58 +0000, PG Doc comments form wrote:The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/13/datatype-character.html Description: I discovered accidentally that PostgreSQL doesn't accept null byte in text type. It seems that Oracle does (see https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com) and SQLite does it too. So it should written in the character type that null byte are not accepted, it would make like easier to migrate to PostgreSQL :)+1; how about the attached patch?
That would be a good start indeed.
I don't know the policy for documentation redundancy in PostgreSQL, but it should be good to mention that also in https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE since the basic "SELECT E'la\x00la';" will fail while "SELECT E'la\x01la';" will not.
And, as a lazy person, I also would like to see it in the general datatype page, since it's a common behavior.
Anyway, merging the first patch will enable the search for "NUL character" to return a result, and that will be definitively a nice improvement!
Have a nice day!
Adrien
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Sat, 2020-12-05 at 21:58 +0000, PG Doc comments form wrote: >> The following documentation comment has been logged on the website: >> Page: https://www.postgresql.org/docs/13/datatype-character.html >> Description: >> >> So it should written in the character type that null byte are not accepted, >> it would make like easier to migrate to PostgreSQL :) > +1; how about the attached patch? I had thought that this was already documented, but after digging around I can only find it mentioned in the contexts of saying that literal strings and quoted identifiers can't contain \0. So yeah, we need to improve that. I agree with the submitter that the place one would expect to read about this is in datatype-character.html. So I'd propose the attached. Maybe there's reason to repeat the info in charset.sgml, but it seems like more of a datatype limitation than a character set issue. regards, tom lane diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 5c8a92e250..9eb19a1c61 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -1209,6 +1209,14 @@ SELECT '52093.89'::money::numeric::float8; regular expressions. </para> + <para> + The characters that can be stored in any of these data types are + determined by the database character set, which is selected when + the database is created. Regardless of the specific character set, + the character with code zero (sometimes called NUL) cannot be stored. + For more information refer to <xref linkend="multibyte"/>. + </para> + <para> The storage requirement for a short string (up to 126 bytes) is 1 byte plus the actual string, which includes the space padding in the case of @@ -1246,10 +1254,7 @@ SELECT '52093.89'::money::numeric::float8; <para> Refer to <xref linkend="sql-syntax-strings"/> for information about the syntax of string literals, and to <xref linkend="functions"/> - for information about available operators and functions. The - database character set determines the character set used to store - textual values; for more information on character set support, - refer to <xref linkend="multibyte"/>. + for information about available operators and functions. </para> <example>
On Mon, 2020-12-07 at 15:27 -0500, Tom Lane wrote: > Laurenz Albe <laurenz.albe@cybertec.at> writes: > > On Sat, 2020-12-05 at 21:58 +0000, PG Doc comments form wrote: > > > The following documentation comment has been logged on the website: > > > Page: https://www.postgresql.org/docs/13/datatype-character.html > > > Description: > > > > > > So it should written in the character type that null byte are not accepted, > > > it would make like easier to migrate to PostgreSQL :) > > > > +1; how about the attached patch? > > I had thought that this was already documented, but after digging around > I can only find it mentioned in the contexts of saying that literal > strings and quoted identifiers can't contain \0. So yeah, we need to > improve that. > > I agree with the submitter that the place one would expect to read about > this is in datatype-character.html. So I'd propose the attached. > Maybe there's reason to repeat the info in charset.sgml, but it seems > like more of a datatype limitation than a character set issue. +1 on your patch. Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > On Mon, 2020-12-07 at 15:27 -0500, Tom Lane wrote: >> I agree with the submitter that the place one would expect to read about >> this is in datatype-character.html. So I'd propose the attached. >> Maybe there's reason to repeat the info in charset.sgml, but it seems >> like more of a datatype limitation than a character set issue. > +1 on your patch. Pushed, thanks for looking it over. regards, tom lane