Discussion: Mention invalid null byte sequence

Mention invalid null byte sequence

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/13/datatype-character.html
Description:

I discovered by accident that PostgreSQL doesn't accept null bytes in the text
type. It seems that Oracle does (see
https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com),
and so does SQLite.

So the character types page should say that null bytes are not accepted;
it would make life easier for people migrating to PostgreSQL :)
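Until that is documented, a migration can check for the character up front. A minimal Python sketch, assuming the values to be loaded are available as Python `str` objects; `prepare_for_pg_text` and its `mode`/`replacement` parameters are hypothetical names for illustration, not part of PostgreSQL or any driver:

```python
def prepare_for_pg_text(value: str, mode: str = "reject", replacement: str = "\ufffd") -> str:
    """Return `value` in a form a PostgreSQL text column can store.

    Hypothetical pre-load helper. PostgreSQL character types cannot hold
    the character with code zero (NUL), so it is handled before loading:
      "reject"  -> raise ValueError if a NUL is present
      "strip"   -> drop every NUL
      "replace" -> substitute each NUL with `replacement`
    All other characters are left untouched.
    """
    if mode not in ("reject", "strip", "replace"):
        raise ValueError(f"unknown mode: {mode!r}")
    if "\x00" not in value:
        return value  # nothing to fix
    if mode == "reject":
        raise ValueError(f"NUL character at index {value.index(chr(0))} cannot be stored in text")
    if mode == "strip":
        return value.replace("\x00", "")
    return value.replace("\x00", replacement)


if __name__ == "__main__":
    print(repr(prepare_for_pg_text("la\x01la")))                             # 'la\x01la'
    print(prepare_for_pg_text("la\x00la", mode="strip"))                     # lala
    print(prepare_for_pg_text("la\x00la", mode="replace", replacement="?"))  # la?la
```

Which mode fits depends on whether the NUL bytes carry meaning in the source data; rejecting is the conservative default because it never silently changes a value.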

Re: Mention invalid null byte sequence

From
Laurenz Albe
Date:
On Sat, 2020-12-05 at 21:58 +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/13/datatype-character.html
> Description:
> 
> I discovered by accident that PostgreSQL doesn't accept null bytes in the text
> type. It seems that Oracle does (see
> https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com),
> and so does SQLite.
> 
> So the character types page should say that null bytes are not accepted;
> it would make life easier for people migrating to PostgreSQL :)

+1; how about the attached patch?

Yours,
Laurenz Albe

Attachments

Re: Mention invalid null byte sequence

From
Adrien CLERC
Date:
On 07/12/2020 at 10:02, Laurenz Albe wrote:
> On Sat, 2020-12-05 at 21:58 +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/13/datatype-character.html
>> Description:
>>
>> I discovered by accident that PostgreSQL doesn't accept null bytes in the text
>> type. It seems that Oracle does (see
>> https://www.postgresql.org/message-id/de752e01-f36c-821e-9181-cfba78c0fbc8%40propaas.com),
>> and so does SQLite.
>>
>> So the character types page should say that null bytes are not accepted;
>> it would make life easier for people migrating to PostgreSQL :)
> +1; how about the attached patch?

That would be a good start indeed.

I don't know the policy on documentation redundancy in PostgreSQL, but it would be good to mention it in https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-ESCAPE too, since the basic "SELECT E'la\x00la';" will fail while "SELECT E'la\x01la';" will not.
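To illustrate the distinction: in an escape string constant a hex escape such as `\x01` is accepted, while `\x00` is rejected. A minimal Python sketch of a hypothetical helper, `to_escape_literal` (not a PostgreSQL or driver function), that renders a string as an `E'...'` constant and refuses NUL up front instead of emitting `\x00`. In application code, passing values as query parameters is the usual approach; this only illustrates the rule:

```python
def to_escape_literal(value: str) -> str:
    """Render `value` as a PostgreSQL escape string constant E'...'.

    Hypothetical helper for illustration only. Backslashes and single
    quotes are escaped, ASCII control characters are written as \\xHH
    hex escapes, and code zero (NUL) is refused outright, because the
    server rejects E'\\x00' rather than storing it.
    """
    parts = []
    for ch in value:
        code = ord(ch)
        if code == 0:
            raise ValueError("NUL (code zero) cannot be stored in a PostgreSQL string")
        if ch == "\\":
            parts.append("\\\\")            # a literal backslash is doubled
        elif ch == "'":
            parts.append("''")              # a single quote is written twice
        elif code < 0x20 or code == 0x7F:
            parts.append(f"\\x{code:02x}")  # e.g. code 1 becomes \x01
        else:
            parts.append(ch)                # everything else is copied as-is
    return "E'" + "".join(parts) + "'"


if __name__ == "__main__":
    print(to_escape_literal("la\x01la"))    # E'la\x01la'
    try:
        to_escape_literal("la\x00la")
    except ValueError as err:
        print("rejected:", err)
```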

And, being lazy, I would also like to see it on the general data type page, since this behavior is common across types.

Anyway, merging the first patch will make a search for "NUL character" return a result, which will definitely be a nice improvement!

Have a nice day!

Adrien

Re: Mention invalid null byte sequence

From
Tom Lane
Date:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Sat, 2020-12-05 at 21:58 +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>> Page: https://www.postgresql.org/docs/13/datatype-character.html
>> Description:
>> 
>> So the character types page should say that null bytes are not accepted;
>> it would make life easier for people migrating to PostgreSQL :)

> +1; how about the attached patch?

I had thought that this was already documented, but after digging around
I can only find it mentioned in the contexts of saying that literal
strings and quoted identifiers can't contain \0.  So yeah, we need to
improve that.

I agree with the submitter that the place one would expect to read about
this is in datatype-character.html.  So I'd propose the attached.
Maybe there's reason to repeat the info in charset.sgml, but it seems
like more of a datatype limitation than a character set issue.

            regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 5c8a92e250..9eb19a1c61 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1209,6 +1209,14 @@ SELECT '52093.89'::money::numeric::float8;
     regular expressions.
    </para>
 
+   <para>
+    The characters that can be stored in any of these data types are
+    determined by the database character set, which is selected when
+    the database is created.  Regardless of the specific character set,
+    the character with code zero (sometimes called NUL) cannot be stored.
+    For more information refer to <xref linkend="multibyte"/>.
+   </para>
+
    <para>
     The storage requirement for a short string (up to 126 bytes) is 1 byte
     plus the actual string, which includes the space padding in the case of
@@ -1246,10 +1254,7 @@ SELECT '52093.89'::money::numeric::float8;
    <para>
     Refer to <xref linkend="sql-syntax-strings"/> for information about
     the syntax of string literals, and to <xref linkend="functions"/>
-    for information about available operators and functions. The
-    database character set determines the character set used to store
-    textual values; for more information on character set support,
-    refer to <xref linkend="multibyte"/>.
+    for information about available operators and functions.
    </para>
 
    <example>

Re: Mention invalid null byte sequence

From
Laurenz Albe
Date:
On Mon, 2020-12-07 at 15:27 -0500, Tom Lane wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > On Sat, 2020-12-05 at 21:58 +0000, PG Doc comments form wrote:
> > > The following documentation comment has been logged on the website:
> > > Page: https://www.postgresql.org/docs/13/datatype-character.html
> > > Description:
> > > 
> > > So the character types page should say that null bytes are not accepted;
> > > it would make life easier for people migrating to PostgreSQL :)
> >
> > +1; how about the attached patch?
> 
> I had thought that this was already documented, but after digging around
> I can only find it mentioned in the contexts of saying that literal
> strings and quoted identifiers can't contain \0.  So yeah, we need to
> improve that.
> 
> I agree with the submitter that the place one would expect to read about
> this is in datatype-character.html.  So I'd propose the attached.
> Maybe there's reason to repeat the info in charset.sgml, but it seems
> like more of a datatype limitation than a character set issue.

+1 on your patch.

Yours,
Laurenz Albe

Re: Mention invalid null byte sequence

From
Tom Lane
Date:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Mon, 2020-12-07 at 15:27 -0500, Tom Lane wrote:
>> I agree with the submitter that the place one would expect to read about
>> this is in datatype-character.html.  So I'd propose the attached.
>> Maybe there's reason to repeat the info in charset.sgml, but it seems
>> like more of a datatype limitation than a character set issue.

> +1 on your patch.

Pushed, thanks for looking it over.

            regards, tom lane