Re: jsonb, unicode escapes and escaped backslashes

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: jsonb, unicode escapes and escaped backslashes
Дата
Msg-id 2127.1422598836@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: jsonb, unicode escapes and escaped backslashes  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: jsonb, unicode escapes and escaped backslashes  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
Attached is a draft patch for this.  It basically reverts commit
0ad1a816320a2b539a51628e2a0b1e83ff096b1d, adds a ban of \u0000 if
that would need to be converted to text (so it still works in the
plain json type, so long as you don't do much processing), and adds
some regression tests.

I made the \u0000 error be errcode(ERRCODE_INVALID_TEXT_REPRESENTATION)
and errmsg("invalid input syntax for type json"), by analogy to what's
thrown for non-ASCII Unicode escapes in non-UTF8 encoding.  I'm not
terribly happy with that, though.  ISTM that for both cases, this is
not "invalid syntax" at all, but an implementation restriction that
forces us to reject perfectly valid syntax.  So I think we ought to
use a different ERRCODE and text message, though I'm not entirely
sure what it should be instead.  ERRCODE_FEATURE_NOT_SUPPORTED is
one possibility.

            regards, tom lane

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 8feb2fb..b4b97a7 100644
*** a/doc/src/sgml/json.sgml
--- b/doc/src/sgml/json.sgml
***************
*** 69,80 ****
    regardless of the database encoding, and are checked only for syntactic
    correctness (that is, that four hex digits follow <literal>\u</>).
    However, the input function for <type>jsonb</> is stricter: it disallows
!   Unicode escapes for non-ASCII characters (those
!   above <literal>U+007F</>) unless the database encoding is UTF8.  It also
!   insists that any use of Unicode surrogate pairs to designate characters
!   outside the Unicode Basic Multilingual Plane be correct.  Valid Unicode
!   escapes, except for <literal>\u0000</>, are then converted to the
!   equivalent ASCII or UTF8 character for storage.
   </para>

   <note>
--- 69,82 ----
    regardless of the database encoding, and are checked only for syntactic
    correctness (that is, that four hex digits follow <literal>\u</>).
    However, the input function for <type>jsonb</> is stricter: it disallows
!   Unicode escapes for non-ASCII characters (those above <literal>U+007F</>)
!   unless the database encoding is UTF8.  The <type>jsonb</> type also
!   rejects <literal>\u0000</> (because that cannot be represented in
!   <productname>PostgreSQL</productname>'s <type>text</> type), and it insists
!   that any use of Unicode surrogate pairs to designate characters outside
!   the Unicode Basic Multilingual Plane be correct.  Valid Unicode escapes
!   are converted to the equivalent ASCII or UTF8 character for storage;
!   this includes folding surrogate pairs into a single character.
   </para>

   <note>
***************
*** 134,140 ****
         <row>
          <entry><type>string</></entry>
          <entry><type>text</></entry>
!         <entry>See notes above concerning encoding restrictions</entry>
         </row>
         <row>
          <entry><type>number</></entry>
--- 136,143 ----
         <row>
          <entry><type>string</></entry>
          <entry><type>text</></entry>
!         <entry><literal>\u0000</> is disallowed, as are non-ASCII Unicode
!          escapes if database encoding is not UTF8</entry>
         </row>
         <row>
          <entry><type>number</></entry>
diff --git a/doc/src/sgml/release-9.4.sgml b/doc/src/sgml/release-9.4.sgml
index 961e461..11bbf3b 100644
*** a/doc/src/sgml/release-9.4.sgml
--- b/doc/src/sgml/release-9.4.sgml
***************
*** 103,124 ****

      <listitem>
       <para>
-       Unicode escapes in <link linkend="datatype-json"><type>JSON</type></link>
-       text values are no longer rendered with the backslash escaped
-       (Andrew Dunstan)
-      </para>
-
-      <para>
-       Previously, all backslashes in text values being formed into JSON
-       were escaped. Now a backslash followed by <literal>u</> and four
-       hexadecimal digits is not escaped, as this is a legal sequence in a
-       JSON string value, and escaping the backslash led to some perverse
-       results.
-      </para>
-     </listitem>
-
-     <listitem>
-      <para>
        When converting values of type <type>date</>, <type>timestamp</>
        or <type>timestamptz</>
        to <link linkend="datatype-json"><type>JSON</type></link>, render the
--- 103,108 ----
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 3c137ea..4e46b0a 100644
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** json_lex_string(JsonLexContext *lex)
*** 806,819 ****
                       * For UTF8, replace the escape sequence by the actual
                       * utf8 character in lex->strval. Do this also for other
                       * encodings if the escape designates an ASCII character,
!                      * otherwise raise an error. We don't ever unescape a
!                      * \u0000, since that would result in an impermissible nul
!                      * byte.
                       */

                      if (ch == 0)
                      {
!                         appendStringInfoString(lex->strval, "\\u0000");
                      }
                      else if (GetDatabaseEncoding() == PG_UTF8)
                      {
--- 806,822 ----
                       * For UTF8, replace the escape sequence by the actual
                       * utf8 character in lex->strval. Do this also for other
                       * encodings if the escape designates an ASCII character,
!                      * otherwise raise an error.
                       */

                      if (ch == 0)
                      {
!                         /* We can't allow this, since our TEXT type doesn't */
!                         ereport(ERROR,
!                                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
!                                  errmsg("invalid input syntax for type json"),
!                                  errdetail("\\u0000 cannot be converted to text."),
!                                  report_json_context(lex)));
                      }
                      else if (GetDatabaseEncoding() == PG_UTF8)
                      {
*************** escape_json(StringInfo buf, const char *
*** 2382,2411 ****
                  appendStringInfoString(buf, "\\\"");
                  break;
              case '\\':
!
!                 /*
!                  * Unicode escapes are passed through as is. There is no
!                  * requirement that they denote a valid character in the
!                  * server encoding - indeed that is a big part of their
!                  * usefulness.
!                  *
!                  * All we require is that they consist of \uXXXX where the Xs
!                  * are hexadecimal digits. It is the responsibility of the
!                  * caller of, say, to_json() to make sure that the unicode
!                  * escape is valid.
!                  *
!                  * In the case of a jsonb string value being escaped, the only
!                  * unicode escape that should be present is \u0000, all the
!                  * other unicode escapes will have been resolved.
!                  */
!                 if (p[1] == 'u' &&
!                     isxdigit((unsigned char) p[2]) &&
!                     isxdigit((unsigned char) p[3]) &&
!                     isxdigit((unsigned char) p[4]) &&
!                     isxdigit((unsigned char) p[5]))
!                     appendStringInfoCharMacro(buf, *p);
!                 else
!                     appendStringInfoString(buf, "\\\\");
                  break;
              default:
                  if ((unsigned char) *p < ' ')
--- 2385,2391 ----
                  appendStringInfoString(buf, "\\\"");
                  break;
              case '\\':
!                 appendStringInfoString(buf, "\\\\");
                  break;
              default:
                  if ((unsigned char) *p < ' ')
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index e435d3e..63d6cf6 100644
*** a/src/test/regress/expected/json.out
--- b/src/test/regress/expected/json.out
*************** select to_json(timestamptz '2014-05-28 1
*** 426,445 ****
  (1 row)

  COMMIT;
- -- unicode escape - backslash is not escaped
- select to_json(text '\uabcd');
-  to_json
- ----------
-  "\uabcd"
- (1 row)
-
- -- any other backslash is escaped
- select to_json(text '\abcd');
-  to_json
- ----------
-  "\\abcd"
- (1 row)
-
  --json_agg
  SELECT json_agg(q)
    FROM ( SELECT $$a$$ || x AS b, y AS c,
--- 426,431 ----
*************** ERROR:  invalid input syntax for type js
*** 1400,1405 ****
--- 1386,1421 ----
  DETAIL:  Unicode low surrogate must follow a high surrogate.
  CONTEXT:  JSON data, line 1: { "a":...
  --handling of simple unicode escapes
+ select json '{ "a":  "the Copyright \u00a9 sign" }' as correct_in_utf8;
+             correct_in_utf8
+ ---------------------------------------
+  { "a":  "the Copyright \u00a9 sign" }
+ (1 row)
+
+ select json '{ "a":  "dollar \u0024 character" }' as correct_everywhere;
+          correct_everywhere
+ -------------------------------------
+  { "a":  "dollar \u0024 character" }
+ (1 row)
+
+ select json '{ "a":  "dollar \\u0024 character" }' as not_an_escape;
+             not_an_escape
+ --------------------------------------
+  { "a":  "dollar \\u0024 character" }
+ (1 row)
+
+ select json '{ "a":  "null \u0000 escape" }' as not_unescaped;
+          not_unescaped
+ --------------------------------
+  { "a":  "null \u0000 escape" }
+ (1 row)
+
+ select json '{ "a":  "null \\u0000 escape" }' as not_an_escape;
+           not_an_escape
+ ---------------------------------
+  { "a":  "null \\u0000 escape" }
+ (1 row)
+
  select json '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
     correct_in_utf8
  ----------------------
*************** select json '{ "a":  "dollar \u0024 char
*** 1412,1419 ****
   dollar $ character
  (1 row)

! select json '{ "a":  "null \u0000 escape" }' ->> 'a' as not_unescaped;
!    not_unescaped
  --------------------
   null \u0000 escape
  (1 row)
--- 1428,1445 ----
   dollar $ character
  (1 row)

! select json '{ "a":  "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
!       not_an_escape
! -------------------------
!  dollar \u0024 character
! (1 row)
!
! select json '{ "a":  "null \u0000 escape" }' ->> 'a' as fails;
! ERROR:  invalid input syntax for type json
! DETAIL:  \u0000 cannot be converted to text.
! CONTEXT:  JSON data, line 1: { "a":...
! select json '{ "a":  "null \\u0000 escape" }' ->> 'a' as not_an_escape;
!    not_an_escape
  --------------------
   null \u0000 escape
  (1 row)
diff --git a/src/test/regress/expected/json_1.out b/src/test/regress/expected/json_1.out
index 106b481..550a14a 100644
*** a/src/test/regress/expected/json_1.out
--- b/src/test/regress/expected/json_1.out
*************** select to_json(timestamptz '2014-05-28 1
*** 426,445 ****
  (1 row)

  COMMIT;
- -- unicode escape - backslash is not escaped
- select to_json(text '\uabcd');
-  to_json
- ----------
-  "\uabcd"
- (1 row)
-
- -- any other backslash is escaped
- select to_json(text '\abcd');
-  to_json
- ----------
-  "\\abcd"
- (1 row)
-
  --json_agg
  SELECT json_agg(q)
    FROM ( SELECT $$a$$ || x AS b, y AS c,
--- 426,431 ----
*************** ERROR:  invalid input syntax for type js
*** 1398,1403 ****
--- 1384,1419 ----
  DETAIL:  Unicode low surrogate must follow a high surrogate.
  CONTEXT:  JSON data, line 1: { "a":...
  --handling of simple unicode escapes
+ select json '{ "a":  "the Copyright \u00a9 sign" }' as correct_in_utf8;
+             correct_in_utf8
+ ---------------------------------------
+  { "a":  "the Copyright \u00a9 sign" }
+ (1 row)
+
+ select json '{ "a":  "dollar \u0024 character" }' as correct_everywhere;
+          correct_everywhere
+ -------------------------------------
+  { "a":  "dollar \u0024 character" }
+ (1 row)
+
+ select json '{ "a":  "dollar \\u0024 character" }' as not_an_escape;
+             not_an_escape
+ --------------------------------------
+  { "a":  "dollar \\u0024 character" }
+ (1 row)
+
+ select json '{ "a":  "null \u0000 escape" }' as not_unescaped;
+          not_unescaped
+ --------------------------------
+  { "a":  "null \u0000 escape" }
+ (1 row)
+
+ select json '{ "a":  "null \\u0000 escape" }' as not_an_escape;
+           not_an_escape
+ ---------------------------------
+  { "a":  "null \\u0000 escape" }
+ (1 row)
+
  select json '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
  ERROR:  invalid input syntax for type json
  DETAIL:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
*************** select json '{ "a":  "dollar \u0024 char
*** 1408,1415 ****
   dollar $ character
  (1 row)

! select json '{ "a":  "null \u0000 escape" }' ->> 'a' as not_unescaped;
!    not_unescaped
  --------------------
   null \u0000 escape
  (1 row)
--- 1424,1441 ----
   dollar $ character
  (1 row)

! select json '{ "a":  "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
!       not_an_escape
! -------------------------
!  dollar \u0024 character
! (1 row)
!
! select json '{ "a":  "null \u0000 escape" }' ->> 'a' as fails;
! ERROR:  invalid input syntax for type json
! DETAIL:  \u0000 cannot be converted to text.
! CONTEXT:  JSON data, line 1: { "a":...
! select json '{ "a":  "null \\u0000 escape" }' ->> 'a' as not_an_escape;
!    not_an_escape
  --------------------
   null \u0000 escape
  (1 row)
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index aa5686f..8d5da21 100644
*** a/src/test/regress/expected/jsonb.out
--- b/src/test/regress/expected/jsonb.out
*************** LINE 1: SELECT '"\u000g"'::jsonb;
*** 60,71 ****
                 ^
  DETAIL:  "\u" must be followed by four hexadecimal digits.
  CONTEXT:  JSON data, line 1: "\u000g...
! SELECT '"\u0000"'::jsonb;        -- OK, legal escape
!   jsonb
! ----------
!  "\u0000"
  (1 row)

  -- use octet_length here so we don't get an odd unicode char in the
  -- output
  SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
--- 60,77 ----
                 ^
  DETAIL:  "\u" must be followed by four hexadecimal digits.
  CONTEXT:  JSON data, line 1: "\u000g...
! SELECT '"\u0045"'::jsonb;        -- OK, legal escape
!  jsonb
! -------
!  "E"
  (1 row)

+ SELECT '"\u0000"'::jsonb;        -- ERROR, we don't support U+0000
+ ERROR:  invalid input syntax for type json
+ LINE 1: SELECT '"\u0000"'::jsonb;
+                ^
+ DETAIL:  \u0000 cannot be converted to text.
+ CONTEXT:  JSON data, line 1: ...
  -- use octet_length here so we don't get an odd unicode char in the
  -- output
  SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
*************** select to_jsonb(timestamptz '2014-05-28
*** 324,343 ****
  (1 row)

  COMMIT;
- -- unicode escape - backslash is not escaped
- select to_jsonb(text '\uabcd');
-  to_jsonb
- ----------
-  "\uabcd"
- (1 row)
-
- -- any other backslash is escaped
- select to_jsonb(text '\abcd');
-  to_jsonb
- ----------
-  "\\abcd"
- (1 row)
-
  --jsonb_agg
  CREATE TEMP TABLE rows AS
  SELECT x, 'txt' || x as y
--- 330,335 ----
*************** LINE 1: SELECT jsonb '{ "a":  "\ude04X"
*** 1971,1990 ****
  DETAIL:  Unicode low surrogate must follow a high surrogate.
  CONTEXT:  JSON data, line 1: { "a":...
  -- handling of simple unicode escapes
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' AS correct_in_utf8;
     correct_in_utf8
  ----------------------
   the Copyright �� sign
  (1 row)

! SELECT jsonb '{ "a":  "dollar \u0024 character" }' ->> 'a' AS correct_everyWHERE;
   correct_everywhere
  --------------------
   dollar $ character
  (1 row)

! SELECT jsonb '{ "a":  "null \u0000 escape" }' ->> 'a' AS not_unescaped;
!    not_unescaped
  --------------------
   null \u0000 escape
  (1 row)
--- 1963,2024 ----
  DETAIL:  Unicode low surrogate must follow a high surrogate.
  CONTEXT:  JSON data, line 1: { "a":...
  -- handling of simple unicode escapes
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' as correct_in_utf8;
!         correct_in_utf8
! -------------------------------
!  {"a": "the Copyright �� sign"}
! (1 row)
!
! SELECT jsonb '{ "a":  "dollar \u0024 character" }' as correct_everywhere;
!      correct_everywhere
! -----------------------------
!  {"a": "dollar $ character"}
! (1 row)
!
! SELECT jsonb '{ "a":  "dollar \\u0024 character" }' as not_an_escape;
!            not_an_escape
! -----------------------------------
!  {"a": "dollar \\u0024 character"}
! (1 row)
!
! SELECT jsonb '{ "a":  "null \u0000 escape" }' as fails;
! ERROR:  invalid input syntax for type json
! LINE 1: SELECT jsonb '{ "a":  "null \u0000 escape" }' as fails;
!                      ^
! DETAIL:  \u0000 cannot be converted to text.
! CONTEXT:  JSON data, line 1: { "a":...
! SELECT jsonb '{ "a":  "null \\u0000 escape" }' as not_an_escape;
!         not_an_escape
! ------------------------------
!  {"a": "null \\u0000 escape"}
! (1 row)
!
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
     correct_in_utf8
  ----------------------
   the Copyright �� sign
  (1 row)

! SELECT jsonb '{ "a":  "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
   correct_everywhere
  --------------------
   dollar $ character
  (1 row)

! SELECT jsonb '{ "a":  "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
!       not_an_escape
! -------------------------
!  dollar \u0024 character
! (1 row)
!
! SELECT jsonb '{ "a":  "null \u0000 escape" }' ->> 'a' as fails;
! ERROR:  invalid input syntax for type json
! LINE 1: SELECT jsonb '{ "a":  "null \u0000 escape" }' ->> 'a' as fai...
!                      ^
! DETAIL:  \u0000 cannot be converted to text.
! CONTEXT:  JSON data, line 1: { "a":...
! SELECT jsonb '{ "a":  "null \\u0000 escape" }' ->> 'a' as not_an_escape;
!    not_an_escape
  --------------------
   null \u0000 escape
  (1 row)
diff --git a/src/test/regress/expected/jsonb_1.out b/src/test/regress/expected/jsonb_1.out
index 687ae63..b7b0201 100644
*** a/src/test/regress/expected/jsonb_1.out
--- b/src/test/regress/expected/jsonb_1.out
*************** LINE 1: SELECT '"\u000g"'::jsonb;
*** 60,71 ****
                 ^
  DETAIL:  "\u" must be followed by four hexadecimal digits.
  CONTEXT:  JSON data, line 1: "\u000g...
! SELECT '"\u0000"'::jsonb;        -- OK, legal escape
!   jsonb
! ----------
!  "\u0000"
  (1 row)

  -- use octet_length here so we don't get an odd unicode char in the
  -- output
  SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
--- 60,77 ----
                 ^
  DETAIL:  "\u" must be followed by four hexadecimal digits.
  CONTEXT:  JSON data, line 1: "\u000g...
! SELECT '"\u0045"'::jsonb;        -- OK, legal escape
!  jsonb
! -------
!  "E"
  (1 row)

+ SELECT '"\u0000"'::jsonb;        -- ERROR, we don't support U+0000
+ ERROR:  invalid input syntax for type json
+ LINE 1: SELECT '"\u0000"'::jsonb;
+                ^
+ DETAIL:  \u0000 cannot be converted to text.
+ CONTEXT:  JSON data, line 1: ...
  -- use octet_length here so we don't get an odd unicode char in the
  -- output
  SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
*************** select to_jsonb(timestamptz '2014-05-28
*** 324,343 ****
  (1 row)

  COMMIT;
- -- unicode escape - backslash is not escaped
- select to_jsonb(text '\uabcd');
-  to_jsonb
- ----------
-  "\uabcd"
- (1 row)
-
- -- any other backslash is escaped
- select to_jsonb(text '\abcd');
-  to_jsonb
- ----------
-  "\\abcd"
- (1 row)
-
  --jsonb_agg
  CREATE TEMP TABLE rows AS
  SELECT x, 'txt' || x as y
--- 330,335 ----
*************** LINE 1: SELECT jsonb '{ "a":  "\ude04X"
*** 1971,1990 ****
  DETAIL:  Unicode low surrogate must follow a high surrogate.
  CONTEXT:  JSON data, line 1: { "a":...
  -- handling of simple unicode escapes
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' AS correct_in_utf8;
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a'...
                       ^
  DETAIL:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
  CONTEXT:  JSON data, line 1: { "a":...
! SELECT jsonb '{ "a":  "dollar \u0024 character" }' ->> 'a' AS correct_everyWHERE;
   correct_everywhere
  --------------------
   dollar $ character
  (1 row)

! SELECT jsonb '{ "a":  "null \u0000 escape" }' ->> 'a' AS not_unescaped;
!    not_unescaped
  --------------------
   null \u0000 escape
  (1 row)
--- 1963,2024 ----
  DETAIL:  Unicode low surrogate must follow a high surrogate.
  CONTEXT:  JSON data, line 1: { "a":...
  -- handling of simple unicode escapes
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' as correct_in_utf8;
! ERROR:  invalid input syntax for type json
! LINE 1: SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' as corr...
!                      ^
! DETAIL:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
! CONTEXT:  JSON data, line 1: { "a":...
! SELECT jsonb '{ "a":  "dollar \u0024 character" }' as correct_everywhere;
!      correct_everywhere
! -----------------------------
!  {"a": "dollar $ character"}
! (1 row)
!
! SELECT jsonb '{ "a":  "dollar \\u0024 character" }' as not_an_escape;
!            not_an_escape
! -----------------------------------
!  {"a": "dollar \\u0024 character"}
! (1 row)
!
! SELECT jsonb '{ "a":  "null \u0000 escape" }' as fails;
! ERROR:  invalid input syntax for type json
! LINE 1: SELECT jsonb '{ "a":  "null \u0000 escape" }' as fails;
!                      ^
! DETAIL:  \u0000 cannot be converted to text.
! CONTEXT:  JSON data, line 1: { "a":...
! SELECT jsonb '{ "a":  "null \\u0000 escape" }' as not_an_escape;
!         not_an_escape
! ------------------------------
!  {"a": "null \\u0000 escape"}
! (1 row)
!
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
  ERROR:  invalid input syntax for type json
  LINE 1: SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a'...
                       ^
  DETAIL:  Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
  CONTEXT:  JSON data, line 1: { "a":...
! SELECT jsonb '{ "a":  "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
   correct_everywhere
  --------------------
   dollar $ character
  (1 row)

! SELECT jsonb '{ "a":  "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
!       not_an_escape
! -------------------------
!  dollar \u0024 character
! (1 row)
!
! SELECT jsonb '{ "a":  "null \u0000 escape" }' ->> 'a' as fails;
! ERROR:  invalid input syntax for type json
! LINE 1: SELECT jsonb '{ "a":  "null \u0000 escape" }' ->> 'a' as fai...
!                      ^
! DETAIL:  \u0000 cannot be converted to text.
! CONTEXT:  JSON data, line 1: { "a":...
! SELECT jsonb '{ "a":  "null \\u0000 escape" }' ->> 'a' as not_an_escape;
!    not_an_escape
  --------------------
   null \u0000 escape
  (1 row)
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index 36a6674..53a37a8 100644
*** a/src/test/regress/sql/json.sql
--- b/src/test/regress/sql/json.sql
*************** SET LOCAL TIME ZONE -8;
*** 111,124 ****
  select to_json(timestamptz '2014-05-28 12:22:35.614298-04');
  COMMIT;

- -- unicode escape - backslash is not escaped
-
- select to_json(text '\uabcd');
-
- -- any other backslash is escaped
-
- select to_json(text '\abcd');
-
  --json_agg

  SELECT json_agg(q)
--- 111,116 ----
*************** select json '{ "a":  "\ude04X" }' -> 'a'
*** 401,409 ****

  --handling of simple unicode escapes

  select json '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
  select json '{ "a":  "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
! select json '{ "a":  "null \u0000 escape" }' ->> 'a' as not_unescaped;

  --json_typeof() function
  select value, json_typeof(value)
--- 393,409 ----

  --handling of simple unicode escapes

+ select json '{ "a":  "the Copyright \u00a9 sign" }' as correct_in_utf8;
+ select json '{ "a":  "dollar \u0024 character" }' as correct_everywhere;
+ select json '{ "a":  "dollar \\u0024 character" }' as not_an_escape;
+ select json '{ "a":  "null \u0000 escape" }' as not_unescaped;
+ select json '{ "a":  "null \\u0000 escape" }' as not_an_escape;
+
  select json '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
  select json '{ "a":  "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
! select json '{ "a":  "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
! select json '{ "a":  "null \u0000 escape" }' ->> 'a' as fails;
! select json '{ "a":  "null \\u0000 escape" }' ->> 'a' as not_an_escape;

  --json_typeof() function
  select value, json_typeof(value)
diff --git a/src/test/regress/sql/jsonb.sql b/src/test/regress/sql/jsonb.sql
index a846103..53cc239 100644
*** a/src/test/regress/sql/jsonb.sql
--- b/src/test/regress/sql/jsonb.sql
*************** SELECT '"\v"'::jsonb;            -- ERROR, not a
*** 10,16 ****
  SELECT '"\u"'::jsonb;            -- ERROR, incomplete escape
  SELECT '"\u00"'::jsonb;            -- ERROR, incomplete escape
  SELECT '"\u000g"'::jsonb;        -- ERROR, g is not a hex digit
! SELECT '"\u0000"'::jsonb;        -- OK, legal escape
  -- use octet_length here so we don't get an odd unicode char in the
  -- output
  SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
--- 10,17 ----
  SELECT '"\u"'::jsonb;            -- ERROR, incomplete escape
  SELECT '"\u00"'::jsonb;            -- ERROR, incomplete escape
  SELECT '"\u000g"'::jsonb;        -- ERROR, g is not a hex digit
! SELECT '"\u0045"'::jsonb;        -- OK, legal escape
! SELECT '"\u0000"'::jsonb;        -- ERROR, we don't support U+0000
  -- use octet_length here so we don't get an odd unicode char in the
  -- output
  SELECT octet_length('"\uaBcD"'::jsonb::text); -- OK, uppercase and lower case both OK
*************** SET LOCAL TIME ZONE -8;
*** 73,86 ****
  select to_jsonb(timestamptz '2014-05-28 12:22:35.614298-04');
  COMMIT;

- -- unicode escape - backslash is not escaped
-
- select to_jsonb(text '\uabcd');
-
- -- any other backslash is escaped
-
- select to_jsonb(text '\abcd');
-
  --jsonb_agg

  CREATE TEMP TABLE rows AS
--- 74,79 ----
*************** SELECT jsonb '{ "a":  "\ud83dX" }' -> 'a
*** 488,496 ****
  SELECT jsonb '{ "a":  "\ude04X" }' -> 'a'; -- orphan low surrogate

  -- handling of simple unicode escapes
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' AS correct_in_utf8;
! SELECT jsonb '{ "a":  "dollar \u0024 character" }' ->> 'a' AS correct_everyWHERE;
! SELECT jsonb '{ "a":  "null \u0000 escape" }' ->> 'a' AS not_unescaped;

  -- jsonb_to_record and jsonb_to_recordset

--- 481,498 ----
  SELECT jsonb '{ "a":  "\ude04X" }' -> 'a'; -- orphan low surrogate

  -- handling of simple unicode escapes
!
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' as correct_in_utf8;
! SELECT jsonb '{ "a":  "dollar \u0024 character" }' as correct_everywhere;
! SELECT jsonb '{ "a":  "dollar \\u0024 character" }' as not_an_escape;
! SELECT jsonb '{ "a":  "null \u0000 escape" }' as fails;
! SELECT jsonb '{ "a":  "null \\u0000 escape" }' as not_an_escape;
!
! SELECT jsonb '{ "a":  "the Copyright \u00a9 sign" }' ->> 'a' as correct_in_utf8;
! SELECT jsonb '{ "a":  "dollar \u0024 character" }' ->> 'a' as correct_everywhere;
! SELECT jsonb '{ "a":  "dollar \\u0024 character" }' ->> 'a' as not_an_escape;
! SELECT jsonb '{ "a":  "null \u0000 escape" }' ->> 'a' as fails;
! SELECT jsonb '{ "a":  "null \\u0000 escape" }' ->> 'a' as not_an_escape;

  -- jsonb_to_record and jsonb_to_recordset


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Safe memory allocation functions
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: PATCH: Reducing lock strength of trigger and foreign key DDL