Cleaning up array_in()

From: Tom Lane
Subject: Cleaning up array_in()
Msg-id: 2794005.1683042087@sss.pgh.pa.us
Replies: Re: Cleaning up array_in() (Nathan Bossart <nathandbossart@gmail.com>)
         Re: Cleaning up array_in() (Alexander Lakhin <exclusion@gmail.com>)
List: pgsql-hackers

This is in response to Alexander's observation at [1], but I'm
starting a fresh thread to keep this patch separate from the plperl
fixes in the cfbot's eyes.

Alexander Lakhin <exclusion@gmail.com> writes:
> I continue watching the array handling bugs dancing Sirtaki too. Now it's
> another asymmetry:
> select '{{1},{{2}}}'::int[];
>   {{{1}},{{2}}}
> but:
> select '{{{1}},{2}}'::int[];
>   {}

Bleah.  Both of those should be rejected, for sure, but it's the same
situation as in the PLs: we weren't doing anything to enforce that all
the scalar elements appear at the same nesting depth.
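A minimal standalone sketch of the missing check, not code from array_in(): it tracks brace depth while scanning and rejects input whose scalar elements sit at different nesting depths. It assumes a simplified grammar (comma delimiter only, no quoting, escapes, or whitespace) and balanced braces, and the function name scalars_at_same_depth() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if every scalar element of the
 * brace-decorated string s appears at the same nesting depth.
 * Simplified grammar: '{', '}', ',' and bare element characters only.
 */
static bool
scalars_at_same_depth(const char *s)
{
    int  depth = 0;
    int  scalar_depth = -1;    /* depth of first scalar seen; -1 = none yet */
    bool in_element = false;

    for (; *s != '\0'; s++)
    {
        switch (*s)
        {
            case '{':
                depth++;
                in_element = false;
                break;
            case '}':
                depth--;
                /* precondition: braces are balanced; a real parser must reject this itself */
                assert(depth >= 0);
                in_element = false;
                break;
            case ',':
                in_element = false;
                break;
            default:
                if (!in_element)
                {
                    /* first character of a new scalar element */
                    in_element = true;
                    if (scalar_depth < 0)
                        scalar_depth = depth;
                    else if (depth != scalar_depth)
                        return false;
                }
                break;
        }
    }
    return true;
}
```

Both of Alexander's examples fail this check.  Note that it says nothing about sub-array lengths; rectangularity still needs the separate per-level length comparison.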

I spent some time examining array_in(), and was pretty disheartened
by what a mess it is.  It looks like back in the dim mists of the
Berkeley era, there was an intentional attempt to allow
non-rectangular array input, with the missing elements automatically
filled out as NULLs.  Since that was undocumented, we concluded it was
a bug and plastered on some code to check for rectangularity of the
input.  I don't quibble with enforcing rectangularity, but the
underlying logic should have been simplified while we were at it.
The element-counting logic was basically magic (why is it okay to
increment temp[ndim - 1] when the current nest_level might be
different from that?) and the extra layers of checks didn't make it
any more intelligible.  Plus, ReadArrayStr was expending far more
cycles than it needed to, given the assumption of rectangularity.

So, here's a rewrite.

Although I view this as a bug fix, AFAICT the only effect of these bugs
is to accept input that should be rejected.  So again I don't advocate
back-patching.  But should we sneak it into v16, or wait for v17?

            regards, tom lane

[1] https://www.postgresql.org/message-id/9cd163da-d096-7e9e-28f6-f3620962a660%40gmail.com

From 6a9fe8117e1b91958111c679d02a2bd7944fae22 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon, 1 May 2023 18:31:40 -0400
Subject: [PATCH v1 1/2] Simplify and speed up ReadArrayStr().

ReadArrayStr() seems to have been written on the assumption that
non-rectangular input is fine and it should pad with NULLs anywhere
that elements are missing.  We disallowed non-rectangular input
ages ago (commit 0e13d627b), but never simplified this function
as a follow-up.  In particular, the existing code recomputes each
element's linear location from scratch, which is quite unnecessary
for rectangular input: we can just assign the elements sequentially,
saving lots of arithmetic.  Add some more commentary while at it.

(This leaves ArrayGetOffset0() unused, but I'm unsure whether to
remove that.)
---
 src/backend/utils/adt/arrayfuncs.c | 69 ++++++++++++++----------------
 1 file changed, 33 insertions(+), 36 deletions(-)
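To illustrate why sequential assignment is safe for rectangular input: elements appear in the text in row-major order, so recomputing each element's linear offset from its subscripts always reproduces a running counter.  The standalone sketch below checks exactly that.  get_prod() and row_major_offset() are hypothetical helpers written in the spirit of mda_get_prod() and ArrayGetOffset0(), not the PostgreSQL code, and SKETCH_MAXDIM is an assumed stand-in for MAXDIM.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAXDIM 6         /* assumed stand-in for PostgreSQL's MAXDIM */

/* Row-major scale factors: prod[i] = product of dim[i+1 .. ndim-1]. */
static void
get_prod(int ndim, const int *dim, int *prod)
{
    prod[ndim - 1] = 1;
    for (int i = ndim - 2; i >= 0; i--)
        prod[i] = prod[i + 1] * dim[i + 1];
}

/* Linear offset of the element at 0-based subscripts indx[]. */
static int
row_major_offset(int ndim, const int *indx, const int *prod)
{
    int offset = 0;

    for (int i = 0; i < ndim; i++)
        offset += indx[i] * prod[i];
    return offset;
}

/*
 * Walk every subscript vector in row-major order (an "odometer" that bumps
 * the last subscript fastest, i.e. the order elements appear in the input
 * text) and confirm the recomputed offset always equals a simple counter.
 */
static bool
sequential_matches_offsets(int ndim, const int *dim)
{
    int indx[SKETCH_MAXDIM] = {0};
    int prod[SKETCH_MAXDIM];
    int nitems = 1;

    assert(ndim >= 1 && ndim <= SKETCH_MAXDIM);
    get_prod(ndim, dim, prod);
    for (int i = 0; i < ndim; i++)
        nitems *= dim[i];

    for (int counter = 0; counter < nitems; counter++)
    {
        if (row_major_offset(ndim, indx, prod) != counter)
            return false;
        /* advance the odometer */
        for (int d = ndim - 1; d >= 0; d--)
        {
            if (++indx[d] < dim[d])
                break;
            indx[d] = 0;
        }
    }
    return true;
}
```

That equivalence is what lets the patch drop ndim and dim[] from ReadArrayStr()'s parameters and simply advance i after each element, relying on ArrayCount() having already verified rectangularity, plus the final i != nitems cross-check.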

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 87c987fb27..39b5efc661 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -93,7 +93,7 @@ static bool array_isspace(char ch);
 static int    ArrayCount(const char *str, int *dim, char typdelim,
                        Node *escontext);
 static bool ReadArrayStr(char *arrayStr, const char *origStr,
-                         int nitems, int ndim, int *dim,
+                         int nitems,
                          FmgrInfo *inputproc, Oid typioparam, int32 typmod,
                          char typdelim,
                          int typlen, bool typbyval, char typalign,
@@ -391,7 +391,7 @@ array_in(PG_FUNCTION_ARGS)
     dataPtr = (Datum *) palloc(nitems * sizeof(Datum));
     nullsPtr = (bool *) palloc(nitems * sizeof(bool));
     if (!ReadArrayStr(p, string,
-                      nitems, ndim, dim,
+                      nitems,
                       &my_extra->proc, typioparam, typmod,
                       typdelim,
                       typlen, typbyval, typalign,
@@ -457,7 +457,8 @@ array_isspace(char ch)

 /*
  * ArrayCount
- *     Determines the dimensions for an array string.
+ *     Determines the dimensions for an array string.  This includes
+ *     syntax-checking the array structure decoration (braces and delimiters).
  *
  * Returns number of dimensions as function result.  The axis lengths are
  * returned in dim[], which must be of size MAXDIM.
@@ -704,16 +705,14 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
 /*
  * ReadArrayStr :
  *     parses the array string pointed to by "arrayStr" and converts the values
- *     to internal format.  Unspecified elements are initialized to nulls.
- *     The array dimensions must already have been determined.
+ *     to internal format.  The array dimensions must have been determined,
+ *     and the case of an empty array must have been handled earlier.
  *
  * Inputs:
  *    arrayStr: the string to parse.
  *              CAUTION: the contents of "arrayStr" will be modified!
  *    origStr: the unmodified input string, used only in error messages.
  *    nitems: total number of array elements, as already determined.
- *    ndim: number of array dimensions
- *    dim[]: array axis lengths
  *    inputproc: type-specific input procedure for element datatype.
  *    typioparam, typmod: auxiliary values to pass to inputproc.
  *    typdelim: the value delimiter (type-specific).
@@ -738,8 +737,6 @@ static bool
 ReadArrayStr(char *arrayStr,
              const char *origStr,
              int nitems,
-             int ndim,
-             int *dim,
              FmgrInfo *inputproc,
              Oid typioparam,
              int32 typmod,
@@ -753,20 +750,13 @@ ReadArrayStr(char *arrayStr,
              int32 *nbytes,
              Node *escontext)
 {
-    int            i,
+    int            i = 0,
                 nest_level = 0;
     char       *srcptr;
     bool        in_quotes = false;
     bool        eoArray = false;
     bool        hasnull;
     int32        totbytes;
-    int            indx[MAXDIM] = {0},
-                prod[MAXDIM];
-
-    mda_get_prod(ndim, dim, prod);
-
-    /* Initialize is-null markers to true */
-    memset(nulls, true, nitems * sizeof(bool));

     /*
      * We have to remove " and \ characters to create a clean item value to
@@ -789,11 +779,20 @@ ReadArrayStr(char *arrayStr,
         bool        itemdone = false;
         bool        leadingspace = true;
         bool        hasquoting = false;
-        char       *itemstart;
-        char       *dstptr;
-        char       *dstendptr;
+        char       *itemstart;    /* start of de-escaped text */
+        char       *dstptr;        /* next output point for de-escaped text */
+        char       *dstendptr;    /* last significant output char + 1 */

-        i = -1;
+        /*
+         * Parse next array element, collecting the de-escaped text into
+         * itemstart..dstendptr-1.
+         *
+         * Notice that we do not set "itemdone" until we see a separator
+         * (typdelim character) or the array's final right brace.  Since the
+         * array is already verified to be nonempty and rectangular, there is
+         * guaranteed to be another element to be processed in the first case,
+         * while in the second case of course we'll exit the outer loop.
+         */
         itemstart = dstptr = dstendptr = srcptr;

         while (!itemdone)
@@ -840,13 +839,7 @@ ReadArrayStr(char *arrayStr,
                 case '{':
                     if (!in_quotes)
                     {
-                        if (nest_level >= ndim)
-                            ereturn(escontext, false,
-                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                     errmsg("malformed array literal: \"%s\"",
-                                            origStr)));
                         nest_level++;
-                        indx[nest_level - 1] = 0;
                         srcptr++;
                     }
                     else
@@ -860,14 +853,9 @@ ReadArrayStr(char *arrayStr,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"",
                                             origStr)));
-                        if (i == -1)
-                            i = ArrayGetOffset0(ndim, indx, prod);
-                        indx[nest_level - 1] = 0;
                         nest_level--;
                         if (nest_level == 0)
                             eoArray = itemdone = true;
-                        else
-                            indx[nest_level - 1]++;
                         srcptr++;
                     }
                     else
@@ -878,10 +866,7 @@ ReadArrayStr(char *arrayStr,
                         *dstptr++ = *srcptr++;
                     else if (*srcptr == typdelim)
                     {
-                        if (i == -1)
-                            i = ArrayGetOffset0(ndim, indx, prod);
                         itemdone = true;
-                        indx[ndim - 1]++;
                         srcptr++;
                     }
                     else if (array_isspace(*srcptr))
@@ -905,15 +890,18 @@ ReadArrayStr(char *arrayStr,
             }
         }

+        /* Terminate de-escaped string */
         Assert(dstptr < srcptr);
         *dstendptr = '\0';

-        if (i < 0 || i >= nitems)
+        /* Safety check that we don't write past the output arrays */
+        if (i >= nitems)
             ereturn(escontext, false,
                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                      errmsg("malformed array literal: \"%s\"",
                             origStr)));

+        /* Convert the de-escaped string into the next output array entries */
         if (Array_nulls && !hasquoting &&
             pg_strcasecmp(itemstart, "NULL") == 0)
         {
@@ -934,8 +922,17 @@ ReadArrayStr(char *arrayStr,
                 return false;
             nulls[i] = false;
         }
+
+        i++;
     }

+    /* Cross-check that we filled all the output array entries */
+    if (i != nitems)
+        ereturn(escontext, false,
+                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                 errmsg("malformed array literal: \"%s\"",
+                        origStr)));
+
     /*
      * Check for nulls, compute total data space needed
      */
--
2.31.1

From cf15943f7438306559719407126ce52958a2c061 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Tue, 2 May 2023 11:23:31 -0400
Subject: [PATCH v1 2/2] Rewrite ArrayCount() to make dimensionality checks
 more complete.

ArrayCount has had a few layers of revision over the years,
reaching a point where it's really quite baroque and unintelligible.
There are actually four different arrays holding dimensions, which
can be reduced to two with little effort.  It was also close to
impossible to figure out why the dimension-counting actually worked
correctly.  Rewrite to make it perhaps a little less opaque.
In particular, a new element or subarray is now counted when we see
the start of the element or subarray, not at some later point where
we might not even be at the same nest_level anymore.

Also, install guards to catch non-rectangularity cases that were
previously missed, along the same lines as recent plperl and plpython
fixes: we were not checking that all scalar elements appear at the
same nesting depth.  Thanks to the very different logic, this seems
not to have caused any of the sort of internal errors that the PLs
suffered from, but it's still wrong.

On the other hand, the behavior of plperl and plpython suggests
that we should allow empty sub-arrays, as in '{{},{}}'; those
languages interpret equivalent constructs such as '[[],[]]' as
an empty (hence zero-dimensional) array, and it seems unclear
why array_in should not.  Remove the hack that was installed to
reject that case.

In passing, remove ArrayParseState.ARRAY_ELEM_COMPLETED, a state
that the state machine never entered yet wasted cycles checking for.

(I reindented the code for consistency, but it's probably easiest
to review this patch by examining "git diff -b" output.)
---
 src/backend/utils/adt/arrayfuncs.c   | 367 +++++++++++++++------------
 src/test/regress/expected/arrays.out |  41 ++-
 src/test/regress/sql/arrays.sql      |   8 +-
 3 files changed, 253 insertions(+), 163 deletions(-)
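The counting scheme described in this commit message can be modeled in isolation.  The following is a simplified, hypothetical sketch, not a copy of the patched ArrayCount(): it handles only '{', '}', ',' and bare element characters (no quotes, escapes, whitespace, or custom delimiters), and it omits the parse-state checks for misplaced braces and delimiters.  It counts a sub-array or element when it starts, records the first length seen at each level in dim[] and requires later sub-arrays to match, freezes ndim once an element or empty sub-array is seen, and, as an assumption based on the '{{},{}}' paragraph, reports any zero-length axis as an empty, zero-dimensional array.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAXDIM 6         /* assumed stand-in for PostgreSQL's MAXDIM */

/*
 * Hypothetical, simplified model of the rewritten dimension counting.
 * Returns the number of dimensions (0 for an empty array) and fills dim[],
 * or returns -1 for non-rectangular or unterminated input.
 */
static int
sketch_array_count(const char *s, int *dim)
{
    int  nelems[SKETCH_MAXDIM];
    int  nest_level = 0;
    int  ndim = 1;
    bool ndim_frozen = false;
    bool after_open = false;    /* previous char was '{', so '}' means "{}" */
    bool in_element = false;

    for (int i = 0; i < SKETCH_MAXDIM; i++)
        dim[i] = -1;            /* -1 means "length not known yet" */

    for (; *s != '\0'; s++)
    {
        if (*s == '{')
        {
            if (nest_level >= SKETCH_MAXDIM)
                return -1;
            /* a nested sub-array counts as an element of the outer level */
            if (nest_level > 0)
                nelems[nest_level - 1]++;
            nelems[nest_level++] = 0;
            if (ndim < nest_level)
            {
                if (ndim_frozen)
                    return -1;  /* deeper nesting after the depth was fixed */
                ndim = nest_level;
            }
            after_open = true;
            in_element = false;
        }
        else if (*s == '}')
        {
            if (nest_level == 0)
                return -1;
            if (after_open)
            {
                /* empty sub-array "{}" also fixes the depth */
                ndim_frozen = true;
                if (nest_level != ndim)
                    return -1;
            }
            nest_level--;
            if (dim[nest_level] < 0)
                dim[nest_level] = nelems[nest_level];   /* first sub-array */
            else if (dim[nest_level] != nelems[nest_level])
                return -1;      /* sibling sub-arrays differ in length */
            after_open = false;
            in_element = false;
            if (nest_level == 0)
                break;          /* outermost '}' ends the array */
        }
        else if (*s == ',')
        {
            after_open = false;
            in_element = false;
        }
        else
        {
            after_open = false;
            if (!in_element)
            {
                /* new scalar element: count it as soon as it starts */
                in_element = true;
                ndim_frozen = true;
                if (nest_level != ndim)
                    return -1;  /* element at the wrong nesting depth */
                assert(nest_level > 0);     /* implied by ndim >= 1 */
                nelems[nest_level - 1]++;
            }
        }
    }
    /* the loop only stops on '}' via break; reaching '\0' means no closing brace */
    if (*s == '\0')
        return -1;

    /* a zero-length axis means the whole array is empty */
    for (int i = 0; i < ndim; i++)
        if (dim[i] == 0)
            return 0;
    return ndim;
}
```

Under these rules both of Alexander's examples fail: the first when a '{' tries to deepen the array after ndim is frozen, the second when an element appears at the wrong depth.  Meanwhile '{{},{}}' yields a zero-length axis, and '{{},{1}}' is still rejected because its sibling sub-arrays differ in length.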

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 39b5efc661..2a8f0c048a 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -57,7 +57,6 @@ typedef enum
     ARRAY_NO_LEVEL,
     ARRAY_LEVEL_STARTED,
     ARRAY_ELEM_STARTED,
-    ARRAY_ELEM_COMPLETED,
     ARRAY_QUOTED_ELEM_STARTED,
     ARRAY_QUOTED_ELEM_COMPLETED,
     ARRAY_ELEM_DELIMITED,
@@ -472,213 +471,270 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
     int            nest_level = 0,
                 i;
     int            ndim = 1,
-                temp[MAXDIM],
-                nelems[MAXDIM],
-                nelems_last[MAXDIM];
+                nelems[MAXDIM];
+    bool        ndim_frozen = false;
     bool        in_quotes = false;
     bool        eoArray = false;
     bool        empty_array = true;
     const char *ptr;
     ArrayParseState parse_state = ARRAY_NO_LEVEL;

+    /* Initialize dim[] entries to -1 meaning "unknown" */
     for (i = 0; i < MAXDIM; ++i)
-    {
-        temp[i] = dim[i] = nelems_last[i] = 0;
-        nelems[i] = 1;
-    }
+        dim[i] = -1;

+    /* Scan string until we reach closing brace */
     ptr = str;
     while (!eoArray)
     {
-        bool        itemdone = false;
+        bool        new_element = false;

-        while (!itemdone)
+        switch (*ptr)
         {
-            if (parse_state == ARRAY_ELEM_STARTED ||
-                parse_state == ARRAY_QUOTED_ELEM_STARTED)
-                empty_array = false;
+            case '\0':
+                /* Signal a premature end of the string */
+                ereturn(escontext, -1,
+                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                         errmsg("malformed array literal: \"%s\"", str),
+                         errdetail("Unexpected end of input.")));
+            case '\\':

-            switch (*ptr)
-            {
-                case '\0':
-                    /* Signal a premature end of the string */
+                /*
+                 * An escape must be after a level start, after an element
+                 * start, or after an element delimiter. In any case we now
+                 * must be past an element start.
+                 */
+                switch (parse_state)
+                {
+                    case ARRAY_LEVEL_STARTED:
+                    case ARRAY_ELEM_DELIMITED:
+                        /* start new unquoted element */
+                        parse_state = ARRAY_ELEM_STARTED;
+                        new_element = true;
+                        break;
+                    case ARRAY_ELEM_STARTED:
+                    case ARRAY_QUOTED_ELEM_STARTED:
+                        /* already in element */
+                        break;
+                    default:
+                        ereturn(escontext, -1,
+                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                 errmsg("malformed array literal: \"%s\"", str),
+                                 errdetail("Unexpected \"%c\" character.",
+                                           '\\')));
+                }
+                /* skip the escaped character */
+                if (*(ptr + 1))
+                    ptr++;
+                else
                     ereturn(escontext, -1,
                             (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                              errmsg("malformed array literal: \"%s\"", str),
                              errdetail("Unexpected end of input.")));
-                case '\\':
+                break;
+            case '"':

+                /*
+                 * A quote must be after a level start, after a quoted element
+                 * start, or after an element delimiter. In any case we now
+                 * must be past an element start.
+                 */
+                switch (parse_state)
+                {
+                    case ARRAY_LEVEL_STARTED:
+                    case ARRAY_ELEM_DELIMITED:
+                        /* start new quoted element */
+                        Assert(!in_quotes);
+                        in_quotes = true;
+                        parse_state = ARRAY_QUOTED_ELEM_STARTED;
+                        new_element = true;
+                        break;
+                    case ARRAY_QUOTED_ELEM_STARTED:
+                        /* already in element, end it */
+                        Assert(in_quotes);
+                        in_quotes = false;
+                        parse_state = ARRAY_QUOTED_ELEM_COMPLETED;
+                        break;
+                    default:
+                        ereturn(escontext, -1,
+                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                 errmsg("malformed array literal: \"%s\"", str),
+                                 errdetail("Unexpected array element.")));
+                }
+                break;
+            case '{':
+                if (!in_quotes)
+                {
                     /*
-                     * An escape must be after a level start, after an element
-                     * start, or after an element delimiter. In any case we
-                     * now must be past an element start.
+                     * A left brace can occur if no nesting has occurred yet,
+                     * after a level start, or after a level delimiter.
                      */
-                    if (parse_state != ARRAY_LEVEL_STARTED &&
-                        parse_state != ARRAY_ELEM_STARTED &&
-                        parse_state != ARRAY_QUOTED_ELEM_STARTED &&
-                        parse_state != ARRAY_ELEM_DELIMITED)
+                    if (parse_state != ARRAY_NO_LEVEL &&
+                        parse_state != ARRAY_LEVEL_STARTED &&
+                        parse_state != ARRAY_LEVEL_DELIMITED)
                         ereturn(escontext, -1,
                                 (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                  errmsg("malformed array literal: \"%s\"", str),
                                  errdetail("Unexpected \"%c\" character.",
-                                           '\\')));
-                    if (parse_state != ARRAY_QUOTED_ELEM_STARTED)
-                        parse_state = ARRAY_ELEM_STARTED;
-                    /* skip the escaped character */
-                    if (*(ptr + 1))
-                        ptr++;
-                    else
+                                           '{')));
+                    parse_state = ARRAY_LEVEL_STARTED;
+                    /* Nested sub-arrays count as elements of outer level */
+                    if (nest_level > 0)
+                        nelems[nest_level - 1]++;
+                    /* Initialize element counting in the new level */
+                    if (nest_level >= MAXDIM)
                         ereturn(escontext, -1,
-                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                 errmsg("malformed array literal: \"%s\"", str),
-                                 errdetail("Unexpected end of input.")));
-                    break;
-                case '"':
-
+                                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                                 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
+                                        nest_level + 1, MAXDIM)));
+                    nelems[nest_level] = 0;
+                    nest_level++;
+                    if (ndim < nest_level)
+                    {
+                        /* Can't increase ndim once it's frozen */
+                        if (ndim_frozen)
+                            ereturn(escontext, -1,
+                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                     errmsg("malformed array literal: \"%s\"", str),
+                                     errdetail("Multidimensional arrays must have sub-arrays with matching dimensions.")));
+                        ndim = nest_level;
+                    }
+                }
+                break;
+            case '}':
+                if (!in_quotes)
+                {
                     /*
-                     * A quote must be after a level start, after a quoted
-                     * element start, or after an element delimiter. In any
-                     * case we now must be past an element start.
+                     * A right brace can occur after an element start, an
+                     * element completion, a quoted element completion, or a
+                     * level completion.  We also allow it after a level
+                     * start, that is an empty sub-array "{}" --- but that
+                     * freezes the number of dimensions and all such
+                     * sub-arrays must be at the same level, just like
+                     * sub-arrays containing elements.
                      */
-                    if (parse_state != ARRAY_LEVEL_STARTED &&
-                        parse_state != ARRAY_QUOTED_ELEM_STARTED &&
-                        parse_state != ARRAY_ELEM_DELIMITED)
-                        ereturn(escontext, -1,
-                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                 errmsg("malformed array literal: \"%s\"", str),
-                                 errdetail("Unexpected array element.")));
-                    in_quotes = !in_quotes;
-                    if (in_quotes)
-                        parse_state = ARRAY_QUOTED_ELEM_STARTED;
-                    else
-                        parse_state = ARRAY_QUOTED_ELEM_COMPLETED;
-                    break;
-                case '{':
-                    if (!in_quotes)
+                    switch (parse_state)
                     {
-                        /*
-                         * A left brace can occur if no nesting has occurred
-                         * yet, after a level start, or after a level
-                         * delimiter.
-                         */
-                        if (parse_state != ARRAY_NO_LEVEL &&
-                            parse_state != ARRAY_LEVEL_STARTED &&
-                            parse_state != ARRAY_LEVEL_DELIMITED)
+                        case ARRAY_ELEM_STARTED:
+                        case ARRAY_QUOTED_ELEM_COMPLETED:
+                        case ARRAY_LEVEL_COMPLETED:
+                            /* okay */
+                            break;
+                        case ARRAY_LEVEL_STARTED:
+                            /* empty sub-array: OK if at correct nest_level */
+                            ndim_frozen = true;
+                            if (nest_level != ndim)
+                                ereturn(escontext, -1,
+                                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                         errmsg("malformed array literal: \"%s\"", str),
+                                         errdetail("Multidimensional arrays must have sub-arrays with matching dimensions.")));
+                            break;
+                        default:
                             ereturn(escontext, -1,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"", str),
                                      errdetail("Unexpected \"%c\" character.",
-                                               '{')));
-                        parse_state = ARRAY_LEVEL_STARTED;
-                        if (nest_level >= MAXDIM)
-                            ereturn(escontext, -1,
-                                    (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                                     errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
-                                            nest_level + 1, MAXDIM)));
-                        temp[nest_level] = 0;
-                        nest_level++;
-                        if (ndim < nest_level)
-                            ndim = nest_level;
+                                               '}')));
                     }
-                    break;
-                case '}':
-                    if (!in_quotes)
+                    parse_state = ARRAY_LEVEL_COMPLETED;
+                    /* The parse state check assured we're in a level. */
+                    Assert(nest_level > 0);
+                    nest_level--;
+
+                    if (dim[nest_level] < 0)
+                    {
+                        /* Save length of first sub-array of this level */
+                        dim[nest_level] = nelems[nest_level];
+                    }
+                    else if (nelems[nest_level] != dim[nest_level])
+                    {
+                        /* Subsequent sub-arrays must have same length */
+                        ereturn(escontext, -1,
+                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                 errmsg("malformed array literal: \"%s\"", str),
+                                 errdetail("Multidimensional arrays must have sub-arrays with matching dimensions.")));
+                    }
+                    /* Done if this is the outermost level's '}' */
+                    if (nest_level == 0)
+                        eoArray = true;
+                }
+                break;
+            default:
+                if (!in_quotes)
+                {
+                    if (*ptr == typdelim)
                     {
                         /*
-                         * A right brace can occur after an element start, an
-                         * element completion, a quoted element completion, or
-                         * a level completion.
+                         * Delimiters can occur after an element start, a
+                         * quoted element completion, or a level completion.
                          */
                         if (parse_state != ARRAY_ELEM_STARTED &&
-                            parse_state != ARRAY_ELEM_COMPLETED &&
                             parse_state != ARRAY_QUOTED_ELEM_COMPLETED &&
-                            parse_state != ARRAY_LEVEL_COMPLETED &&
-                            !(nest_level == 1 && parse_state == ARRAY_LEVEL_STARTED))
+                            parse_state != ARRAY_LEVEL_COMPLETED)
                             ereturn(escontext, -1,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                      errmsg("malformed array literal: \"%s\"", str),
                                      errdetail("Unexpected \"%c\" character.",
-                                               '}')));
-                        parse_state = ARRAY_LEVEL_COMPLETED;
-                        if (nest_level == 0)
-                            ereturn(escontext, -1,
-                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                     errmsg("malformed array literal: \"%s\"", str),
-                                     errdetail("Unmatched \"%c\" character.", '}')));
-                        nest_level--;
-
-                        if (nelems_last[nest_level] != 0 &&
-                            nelems[nest_level] != nelems_last[nest_level])
-                            ereturn(escontext, -1,
-                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                     errmsg("malformed array literal: \"%s\"", str),
-                                     errdetail("Multidimensional arrays must have "
-                                               "sub-arrays with matching "
-                                               "dimensions.")));
-                        nelems_last[nest_level] = nelems[nest_level];
-                        nelems[nest_level] = 1;
-                        if (nest_level == 0)
-                            eoArray = itemdone = true;
+                                               typdelim)));
+                        if (parse_state == ARRAY_LEVEL_COMPLETED)
+                            parse_state = ARRAY_LEVEL_DELIMITED;
                         else
-                        {
-                            /*
-                             * We don't set itemdone here; see comments in
-                             * ReadArrayStr
-                             */
-                            temp[nest_level - 1]++;
-                        }
+                            parse_state = ARRAY_ELEM_DELIMITED;
                     }
-                    break;
-                default:
-                    if (!in_quotes)
+                    else if (!array_isspace(*ptr))
                     {
-                        if (*ptr == typdelim)
-                        {
-                            /*
-                             * Delimiters can occur after an element start, an
-                             * element completion, a quoted element
-                             * completion, or a level completion.
-                             */
-                            if (parse_state != ARRAY_ELEM_STARTED &&
-                                parse_state != ARRAY_ELEM_COMPLETED &&
-                                parse_state != ARRAY_QUOTED_ELEM_COMPLETED &&
-                                parse_state != ARRAY_LEVEL_COMPLETED)
-                                ereturn(escontext, -1,
-                                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
-                                         errmsg("malformed array literal: \"%s\"", str),
-                                         errdetail("Unexpected \"%c\" character.",
-                                                   typdelim)));
-                            if (parse_state == ARRAY_LEVEL_COMPLETED)
-                                parse_state = ARRAY_LEVEL_DELIMITED;
-                            else
-                                parse_state = ARRAY_ELEM_DELIMITED;
-                            itemdone = true;
-                            nelems[nest_level - 1]++;
-                        }
-                        else if (!array_isspace(*ptr))
+                        /*
+                         * Other non-space characters must be after a level
+                         * start, after an element start, or after an element
+                         * delimiter. In any case we now must be past an
+                         * element start.
+                         *
+                         * If it's a space character, we can ignore it; it
+                         * might be data or not, but it doesn't change the
+                         * parsing state.
+                         */
+                        switch (parse_state)
                         {
-                            /*
-                             * Other non-space characters must be after a
-                             * level start, after an element start, or after
-                             * an element delimiter. In any case we now must
-                             * be past an element start.
-                             */
-                            if (parse_state != ARRAY_LEVEL_STARTED &&
-                                parse_state != ARRAY_ELEM_STARTED &&
-                                parse_state != ARRAY_ELEM_DELIMITED)
+                            case ARRAY_LEVEL_STARTED:
+                            case ARRAY_ELEM_DELIMITED:
+                                /* start new unquoted element */
+                                parse_state = ARRAY_ELEM_STARTED;
+                                new_element = true;
+                                break;
+                            case ARRAY_ELEM_STARTED:
+                                /* already in element */
+                                break;
+                            default:
                                 ereturn(escontext, -1,
                                         (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                                          errmsg("malformed array literal: \"%s\"", str),
                                          errdetail("Unexpected array element.")));
-                            parse_state = ARRAY_ELEM_STARTED;
                         }
                     }
-                    break;
-            }
-            if (!itemdone)
-                ptr++;
+                }
+                break;
         }
-        temp[ndim - 1]++;
+
+        /* To reduce duplication, all new-element cases go through here. */
+        if (new_element)
+        {
+            /*
+             * Once we have found an element, the number of dimensions can no
+             * longer increase, and subsequent elements must all be at the
+             * same nesting depth.
+             */
+            ndim_frozen = true;
+            if (nest_level != ndim)
+                ereturn(escontext, -1,
+                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                         errmsg("malformed array literal: \"%s\"", str),
+                         errdetail("Multidimensional arrays must have sub-arrays with matching dimensions.")));
+            /* Count the new element */
+            nelems[nest_level - 1]++;
+            /* The array's now known non-empty, too */
+            empty_array = false;
+        }
+
         ptr++;
     }

@@ -696,9 +752,6 @@ ArrayCount(const char *str, int *dim, char typdelim, Node *escontext)
     if (empty_array)
         return 0;

-    for (i = 0; i < ndim; ++i)
-        dim[i] = temp[i];
-
     return ndim;
 }

diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index 7064391468..71fa12f828 100644
--- a/src/test/regress/expected/arrays.out
+++ b/src/test/regress/expected/arrays.out
@@ -1476,11 +1476,6 @@ ERROR:  malformed array literal: "{{1,{2}},{2,3}}"
 LINE 1: select '{{1,{2}},{2,3}}'::text[];
                ^
 DETAIL:  Unexpected "{" character.
-select '{{},{}}'::text[];
-ERROR:  malformed array literal: "{{},{}}"
-LINE 1: select '{{},{}}'::text[];
-               ^
-DETAIL:  Unexpected "}" character.
 select E'{{1,2},\\{2,3}}'::text[];
 ERROR:  malformed array literal: "{{1,2},\{2,3}}"
 LINE 1: select E'{{1,2},\\{2,3}}'::text[];
@@ -1501,6 +1496,36 @@ ERROR:  malformed array literal: "{ }}"
 LINE 1: select '{ }}'::text[];
                ^
 DETAIL:  Junk after closing right brace.
+select '{{1},{{2}}}'::text[];
+ERROR:  malformed array literal: "{{1},{{2}}}"
+LINE 1: select '{{1},{{2}}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{{1}},{2}}'::text[];
+ERROR:  malformed array literal: "{{{1}},{2}}"
+LINE 1: select '{{{1}},{2}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{},{{}}}'::text[];
+ERROR:  malformed array literal: "{{},{{}}}"
+LINE 1: select '{{},{{}}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{{}},{}}'::text[];
+ERROR:  malformed array literal: "{{{}},{}}"
+LINE 1: select '{{{}},{}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{1},{}}'::text[];
+ERROR:  malformed array literal: "{{1},{}}"
+LINE 1: select '{{1},{}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
+select '{{},{1}}'::text[];
+ERROR:  malformed array literal: "{{},{1}}"
+LINE 1: select '{{},{1}}'::text[];
+               ^
+DETAIL:  Multidimensional arrays must have sub-arrays with matching dimensions.
 select array[];
 ERROR:  cannot determine type of empty array
 LINE 1: select array[];
@@ -1514,6 +1539,12 @@ select '{}'::text[];
  {}
 (1 row)

+select '{{},{}}'::text[];
+ text
+------
+ {}
+(1 row)
+
 select '{{{1,2,3,4},{2,3,4,5}},{{3,4,5,6},{4,5,6,7}}}'::text[];
                      text
 -----------------------------------------------
diff --git a/src/test/regress/sql/arrays.sql b/src/test/regress/sql/arrays.sql
index f1375621e0..ea7b55bd09 100644
--- a/src/test/regress/sql/arrays.sql
+++ b/src/test/regress/sql/arrays.sql
@@ -454,16 +454,22 @@ select 'foo' ilike all (array['F%', '%O']); -- t

 -- none of the following should be accepted
 select '{{1,{2}},{2,3}}'::text[];
-select '{{},{}}'::text[];
 select E'{{1,2},\\{2,3}}'::text[];
 select '{{"1 2" x},{3}}'::text[];
 select '{}}'::text[];
 select '{ }}'::text[];
+select '{{1},{{2}}}'::text[];
+select '{{{1}},{2}}'::text[];
+select '{{},{{}}}'::text[];
+select '{{{}},{}}'::text[];
+select '{{1},{}}'::text[];
+select '{{},{1}}'::text[];
 select array[];
 -- none of the above should be accepted

 -- all of the following should be accepted
 select '{}'::text[];
+select '{{},{}}'::text[];
 select '{{{1,2,3,4},{2,3,4,5}},{{3,4,5,6},{4,5,6,7}}}'::text[];
 select '{0 second  ,0 second}'::interval[];
 select '{ { "," } , { 3 } }'::text[];
--
2.31.1

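For illustration, here is a minimal standalone sketch of the nesting rule the patch adds in its new_element branch. Once the first scalar element is seen, the dimension count is frozen, and every element must sit at exactly that depth. It is NOT the patch's ArrayCount().

The function name sketch_count_dims() and the SKETCH_MAXDIM cap are hypothetical. The sketch handles only '{', '}', ',' and unquoted elements. It has no quoting, no backslash escapes, no typdelim parameter, and no ereturn-style error reporting.

The per-level length check is an assumption about how rectangularity can be enforced, because that part of ArrayCount() falls outside this excerpt. In the sketch, the first sub-array closed at each level fixes the length that later sibling sub-arrays must match, and a nested sub-array counts as one item of its parent.

```c
#include <ctype.h>
#include <stdbool.h>

#define SKETCH_MAXDIM 6     /* hypothetical cap on nesting depth */

/*
 * Hypothetical, simplified stand-in for ArrayCount(), not the patch's code.
 * Understands only '{', '}', ',' and unquoted elements.  Returns the number
 * of dimensions and fills dim[], returns 0 for an empty array, or -1 if the
 * literal is rejected.
 */
static int
sketch_count_dims(const char *str, int dim[SKETCH_MAXDIM])
{
    int         nelems[SKETCH_MAXDIM];  /* items seen so far at each level */
    int         nest_level = 0;
    int         ndim = 0;
    bool        ndim_frozen = false;    /* true once any element is seen */
    bool        empty_array = true;
    bool        in_element = false;     /* inside an unquoted element */
    bool        expect_delim = false;   /* just closed a sub-array */
    const char *ptr = str;

    for (int i = 0; i < SKETCH_MAXDIM; i++)
        dim[i] = -1;                    /* -1: length not yet known */

    while (isspace((unsigned char) *ptr))
        ptr++;
    if (*ptr != '{')
        return -1;

    for (; *ptr; ptr++)
    {
        char        c = *ptr;

        if (c == '{')
        {
            if (in_element || expect_delim || nest_level >= SKETCH_MAXDIM)
                return -1;
            nelems[nest_level++] = 0;
            if (nest_level > ndim)
            {
                if (ndim_frozen)
                    return -1;          /* deeper than existing elements */
                ndim = nest_level;
            }
        }
        else if (c == '}')
        {
            int         level = nest_level - 1;     /* level being closed */

            /* a non-empty level must not end right after a ',' */
            if (nelems[level] > 0 && !in_element && !expect_delim)
                return -1;
            /* assumed rule: sibling sub-arrays must have equal length */
            if (dim[level] < 0)
                dim[level] = nelems[level];
            else if (dim[level] != nelems[level])
                return -1;
            in_element = false;
            expect_delim = true;
            if (--nest_level == 0)
                break;                  /* outermost brace closed */
            nelems[nest_level - 1]++;   /* sub-array counts in its parent */
        }
        else if (c == ',')
        {
            if (!in_element && !expect_delim)
                return -1;              /* leading or doubled delimiter */
            in_element = expect_delim = false;
        }
        else if (!isspace((unsigned char) c) && !in_element)
        {
            if (expect_delim)
                return -1;              /* e.g. "{{1} 2}" */
            /* the rule from the patch's new_element branch */
            in_element = true;
            ndim_frozen = true;
            if (nest_level != ndim)
                return -1;
            nelems[nest_level - 1]++;
            empty_array = false;
        }
    }

    if (nest_level != 0)
        return -1;                      /* unbalanced braces */
    for (ptr++; *ptr; ptr++)            /* skip the final '}' */
        if (!isspace((unsigned char) *ptr))
            return -1;                  /* junk after closing right brace */
    return empty_array ? 0 : ndim;
}
```

Fed the regression inputs from the patch, the sketch rejects '{{1},{{2}}}', '{{{1}},{2}}', '{{},{{}}}', '{{{}},{}}', '{{1},{}}' and '{{},{1}}'. It returns 0 (empty) for both '{}' and '{{},{}}'. For the 2x2x4 literal it returns 3 with dims {2,2,4}.

Freezing ndim at the first element is what makes the two cases from Alexander's report behave symmetrically. In one, a deeper '{' arrives after an element has been seen. In the other, an element arrives at a shallower depth than the frozen ndim. Both fail the same check.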
