Re: jsonb format is pessimal for toast compression

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: jsonb format is pessimal for toast compression
Дата
Msg-id 15378.1408548595@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: jsonb format is pessimal for toast compression  (Josh Berkus <josh@agliodbs.com>)
Список pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:
> On 08/15/2014 04:19 PM, Tom Lane wrote:
>> Personally I'd prefer to go to the all-lengths approach, but a large
>> part of that comes from a subjective assessment that the hybrid approach
>> is too messy.  Others might well disagree.

> ... So, that extraction test is about 1% *slower* than the basic Tom Lane
> lengths-only patch, and still 80% slower than original JSONB.  And it's
> the same size as the lengths-only version.

Since it's looking like this might be the direction we want to go, I took
the time to flesh out my proof-of-concept patch.  The attached version
takes care of cosmetic issues (like fixing the comments), and includes
code to avoid O(N^2) penalties in findJsonbValueFromContainer and
JsonbIteratorNext.  I'm not sure whether those changes will help
noticeably on Josh's test case; for me, they seemed worth making, but
they do not bring the code back to full speed parity with the all-offsets
version.  But as we've been discussing, it seems likely that those costs
would be swamped by compression and I/O considerations in most scenarios
with large documents; and of course for small documents it hardly matters.

Even if we don't go this way, there are parts of this patch that would
need to get committed.  I found for instance that convertJsonbArray and
convertJsonbObject have insufficient defenses against overflowing the
overall length field for the array or object.

For my own part, I'm satisfied with the patch as attached (modulo the
need to teach pg_upgrade about the incompatibility).  There remains the
question of whether to take this opportunity to add a version ID to the
binary format.  I'm not as excited about that idea as I originally was;
having now studied the code more carefully, I think that any expansion
would likely happen by adding more type codes and/or commandeering the
currently-unused high-order bit of JEntrys.  We don't need a version ID
in the header for that.  Moreover, if we did have such an ID, it would be
notationally painful to get it to most of the places that might need it.

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..456011a 100644
*** a/src/backend/utils/adt/jsonb.c
--- b/src/backend/utils/adt/jsonb.c
*************** jsonb_from_cstring(char *json, int len)
*** 196,207 ****
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_POSMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_POSMASK)));

      return len;
  }
--- 196,207 ----
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_LENMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_LENMASK)));

      return len;
  }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..e47eaea 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***************
*** 26,40 ****
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_POSMASK,
   * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *array, int index, char *base_addr,
                 JsonbValue *result);
! static bool    equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
--- 26,41 ----
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_LENMASK,
   * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *children, int index,
!                char *base_addr, uint32 offset,
                 JsonbValue *result);
! static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
*************** JsonbValueToJsonb(JsonbValue *val)
*** 108,113 ****
--- 109,135 ----
  }

  /*
+  * Get the offset of the variable-length portion of a Jsonb node within
+  * the variable-length-data part of its container.  The node is identified
+  * by index within the container's JEntry array.
+  *
+  * We do this by adding up the lengths of all the previous nodes'
+  * variable-length portions.  It's best to avoid using this function when
+  * iterating through all the nodes in a container, since that would result
+  * in O(N^2) work.
+  */
+ uint32
+ getJsonbOffset(const JEntry *ja, int index)
+ {
+     uint32        off = 0;
+     int            i;
+
+     for (i = 0; i < index; i++)
+         off += JBE_LEN(ja, i);
+     return off;
+ }
+
+ /*
   * BT comparator worker function.  Returns an integer less than, equal to, or
   * greater than zero, indicating whether a is less than, equal to, or greater
   * than b.  Consistent with the requirements for a B-Tree operator class
*************** findJsonbValueFromContainer(JsonbContain
*** 279,324 ****
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
          }
      }
      else if (flags & JB_FOBJECT & container->header)
      {
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
          uint32        stopLow = 0,
!                     stopMiddle;

!         /* Object key past by caller must be a string */
          Assert(key->type == jbvString);

          /* Binary search on object/pair keys *only* */
!         while (stopLow < count)
          {
              int            index;
              int            difference;
              JsonbValue    candidate;

              /*
!              * Note how we compensate for the fact that we're iterating
!              * through pairs (not entries) throughout.
               */
-             stopMiddle = stopLow + (count - stopLow) / 2;
-
              index = stopMiddle * 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + JBE_OFF(children, index);
              candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);
--- 301,370 ----
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
+         uint32        offset = 0;
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, offset, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
+
+             offset += JBE_LEN(children, i);
          }
      }
      else if (flags & JB_FOBJECT & container->header)
      {
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
+         uint32       *offsets;
+         uint32        lastoff;
+         int            lastoffpos;
          uint32        stopLow = 0,
!                     stopHigh = count;

!         /* Object key passed by caller must be a string */
          Assert(key->type == jbvString);

+         /*
+          * We use a cache to avoid redundant getJsonbOffset() computations
+          * inside the search loop.  Note that count may well be zero at this
+          * point; to avoid an ugly special case for initializing lastoff and
+          * lastoffpos, we allocate one extra array element.
+          */
+         offsets = (uint32 *) palloc((count * 2 + 1) * sizeof(uint32));
+         offsets[0] = lastoff = 0;
+         lastoffpos = 0;
+
          /* Binary search on object/pair keys *only* */
!         while (stopLow < stopHigh)
          {
+             uint32        stopMiddle;
              int            index;
              int            difference;
              JsonbValue    candidate;

+             stopMiddle = stopLow + (stopHigh - stopLow) / 2;
+
              /*
!              * Compensate for the fact that we're searching through pairs (not
!              * entries).
               */
              index = stopMiddle * 2;

+             /* Update the offsets cache through at least index+1 */
+             while (lastoffpos <= index)
+             {
+                 lastoff += JBE_LEN(children, lastoffpos);
+                 offsets[++lastoffpos] = lastoff;
+             }
+
              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + offsets[index];
              candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);
*************** findJsonbValueFromContainer(JsonbContain
*** 326,333 ****
              if (difference == 0)
              {
                  /* Found our key, return value */
!                 fillJsonbValue(children, index + 1, base_addr, result);

                  return result;
              }
              else
--- 372,382 ----
              if (difference == 0)
              {
                  /* Found our key, return value */
!                 fillJsonbValue(children, index + 1,
!                                base_addr, offsets[index + 1],
!                                result);

+                 pfree(offsets);
                  return result;
              }
              else
*************** findJsonbValueFromContainer(JsonbContain
*** 335,343 ****
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     count = stopMiddle;
              }
          }
      }

      /* Not found */
--- 384,394 ----
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     stopHigh = stopMiddle;
              }
          }
+
+         pfree(offsets);
      }

      /* Not found */
*************** getIthJsonbValueFromContainer(JsonbConta
*** 368,374 ****

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr, result);

      return result;
  }
--- 419,427 ----

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr,
!                    getJsonbOffset(container->children, i),
!                    result);

      return result;
  }
*************** getIthJsonbValueFromContainer(JsonbConta
*** 377,387 ****
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
  {
      JEntry        entry = children[index];

--- 430,446 ----
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
+  * The node's JEntry is at children[index], and its variable-length data
+  * is at base_addr + offset.  We make the caller determine the offset since
+  * in many cases the caller can amortize the work across multiple children.
+  *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index,
!                char *base_addr, uint32 offset,
!                JsonbValue *result)
  {
      JEntry        entry = children[index];

*************** fillJsonbValue(JEntry *children, int ind
*** 392,405 ****
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + JBE_OFF(children, index);
!         result->val.string.len = JBE_LEN(children, index);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(JBE_OFF(children, index)));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
--- 451,464 ----
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + offset;
!         result->val.string.len = JBE_LENFLD(entry);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(offset));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
*************** fillJsonbValue(JEntry *children, int ind
*** 415,422 ****
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(JBE_OFF(children, index)));
!         result->val.binary.len = JBE_LEN(children, index) - (INTALIGN(JBE_OFF(children, index)) - JBE_OFF(children,
index));
      }
  }

--- 474,482 ----
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         /* Remove alignment padding from data pointer and len */
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(offset));
!         result->val.binary.len = JBE_LENFLD(entry) - (INTALIGN(offset) - offset);
      }
  }

*************** recurse:
*** 668,680 ****
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
--- 728,741 ----
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
*************** recurse:
*** 686,692 ****
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->i++, (*it)->dataProper, val);

              if (!IsAJsonbScalar(val) && !skipNested)
              {
--- 747,758 ----
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->curIndex,
!                            (*it)->dataProper, (*it)->curDataOffset,
!                            val);
!
!             (*it)->curDataOffset += JBE_LEN((*it)->children, (*it)->curIndex);
!             (*it)->curIndex++;

              if (!IsAJsonbScalar(val) && !skipNested)
              {
*************** recurse:
*** 712,724 ****
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
--- 778,791 ----
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
*************** recurse:
*** 732,738 ****
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->i * 2, (*it)->dataProper, val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

--- 799,807 ----
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->curIndex * 2,
!                                (*it)->dataProper, (*it)->curDataOffset,
!                                val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

*************** recurse:
*** 745,752 ****
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, ((*it)->i++) * 2 + 1,
!                            (*it)->dataProper, val);

              /*
               * Value may be a container, in which case we recurse with new,
--- 814,829 ----
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex * 2);
!
!             fillJsonbValue((*it)->children, (*it)->curIndex * 2 + 1,
!                            (*it)->dataProper, (*it)->curDataOffset,
!                            val);
!
!             (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex * 2 + 1);
!             (*it)->curIndex++;

              /*
               * Value may be a container, in which case we recurse with new,
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1340,1353 ****
      int            totallen;
      uint32        header;

!     /* Initialize pointer into conversion buffer at this level */
      offset = buffer->len;

      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry, stored in the beginning of the variable-
!      * length payload.
       */
      header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
--- 1417,1431 ----
      int            totallen;
      uint32        header;

!     /* Remember where variable-length data starts for this array */
      offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
       */
      header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1358,1364 ****
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!     /* reserve space for the JEntries of the elements. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
--- 1436,1443 ----
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!
!     /* Reserve space for the JEntries of the elements. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1368,1391 ****
          int            len;
          JEntry        meta;

          convertJsonbValue(buffer, &meta, elem, level + 1);
-         len = meta & JENTRY_POSMASK;
-         totallen += len;

!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

      totallen = buffer->len - offset;

      /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }
--- 1447,1485 ----
          int            len;
          JEntry        meta;

+         /*
+          * Convert element, producing a JEntry and appending its
+          * variable-length data to buffer
+          */
          convertJsonbValue(buffer, &meta, elem, level + 1);

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         len = JBE_LENFLD(meta);
!         totallen += len;
!         if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_LENMASK)));

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

+     /* Total data size is everything we've appended to buffer */
      totallen = buffer->len - offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_LENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
+                         JENTRY_LENMASK)));
+
      /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1393,1457 ****
  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
-     uint32        header;
      int            offset;
      int            metaoffset;
      int            i;
      int            totallen;

!     /* Initialize pointer into conversion buffer at this level */
      offset = buffer->len;

      padBufferToInt(buffer);

!     /* Initialize header */
      header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* reserve space for the JEntries of the keys and values */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair *pair = &val->val.object.pairs[i];
!         int len;
!         JEntry meta;

!         /* put key */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = meta & JENTRY_POSMASK;
          totallen += len;

-         if (totallen > JENTRY_POSMASK)
-             ereport(ERROR,
-                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-                             JENTRY_POSMASK)));
-
-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

!         convertJsonbValue(buffer, &meta, &pair->value, level);
!         len = meta & JENTRY_POSMASK;
!         totallen += len;

!         if (totallen > JENTRY_POSMASK)
!             ereport(ERROR,
!                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

-         meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

      totallen = buffer->len - offset;

      *pheader = JENTRY_ISCONTAINER | totallen;
  }

--- 1487,1570 ----
  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
      int            offset;
      int            metaoffset;
      int            i;
      int            totallen;
+     uint32        header;

!     /* Remember where variable-length data starts for this object */
      offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

!     /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
!      */
      header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* Reserve space for the JEntries of the keys and values. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair  *pair = &val->val.object.pairs[i];
!         int            len;
!         JEntry        meta;

!         /*
!          * Convert key, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = JBE_LENFLD(meta);
          totallen += len;

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

!         /*
!          * Convert value, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
!         convertJsonbValue(buffer, &meta, &pair->value, level + 1);

!         len = JBE_LENFLD(meta);
!         totallen += len;

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
+
+         /*
+          * Bail out if total variable-length data exceeds what will fit in a
+          * JEntry length field.  We check this in each iteration, not just
+          * once at the end, to forestall possible integer overflow.  But it
+          * should be sufficient to check once per iteration, since
+          * JENTRY_LENMASK is several bits narrower than int.
+          */
+         if (totallen > JENTRY_LENMASK)
+             ereport(ERROR,
+                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                             JENTRY_LENMASK)));
      }

+     /* Total data size is everything we've appended to buffer */
      totallen = buffer->len - offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_LENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                         JENTRY_LENMASK)));
+
+     /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }

diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 91e3e14..b9a4314 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef struct JsonbValue JsonbValue;
*** 83,91 ****
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header, and a variable-length content.  The JEntry header indicates what
!  * kind of a node it is, e.g. a string or an array, and the offset and length
!  * of its variable-length portion within the container.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
--- 83,91 ----
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header and a variable-length content (possibly of zero size).  The JEntry
!  * header indicates what kind of a node it is, e.g. a string or an array,
!  * and includes the length of its variable-length portion.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
*************** typedef struct JsonbValue JsonbValue;
*** 95,133 ****
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begins with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
-  * To encode the length and offset of the variable-length portion of each
-  * node in a compact way, the JEntry stores only the end offset within the
-  * variable-length portion of the container node. For the first JEntry in the
-  * container's JEntry array, that equals to the length of the node data.  The
-  * begin offset and length of the rest of the entries can be calculated using
-  * the end offset of the previous JEntry in the array.
-  *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte.
   */

  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the end offset of the entry (see
!  * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The next three bits
!  * are used to store the type of the entry. The most significant bit
!  * is unused, and should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_POSMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
--- 95,126 ----
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begin with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte so that the actual numeric data is 4-byte aligned.
   */

  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the data length of the entry (see
!  * JBE_LENFLD and JBE_LEN macros below). The next three bits store the type
!  * of the entry. The most significant bit is reserved for future use, and
!  * should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_LENMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
*************** typedef uint32 JEntry;
*** 148,160 ****
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the offset and length of an element. Note multiple
!  * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)            ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                                  : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))

  /*
   * A jsonb array or object node, within a Jsonb Datum.
--- 141,150 ----
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the data length of a JEntry.
   */
! #define JBE_LENFLD(je_)            ((je_) & JENTRY_LENMASK)
! #define JBE_LEN(ja, i)            JBE_LENFLD((ja)[i])

  /*
   * A jsonb array or object node, within a Jsonb Datum.
*************** typedef struct JsonbIterator
*** 287,306 ****
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will be
!                                  * nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;

      /* Current item in buffer (up to nElems, but must * 2 for objects) */
!     int            i;

!     /*
!      * Data proper.  This points just past end of children array.
!      * We use the JBE_OFF() macro on the Jentrys to find offsets of each
!      * child in this area.
!      */
!     char       *dataProper;

      /* Private state */
      JsonbIterState state;
--- 277,294 ----
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will
!                                  * be nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;        /* JEntrys for child nodes */
!     /* Data proper.  This points just past end of children array */
!     char       *dataProper;

      /* Current item in buffer (up to nElems, but must * 2 for objects) */
!     int            curIndex;

!     /* Data offset corresponding to current item */
!     uint32        curDataOffset;

      /* Private state */
      JsonbIterState state;
*************** extern Datum gin_consistent_jsonb_path(P
*** 344,349 ****
--- 332,338 ----
  extern Datum gin_triconsistent_jsonb_path(PG_FUNCTION_ARGS);

  /* Support functions */
+ extern uint32 getJsonbOffset(const JEntry *ja, int index);
  extern int    compareJsonbContainers(JsonbContainer *a, JsonbContainer *b);
  extern JsonbValue *findJsonbValueFromContainer(JsonbContainer *sheader,
                              uint32 flags,

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [patch] pg_copy - a command for reliable WAL archiving
Следующее
От: Thom Brown
Дата:
Сообщение: Re: [GSoC2014] Patch ALTER TABLE ... SET LOGGED