Re: jsonb format is pessimal for toast compression

From Tom Lane
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id 14953.1407977550@sss.pgh.pa.us
In reply to Re: jsonb format is pessimal for toast compression  (Bruce Momjian <bruce@momjian.us>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers
Bruce Momjian <bruce@momjian.us> writes:
> Seems we have two issues:
> 1)  the header makes testing for compression likely to fail
> 2)  use of pointers rather than offsets reduces compression potential

> I understand we are focusing on #1, but how much does compression reduce
> the storage size with and without #2?  Seems we need to know that answer
> before deciding if it is worth reducing the ability to do fast lookups
> with #2.

That's a fair question.  I did a very simple hack to replace the item
offsets with item lengths -- it turns out that this mostly requires removing
some code that changes lengths to offsets ;-).  I then loaded up Larry's
example of a noncompressible JSON value and compared pg_column_size()
results, which is just about the right measure here since it reports datum
size after compression.  Remembering that the textual representation is
12353 bytes:

json:                    382 bytes
jsonb, using offsets:  12593 bytes
jsonb, using lengths:    406 bytes

So this confirms my suspicion that the choice of offsets rather than
lengths is what's killing compressibility.  If it used lengths, jsonb
would be very nearly as compressible as the original text.
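
To illustrate why the two layouts behave so differently, here is a minimal sketch (not part of the attached patch) that builds the per-element header words both ways for an array of equal-sized elements. The helper names fill_lengths, fill_end_offsets and count_distinct are made up for this example, and counting distinct 32-bit words is only a rough proxy for what a repeat-matching compressor sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lengths layout, each word is the element's own size. */
static void
fill_lengths(uint32_t *out, size_t n, uint32_t elem_len)
{
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        out[i] = elem_len;
}

/* Hypothetical helper: offsets layout, each word is the running end position. */
static void
fill_end_offsets(uint32_t *out, size_t n, uint32_t elem_len)
{
    uint32_t total = 0;

    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        total += elem_len;
        out[i] = total;
    }
}

/* Count distinct words; quadratic, which is fine for a small illustration. */
static size_t
count_distinct(const uint32_t *words, size_t n)
{
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++)
    {
        size_t j;

        for (j = 0; j < i; j++)
            if (words[j] == words[i])
                break;
        if (j == i)             /* no earlier occurrence found */
            distinct++;
    }
    return distinct;
}
```

With 100 elements of 8 bytes each, the lengths array holds a single repeated word, while the offsets array holds 100 different words, so a compressor that relies on finding earlier repeats has nothing to match in the offsets case.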

Hack attached in case anyone wants to collect more thorough statistics.
We'd not actually want to do it like this because of the large expense
of recomputing the offsets on-demand all the time.  (It does pass the
regression tests, for what that's worth.)
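
The lookup-cost tradeoff can be sketched in isolation as follows. This is a simplified stand-in, not the real jsonb code: SketchEntry, SKETCH_POSMASK and both function names are assumptions made for the example, and the steps counter exists only to make the amount of work visible.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t SketchEntry;           /* stand-in for a JEntry-like word */
#define SKETCH_POSMASK 0x0FFFFFFFu      /* assumed mask for the position/length bits */

/*
 * Offsets layout: entry i stores the end position of element i, so the
 * start of element `index` is a single array access.
 */
static uint32_t
offset_from_endpos(const SketchEntry *ja, int index)
{
    assert(index >= 0);
    return index == 0 ? 0 : (ja[index - 1] & SKETCH_POSMASK);
}

/*
 * Lengths layout: entry i stores only the length of element i, so the
 * start of element `index` is the sum of all preceding lengths.
 * `steps` is incremented once per addition to show the linear cost.
 */
static uint32_t
offset_from_lengths(const SketchEntry *ja, int index, int *steps)
{
    uint32_t off = 0;

    assert(index >= 0);
    for (int i = 0; i < index; i++)
    {
        off += ja[i] & SKETCH_POSMASK;
        (*steps)++;
    }
    return off;
}
```

Both functions return the same offsets for equivalent data, but the lengths version does work proportional to the index on every call, which is the expense a production format would want to avoid.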

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..2297504 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1378,1385 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
--- 1378,1383 ----
*************** convertJsonbObject(StringInfo buffer, JE
*** 1430,1437 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

--- 1428,1433 ----
*************** convertJsonbObject(StringInfo buffer, JE
*** 1445,1451 ****
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
                              JENTRY_POSMASK)));

-         meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }
--- 1441,1446 ----
*************** uniqueifyJsonbObject(JsonbValue *object)
*** 1592,1594 ****
--- 1587,1600 ----
          object->val.object.nPairs = res + 1 - object->val.object.pairs;
      }
  }
+
+ uint32
+ jsonb_get_offset(const JEntry *ja, int index)
+ {
+     uint32    off = 0;
+     int i;
+
+     for (i = 0; i < index; i++)
+         off += JBE_LEN(ja, i);
+     return off;
+ }
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 5f2594b..c9b18e1 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef uint32 JEntry;
*** 153,162 ****
   * Macros for getting the offset and length of an element. Note multiple
   * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)            ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                                  : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))

  /*
   * A jsonb array or object node, within a Jsonb Datum.
--- 153,163 ----
   * Macros for getting the offset and length of an element. Note multiple
   * evaluations and access to prior array element.
   */
! #define JBE_LENFLD(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            jsonb_get_offset(ja, i)
! #define JBE_LEN(ja, i)            JBE_LENFLD((ja)[i])
!
! extern uint32 jsonb_get_offset(const JEntry *ja, int index);

  /*
   * A jsonb array or object node, within a Jsonb Datum.
