Обсуждение: JSON vs. JSONB storage size
I recently stumbled over the presentation "How to Use JSON in MySQL Wrong" by Bill Karwin[1]
While most of the indexing part simply doesn't apply to Postgres, I was curious about the statement that the data type
ofa json value inside the json matters as well (Slide 56)
Apparently in MySQL storing {"a": 123456789} takes less space than {"a": '123456789'}
So I tested that with Postgres both using json and jsonb - my expectation was, that this would be similar in Postgres
aswell.
However, it turned out that for a json column there was no difference at all (both versions would show up the same with
pg_total_relation_size())
The table size with jsonb was bigger in general, but the one with the "integer" value was even bigger than the one with
the"string" storage.
The following little test script:
create table json_length_test1 (id serial primary key, d json);
insert into json_length_test1
select i, jsonb_build_object('a', 1234567890)
from generate_series(1,1e6) t(i);
create table json_length_test2 (id serial primary key, d json);
insert into json_length_test2
select i, jsonb_build_object('a', '1234567890')
from generate_series(1,1e6) t(i);
create table jsonb_length_test1 (id serial primary key, d jsonb);
insert into jsonb_length_test1
select i, jsonb_build_object('a', 1234567890)
from generate_series(1,1e6) t(i);
create table jsonb_length_test2 (id serial primary key, d jsonb);
insert into jsonb_length_test2
select i, jsonb_build_object('a', '1234567890')
from generate_series(1,1e6) t(i);
select 'json', pg_size_pretty(pg_total_relation_size('json_length_test1')) as json_int_size,
pg_size_pretty(pg_total_relation_size('json_length_test2')) as json_text_size
union all
select 'jsonb', pg_size_pretty(pg_total_relation_size('jsonb_length_test1')) as json_int_size,
pg_size_pretty(pg_total_relation_size('jsonb_length_test2')) as json_text_size
Returns (Postgres 12, Windows 10)
?column? | json_int_size | json_text_size
---------+---------------+---------------
json | 71 MB | 71 MB
jsonb | 87 MB | 79 MB
I am a bit surprised by this (not because the jsonb sizes are generally bigger, but that the string value takes less
space)
Is this caused by the fact that a string value compresses better internally?
Thomas
[1] https://www.slideshare.net/billkarwin/how-to-use-json-in-mysql-wrong
> On Fri, Oct 11, 2019 at 1:40 PM Thomas Kellerer <spam_eater@gmx.net> wrote: > > I am a bit surprised by this (not because the jsonb sizes are generally > bigger, but that the string value takes less space) > > Is this caused by the fact that a string value compresses better internally? Those jsonb objects are quite small, so it could be that an alignment kicks in, since as far as I remember, jsonb header and data should be aligned by 4 byte boundary.
On 10/11/19 4:40 AM, Thomas Kellerer wrote:
> I recently stumbled over the presentation "How to Use JSON in MySQL Wrong" by Bill Karwin[1]
>
> While most of the indexing part simply doesn't apply to Postgres, I was curious about the statement that the data
typeof a json value inside the json matters as well (Slide 56)
>
> Apparently in MySQL storing {"a": 123456789} takes less space than {"a": '123456789'}
>
> So I tested that with Postgres both using json and jsonb - my expectation was, that this would be similar in Postgres
aswell.
>
> However, it turned out that for a json column there was no difference at all (both versions would show up the same
withpg_total_relation_size())
>
> The table size with jsonb was bigger in general, but the one with the "integer" value was even bigger than the one
withthe "string" storage.
>
> The following little test script:
>
> create table json_length_test1 (id serial primary key, d json);
> insert into json_length_test1
> select i, jsonb_build_object('a', 1234567890)
> from generate_series(1,1e6) t(i);
>
> create table json_length_test2 (id serial primary key, d json);
> insert into json_length_test2
> select i, jsonb_build_object('a', '1234567890')
> from generate_series(1,1e6) t(i);
>
> create table jsonb_length_test1 (id serial primary key, d jsonb);
> insert into jsonb_length_test1
> select i, jsonb_build_object('a', 1234567890)
> from generate_series(1,1e6) t(i);
>
> create table jsonb_length_test2 (id serial primary key, d jsonb);
> insert into jsonb_length_test2
> select i, jsonb_build_object('a', '1234567890')
> from generate_series(1,1e6) t(i);
>
> select 'json', pg_size_pretty(pg_total_relation_size('json_length_test1')) as json_int_size,
> pg_size_pretty(pg_total_relation_size('json_length_test2')) as json_text_size
> union all
> select 'jsonb', pg_size_pretty(pg_total_relation_size('jsonb_length_test1')) as json_int_size,
> pg_size_pretty(pg_total_relation_size('jsonb_length_test2')) as json_text_size
>
>
> Returns (Postgres 12, Windows 10)
>
> ?column? | json_int_size | json_text_size
> ---------+---------------+---------------
> json | 71 MB | 71 MB
> jsonb | 87 MB | 79 MB
>
> I am a bit surprised by this (not because the jsonb sizes are generally bigger, but that the string value takes less
space)
>
> Is this caused by the fact that a string value compresses better internally?
Not sure if it applies here:
https://www.postgresql.org/docs/11/datatype-json.html
"When converting textual JSON input into jsonb, the primitive types
described by RFC 7159 are effectively mapped onto native PostgreSQL
types, as shown in Table 8.23. ..."
Table 8.23.
JSON primitive type PostgreSQL type Notes
...
number numeric NaN and infinity values are disallowed
...
>
>
> Thomas
>
> [1] https://www.slideshare.net/billkarwin/how-to-use-json-in-mysql-wrong
>
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
>>>>> "Thomas" == Thomas Kellerer <spam_eater@gmx.net> writes: Thomas> The table size with jsonb was bigger in general, but the one Thomas> with the "integer" value was even bigger than the one with the Thomas> "string" storage. jsonb stores numeric values as "numeric", not as integers or floats, so the storage needed will depend on the number of decimal digits. The size results you're seeing are mainly the consequence of the fact that jsonb stores the whole Numeric datum, varlena header included (and without packing the header), so there's an extra 4 bytes you might not have accounted for: 1234567890 is three numeric "digits" (2 bytes each) plus a 2 byte numeric header (for weight/scale/sign) plus the 4 byte varlena header, for 12 bytes total, whereas "1234567890" takes only 10 (since the length is encoded in the jsonb value offsets). Furthermore, there may be up to 3 padding bytes before the numeric value. I think in your test, the extra 3 bytes is pushing the size of a single row up to the next multiple of MAXALIGN, so you're getting slightly fewer rows per page. I don't know what Windows is doing, but on my system (freebsd amd64) I get 136 rows/page vs. 120 rows/page, which would make a million rows take 57MB or 65MB. (Your use of pg_total_relation_size is including the pkey index, which confuses the results a bit.) -- Andrew (irc:RhodiumToad)