Re: BUG #11109: No Toast compression on JSON, JSONB columns

From: Larry White
Subject: Re: BUG #11109: No Toast compression on JSON, JSONB columns
Date:
Msg-id: CAMdbzViVdr=uKhG4nBqyzF_yKtKMx96AX_0pJDwRcaRCzCMiuA@mail.gmail.com
In reply to: BUG #11109: No Toast compression on JSON, JSONB columns  (ljw1001@gmail.com)
List: pgsql-bugs
Hi Tom,

Thanks again for your help.

I think I have a test case that will fail.  It seems possibly related (in part) to the encoding, but I'm not sure. 

In any case, here is the table definition; the 'bad.json' file is attached. It's a 13K file that produces a 32K TOAST table (a query to confirm the TOAST size follows the definition below).

DROP TABLE document;

CREATE TABLE document
(
  owner character varying(36) NOT NULL,
  document_type character varying(36) NOT NULL,
  guid character varying(36) NOT NULL,
  schema_version numeric(7,3) NOT NULL,
  payload jsonb NOT NULL,
  last_update timestamp with time zone NOT NULL,
  payload_class character varying NOT NULL,
  instance_version character varying(36) NOT NULL,
  acl_read character varying(36)[],
  deleted boolean NOT NULL DEFAULT false,
  fts_text tsvector,
  CONSTRAINT guid_idx PRIMARY KEY (guid)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE document
  OWNER TO postgres;

ALTER TABLE document ALTER COLUMN payload SET STORAGE EXTENDED;
-- Index: document_payload_idx

-- DROP INDEX document_payload_idx;

CREATE INDEX document_payload_idx
  ON document
  USING gin
  (payload jsonb_path_ops);

-- Index: document_type_idx

-- DROP INDEX document_type_idx;

CREATE INDEX document_type_idx
  ON document
  USING btree
  (document_type COLLATE pg_catalog."default");

-- Index: fts_idx

-- DROP INDEX fts_idx;

CREATE INDEX fts_idx
  ON document
  USING gin
  (fts_text);

-- Index: owner_idx

-- DROP INDEX owner_idx;

CREATE INDEX owner_idx
  ON document
  USING btree
  (owner COLLATE pg_catalog."default");
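
For reference, the 32K figure comes from the size of the TOAST relation behind the document table; a catalog query along these lines reports it (just a sketch — the exact query used to measure it may have differed):

-- Locate the TOAST table backing "document" and report its size.
SELECT c.relname AS table_name,
       t.relname AS toast_table,
       pg_size_pretty(pg_table_size(t.oid)) AS toast_size
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'document';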




On Thu, Aug 7, 2014 at 8:21 PM, Larry White <ljw1001@gmail.com> wrote:
Thank you for looking into this. This is hard to replicate. Some (most) files work fine. It seems that I just hit on a peculiar case.  

Actually, I can replicate it at will by running through my app, but I have not come up with an easily separable case that reproduces it. Even with the same data, it seems to compress OK if I insert one row at a time, but not if I insert in batches via JDBC batch updates.
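
One way to check this per row is to compare the stored size of the payload column against the length of its text representation; a sketch along these lines (using the table and column names from above) shows whether individual rows were compressed:

-- pg_column_size reports the on-disk datum size (compressed if compression
-- was applied); a value close to the text length suggests no compression.
SELECT guid,
       pg_column_size(payload)     AS stored_bytes,
       octet_length(payload::text) AS text_bytes
FROM document
ORDER BY stored_bytes DESC;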

Anyway, I'm relieved to see that this isn't a general problem, but some kind of weird corner case I lucked into on my first test :). I will still try to create something reproducible and portable and send it in.


On Thu, Aug 7, 2014 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
ljw1001@gmail.com writes:
> if PG can, after compression, put an entire JSON or JSONB 'document' into
> one row/page in the toast table it does. However, if the document is too big
> to fit in one row after compression, it does no compression at all.

I can't reproduce this.

It does seem that JSONB is noticeably less compressible than the
equivalent plain-text representation, which is disappointing,
but I don't see it failing to compress at all.

I experimented with this SQL script:


create table bigjson(f1 jsonb);

-- alter table bigjson alter column f1 set storage external;

insert into bigjson values(
'{"junk":["124245etweetwet345gwtretwt43 qwrqwq qwre qwrsdflkas",
"q4535 wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrssdfsd",
"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas",
... 50000 repetitions of above two lines ...
"q4535wqrqwrqwrqw2 wrqwrqwrq32232w kswe sfasrssdfsd",
"124245etweetwet345gwtretwt43 qwrqwq qwre qw rsdflkas",
"foo"]}'
);

select pg_size_pretty(pg_table_size('bigjson'));

drop table bigjson;


I get about 568kB table size, versus 5640kB with the "external" option
uncommented, so it is compressing about 10-to-1.  If I change the column
type to text (or json) then I get a table size of 120kB, which is better.
But on the other hand we should not put a lot of stock in the compression
rate achievable on such bogus data, so I'd not panic until I saw numbers
for more realistic test data.

What might be happening for you is that your actual test case triggers the
"first_success_by" filter in pglz_compress() because the first kilobyte of
input data isn't compressible.  But since you didn't supply a fully
reproducible test case, that's just speculation.
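
A rough way to test that hypothesis is to prepend about a kilobyte of incompressible-looking data to an otherwise highly compressible value and compare stored sizes. The following sketch (pglz_test is just a scratch table name) assumes the heuristic gives up when the first kilobyte doesn't compress; it may or may not reproduce the effect:

create table pglz_test(f1 jsonb);

-- Highly compressible value: stored size should be far below the text size.
insert into pglz_test
  select ('["' || repeat('abcdefgh', 10000) || '"]')::jsonb;

-- Same compressible tail, but with ~1kB of pseudo-random-looking hex in
-- front, which may cause pglz_compress to give up before finding a match.
insert into pglz_test
  select ('["' ||
          (select string_agg(md5(i::text), '') from generate_series(1, 32) i) ||
          '","' || repeat('abcdefgh', 10000) || '"]')::jsonb;

select pg_column_size(f1) as stored_bytes,
       octet_length(f1::text) as text_bytes
from pglz_test;

drop table pglz_test;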

> This is not the behavior observed with Text compression in Toast.  For very
> large files Text compression works correctly.  I got these results running
> similar tests on larger json files:

> Column Type  - Storage       - TOAST table size
> JSONB           - EXTERNAL -  2448 MB
> JSONB           - EXTENDED - 2448 MB
> JSON             - EXTENDED - 2504 MB
> TEXT             - EXTERNAL  - 2409 MB
> TEXT             - EXTENDED -      40 MB

json and text should (and do, for me) store exactly the same.  So there's
something fishy going on here.  Maybe you've changed the typstorage
values for json and jsonb in pg_type?
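
Those settings can be checked directly in the catalog with something like:

-- typstorage: 'p' = PLAIN, 'e' = EXTERNAL, 'm' = MAIN, 'x' = EXTENDED.
-- text, json, and jsonb all default to 'x', which allows compression.
select typname, typstorage
from pg_type
where typname in ('text', 'json', 'jsonb');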

                        regards, tom lane

