Re: jsonb format is pessimal for toast compression

From: Andrew Dunstan
Subject: Re: jsonb format is pessimal for toast compression
Msg-id 53E4EE5F.5090904@dunslane.net
In response to: Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 08/08/2014 11:18 AM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 08/07/2014 11:17 PM, Tom Lane wrote:
>>> I looked into the issue reported in bug #11109.  The problem appears to be
>>> that jsonb's on-disk format is designed in such a way that the leading
>>> portion of any JSON array or object will be fairly incompressible, because
>>> it consists mostly of a strictly-increasing series of integer offsets.
>
>> Back when this structure was first presented at pgCon 2013, I wondered
>> if we shouldn't extract the strings into a dictionary, because of key
>> repetition, and convinced myself that this shouldn't be necessary
>> because in significant cases TOAST would take care of it.
> That's not really the issue here, I think.  The problem is that a
> relatively minor aspect of the representation, namely the choice to store
> a series of offsets rather than a series of lengths, produces
> nonrepetitive data even when the original input is repetitive.


It would certainly be worth validating that changing this would fix the 
problem.
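
A rough way to see the effect outside the server is sketched below. It is
only an illustration under stated assumptions: zlib stands in for
pg_lzcompress, the header is modelled as a flat array of 4-byte
little-endian integers rather than the real jsonb layout, and the helper
names (pack_u32, header_sizes) are made up for this example.

```python
import struct
import zlib


def pack_u32(values):
    """Pack integers as little-endian 4-byte words (a stand-in header layout)."""
    return struct.pack(f"<{len(values)}I", *values)


def header_sizes(lengths):
    """Return (compressed size storing lengths, compressed size storing offsets)."""
    offsets = []
    total = 0
    for n in lengths:
        total += n              # running end offset of each element
        offsets.append(total)
    as_lengths = zlib.compress(pack_u32(lengths))
    as_offsets = zlib.compress(pack_u32(offsets))
    return len(as_lengths), len(as_offsets)


# Repetitive input: 500 pairs whose elements are always 8 and 12 bytes long.
lengths = [8, 12] * 500
len_size, off_size = header_sizes(lengths)
print("raw header bytes:      ", 4 * len(lengths))
print("compressed as lengths: ", len_size)
print("compressed as offsets: ", off_size)
```

The lengths repeat with a short period, so the compressor finds long
matches; the offsets increase strictly, so no word ever repeats. The real
numbers would have to come from pg_lzcompress on actual jsonb data.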

I don't know how invasive that would be - I suspect (without looking 
very closely) not terribly much.

> 2. Are we going to ship 9.4 without fixing this?  I definitely don't see
> replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
> jsonb is still within the bounds of reason.
>
> Considering all the hype that's built up around jsonb, shipping a design
> with a fundamental performance handicap doesn't seem like a good plan
> to me.  We could perhaps band-aid around it by using different compression
> parameters for jsonb, although that would require some painful API changes
> since toast_compress_datum() doesn't know what datatype it's operating on.
>


Yeah, it would be a bit painful, but after all, finding out this sort of 
thing is why we have betas.


cheers

andrew


