Re: jsonb format is pessimal for toast compression
From: Tom Lane
Subject: Re: jsonb format is pessimal for toast compression
Msg-id: 10010.1407510146@sss.pgh.pa.us
In reply to: Re: jsonb format is pessimal for toast compression (Stephen Frost <sfrost@snowman.net>)
Responses: Re: jsonb format is pessimal for toast compression (John W Higgins <wishdev@gmail.com>)
           Re: jsonb format is pessimal for toast compression (Bruce Momjian <bruce@momjian.us>)
           Re: jsonb format is pessimal for toast compression (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I looked into the issue reported in bug #11109. The problem appears to be
>> that jsonb's on-disk format is designed in such a way that the leading
>> portion of any JSON array or object will be fairly incompressible, because
>> it consists mostly of a strictly-increasing series of integer offsets.
>> This interacts poorly with the code in pglz_compress() that gives up if
>> it's found nothing compressible in the first first_success_by bytes of a
>> value-to-be-compressed. (first_success_by is 1024 in the default set of
>> compression parameters.)

> I haven't looked at this in any detail, so take this with a grain of
> salt, but what about teaching pglz_compress about using an offset
> farther into the data, if the incoming data is quite a bit larger than
> 1k? This is just a test to see if it's worthwhile to keep going, no?

Well, the point of the existing approach is that it's a *nearly free* test
to see if it's worthwhile to keep going; there's just one if-test added in
the outer loop of the compression code. (cf commit ad434473ebd2, which
added that along with some other changes.) AFAICS, what we'd have to do to
do it as you suggest would be to execute compression on some subset of the
data and then throw away that work entirely. I do not find that
attractive, especially when for most datatypes there's no particular
reason to look at one subset instead of another.

> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.

I'm unimpressed by that argument too, for a number of reasons:

1. The real problem here is that jsonb is emitting quite a bit of
fundamentally-nonrepetitive data, even when the user-visible input is very
repetitive.
That's a compression-unfriendly transformation by anyone's measure.
Assuming that some future replacement for pg_lzcompress() will nonetheless
be able to compress the data strikes me as mostly wishful thinking.
Besides, we'd more than likely have a similar early-exit rule in any
substitute implementation, so that we'd still be at risk even if it
usually worked.

2. Are we going to ship 9.4 without fixing this? I definitely don't see
replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
jsonb is still within the bounds of reason.

Considering all the hype that's built up around jsonb, shipping a design
with a fundamental performance handicap doesn't seem like a good plan to
me. We could perhaps band-aid around it by using different compression
parameters for jsonb, although that would require some painful API changes
since toast_compress_datum() doesn't know what datatype it's operating on.

			regards, tom lane
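[Editor's illustration] The central claim above, that a strictly-increasing series of offsets is compression-unfriendly while the equivalent series of lengths is not, can be reproduced outside PostgreSQL. The following Python sketch uses zlib as a stand-in for pglz, and the element count and size are made-up values for illustration only:

```python
import struct
import zlib

# Model the header of a large jsonb array of 1000 equal-size elements.
n, elem_size = 1000, 8

# jsonb-style header: a strictly increasing 4-byte offset per element.
# Every 4-byte word differs from the previous one, so an LZ-family
# compressor finds few exact repeats to exploit.
offsets = b"".join(struct.pack("<I", (i + 1) * elem_size) for i in range(n))

# Alternative encoding: store each element's length instead.  With
# equal-size elements this is the same 4-byte word repeated n times,
# which is maximally repetitive.
lengths = struct.pack("<I", elem_size) * n

comp_offsets = zlib.compress(offsets)
comp_lengths = zlib.compress(lengths)

# The length encoding compresses far better than the offset encoding.
print(len(offsets), len(comp_offsets), len(comp_lengths))
```

The same repetitive user-visible input thus produces either a highly compressible or a poorly compressible header depending purely on the chosen on-disk encoding, which is the point being argued.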
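[Editor's illustration] The first_success_by early exit being debated can be approximated as follows. This is a rough sketch, not the actual pglz_compress() logic: real pglz bails out when its single compression pass has found no savings after consuming first_success_by input bytes, whereas here a separate deflate of the prefix stands in for that check:

```python
import zlib

FIRST_SUCCESS_BY = 1024  # default first_success_by in pglz's strategy


def gives_up_early(data: bytes) -> bool:
    """Approximate pglz's bail-out rule: if the leading
    FIRST_SUCCESS_BY bytes yield no savings at all, assume the rest
    will not compress either and store the datum uncompressed."""
    if len(data) <= FIRST_SUCCESS_BY:
        return False
    prefix = data[:FIRST_SUCCESS_BY]
    return len(zlib.compress(prefix)) >= len(prefix)
```

A value whose leading kilobyte is effectively incompressible trips this check even when the remainder is highly repetitive, which is the failure mode described for jsonb's offset-heavy leading portion.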