Re: jsonb format is pessimal for toast compression
From: Tom Lane
Subject: Re: jsonb format is pessimal for toast compression
Msg-id: 10010.1407510146@sss.pgh.pa.us
In reply to: Re: jsonb format is pessimal for toast compression (Stephen Frost <sfrost@snowman.net>)
Responses: Re: jsonb format is pessimal for toast compression (John W Higgins <wishdev@gmail.com>)
           Re: jsonb format is pessimal for toast compression (Bruce Momjian <bruce@momjian.us>)
           Re: jsonb format is pessimal for toast compression (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I looked into the issue reported in bug #11109. The problem appears to be
>> that jsonb's on-disk format is designed in such a way that the leading
>> portion of any JSON array or object will be fairly incompressible, because
>> it consists mostly of a strictly-increasing series of integer offsets.
>> This interacts poorly with the code in pglz_compress() that gives up if
>> it's found nothing compressible in the first first_success_by bytes of a
>> value-to-be-compressed. (first_success_by is 1024 in the default set of
>> compression parameters.)

> I haven't looked at this in any detail, so take this with a grain of
> salt, but what about teaching pglz_compress about using an offset
> farther into the data, if the incoming data is quite a bit larger than
> 1k? This is just a test to see if it's worthwhile to keep going, no?

Well, the point of the existing approach is that it's a *nearly free* test
to see if it's worthwhile to keep going; there's just one if-test added in
the outer loop of the compression code. (cf commit ad434473ebd2, which
added that along with some other changes.) AFAICS, what we'd have to do to
do it as you suggest would be to execute compression on some subset of the
data and then throw away that work entirely. I do not find that
attractive, especially when for most datatypes there's no particular
reason to look at one subset instead of another.

> I'm rather disinclined to change the on-disk format because of this
> specific test, that feels a bit like the tail wagging the dog to me,
> especially as I do hope that some day we'll figure out a way to use a
> better compression algorithm than pglz.

I'm unimpressed by that argument too, for a number of reasons:

1. The real problem here is that jsonb is emitting quite a bit of
fundamentally-nonrepetitive data, even when the user-visible input is very
repetitive.
That's a compression-unfriendly transformation by anyone's measure.
Assuming that some future replacement for pg_lzcompress() will nonetheless
be able to compress the data strikes me as mostly wishful thinking.
Besides, we'd more than likely have a similar early-exit rule in any
substitute implementation, so that we'd still be at risk even if it
usually worked.

2. Are we going to ship 9.4 without fixing this? I definitely don't see
replacing pg_lzcompress as being on the agenda for 9.4, whereas changing
jsonb is still within the bounds of reason.

Considering all the hype that's built up around jsonb, shipping a design
with a fundamental performance handicap doesn't seem like a good plan to
me. We could perhaps band-aid around it by using different compression
parameters for jsonb, although that would require some painful API changes
since toast_compress_datum() doesn't know what datatype it's operating on.

			regards, tom lane
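[Editor's illustration] The central claim above, that a strictly-increasing series of offsets is compression-unfriendly while the equivalent series of lengths is not, can be reproduced outside PostgreSQL. The following Python sketch uses zlib as a stand-in for pglz, and the element count and size are made-up values for illustration only:

```python
import struct
import zlib

# Model the header of a large jsonb array of 1000 equal-size elements.
n, elem_size = 1000, 8

# jsonb-style header: a strictly increasing 4-byte offset per element.
# Every 4-byte word differs from the previous one, so an LZ-family
# compressor finds few exact repeats to exploit.
offsets = b"".join(struct.pack("<I", (i + 1) * elem_size) for i in range(n))

# Alternative encoding: store each element's length instead.  With
# equal-size elements this is the same 4-byte word repeated n times,
# which is maximally repetitive.
lengths = struct.pack("<I", elem_size) * n

comp_offsets = zlib.compress(offsets)
comp_lengths = zlib.compress(lengths)

# The length encoding compresses far better than the offset encoding.
print(len(offsets), len(comp_offsets), len(comp_lengths))
```

The same repetitive user-visible input thus produces either a highly compressible or a poorly compressible header depending purely on the chosen on-disk encoding, which is the point being argued.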
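[Editor's illustration] The first_success_by early exit being debated can be approximated as follows. This is a rough sketch, not the actual pglz_compress() logic: real pglz bails out when its single compression pass has found no savings after consuming first_success_by input bytes, whereas here a separate deflate of the prefix stands in for that check:

```python
import zlib

FIRST_SUCCESS_BY = 1024  # default first_success_by in pglz's strategy


def gives_up_early(data: bytes) -> bool:
    """Approximate pglz's bail-out rule: if the leading
    FIRST_SUCCESS_BY bytes yield no savings at all, assume the rest
    will not compress either and store the datum uncompressed."""
    if len(data) <= FIRST_SUCCESS_BY:
        return False
    prefix = data[:FIRST_SUCCESS_BY]
    return len(zlib.compress(prefix)) >= len(prefix)
```

A value whose leading kilobyte is effectively incompressible trips this check even when the remainder is highly repetitive, which is the failure mode described for jsonb's offset-heavy leading portion.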