Re: jsonb format is pessimal for toast compression

From: Heikki Linnakangas
Subject: Re: jsonb format is pessimal for toast compression
Msg-id: 541C242E.3030004@vmware.com
In reply to: Re: jsonb format is pessimal for toast compression (Heikki Linnakangas <hlinnakangas@vmware.com>)
Replies: Re: jsonb format is pessimal for toast compression (Tom Lane <tgl@sss.pgh.pa.us>)
Re: jsonb format is pessimal for toast compression (Peter Geoghegan <pg@heroku.com>)
Re: jsonb format is pessimal for toast compression (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On 09/18/2014 09:27 PM, Heikki Linnakangas wrote:
> On 09/18/2014 07:53 PM, Josh Berkus wrote:
>> On 09/16/2014 08:45 PM, Tom Lane wrote:
>>> We're somewhat comparing apples and oranges here, in that I pushed my
>>> approach to something that I think is of committable quality (and which,
>>> not incidentally, fixes some existing bugs that we'd need to fix in any
>>> case); while Heikki's patch was just proof-of-concept.  It would be worth
>>> pushing Heikki's patch to committable quality so that we had a more
>>> complete understanding of just what the complexity difference really is.
>>
>> Is anyone actually working on this?
>>
>> If not, I'm voting for the all-lengths patch so that we can get 9.4 out
>> the door.
>
> I'll try to write a more polished patch tomorrow. We'll then see what it
> looks like, and can decide if we want it.

OK, here are two patches. One is a refined version of my earlier patch,
and the other implements the separate offsets array approach. Both are
based on Tom's jsonb-lengths-merged.patch, so they include all the
whitespace fixes etc. that he mentioned.

There is no big difference in terms of code complexity between the
patches. IMHO the separate offsets array is easier to understand, but it
makes for more complicated accessor macros to find the beginning of the
variable-length data.
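
To make the accessor trade-off concrete, here is a minimal C sketch of a separate offsets array. It assumes a hypothetical layout (a count, then one length per child, then one offset per OFFSET_STRIDE children, then the data); the names OFFSET_STRIDE, NUM_OFFSETS, DATA_START, build_offsets and child_start are illustrative and are not the macros from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_STRIDE 4 /* hypothetical stride; the patches' values are arbitrary */

/* One offsets-array slot per OFFSET_STRIDE children, rounded up. */
#define NUM_OFFSETS(n) (((n) + OFFSET_STRIDE - 1) / OFFSET_STRIDE)

/* Hypothetical container layout, all header fields uint32:
 *   count | lens[count] | offs[NUM_OFFSETS(count)] | variable-length data
 * Finding the data has to skip both arrays, which is what makes the
 * accessor macros more involved than with a single entry array. */
#define DATA_START(n) (sizeof(uint32_t) * (1 + (size_t) (n) + NUM_OFFSETS(n)))

/* Fill offs[] with the start offset of every OFFSET_STRIDE'th child. */
static void build_offsets(const uint32_t *lens, int n, uint32_t *offs)
{
    uint32_t pos = 0;

    for (int i = 0; i < n; i++) {
        if (i % OFFSET_STRIDE == 0)
            offs[i / OFFSET_STRIDE] = pos;
        pos += lens[i];
    }
}

/* Start of child i: nearest stored offset plus at most OFFSET_STRIDE - 1 lengths. */
static uint32_t child_start(const uint32_t *lens, const uint32_t *offs, int i)
{
    assert(i >= 0);
    uint32_t off = offs[i / OFFSET_STRIDE];

    for (int j = i - i % OFFSET_STRIDE; j < i; j++)
        off += lens[j];
    return off;
}
```

With seven children, DATA_START(7) skips the count, seven lengths and two offsets (40 bytes), and child_start never sums more than three lengths.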

Unlike Tom's patch, these patches don't cache any offsets when doing a
binary search. That doesn't seem worth it when the access time is O(1) anyway.
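
A minimal C sketch of why no cache is needed: each binary-search probe simply recomputes its offset, which costs at most a stride's worth of additions. It assumes a hypothetical entry layout where a high-bit flag (ENTRY_HAS_OFF) marks entries that store an end offset and the others store a length, with keys stored back to back and sorted by length, then bytes. All names here are illustrative, not the patches' code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry layout: high bit set means "stores an end offset". */
#define ENTRY_HAS_OFF 0x80000000u
#define ENTRY_LENMASK 0x0FFFFFFFu

/* Start offset of child i: sum lengths backward until an entry that stores
 * an end offset. Bounded by the stride, so O(1) for a fixed stride. */
static uint32_t child_offset(const uint32_t *entries, int i)
{
    uint32_t off = 0;

    for (int j = i - 1; j >= 0; j--) {
        off += entries[j] & ENTRY_LENMASK;
        if (entries[j] & ENTRY_HAS_OFF)
            break;
    }
    return off;
}

static uint32_t child_length(const uint32_t *entries, int i)
{
    assert(i >= 0);
    if (entries[i] & ENTRY_HAS_OFF)
        return (entries[i] & ENTRY_LENMASK) - child_offset(entries, i);
    return entries[i] & ENTRY_LENMASK;
}

/* Binary search over n sorted keys; every probe recomputes its offset
 * from scratch, with no offset cache. Returns the index or -1. */
static int find_key(const char *data, const uint32_t *entries, int n,
                    const char *key, uint32_t keylen)
{
    int lo = 0, hi = n - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        uint32_t off = child_offset(entries, mid);
        uint32_t len = child_length(entries, mid);
        int cmp;

        /* Length-then-bytes order, chosen here for simplicity. */
        if (len != keylen)
            cmp = len < keylen ? -1 : 1;
        else
            cmp = memcmp(data + off, key, len);

        if (cmp == 0)
            return mid;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

A search costs O(log n) probes, each doing at most a stride's worth of work, so a per-search offset cache would save little.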

Both of these patches have a #define JB_OFFSET_STRIDE for the "stride
size". For the separate offsets array, the offsets array has one element
for every JB_OFFSET_STRIDE children. For the other patch, every
JB_OFFSET_STRIDE'th child stores its end offset, while the others store
their length. A smaller value makes random access faster, at the cost of
compressibility / on-disk size. I haven't done any measurements to find
the optimal value; the values in the patches are arbitrary.
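
The mixed lengths/offsets scheme can be sketched in C as follows. This is a minimal illustration under assumptions: OFFSET_STRIDE, ENTRY_HAS_OFF, ENTRY_LENMASK, build_entries and child_offset are hypothetical names, and the flag bit and stride value are chosen only for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: high bit marks an entry that stores an end offset. */
#define ENTRY_HAS_OFF 0x80000000u
#define ENTRY_LENMASK 0x0FFFFFFFu
#define OFFSET_STRIDE 4 /* arbitrary, like the values in the patches */

/* Every OFFSET_STRIDE'th child stores its end offset; the rest store lengths. */
static void build_entries(const uint32_t *lens, int n, uint32_t *entries)
{
    uint32_t end = 0;

    for (int i = 0; i < n; i++) {
        end += lens[i];
        assert(end <= ENTRY_LENMASK); /* must fit next to the flag bit */
        if (i % OFFSET_STRIDE == 0)
            entries[i] = end | ENTRY_HAS_OFF;
        else
            entries[i] = lens[i];
    }
}

/* Start offset of child i: walk back at most OFFSET_STRIDE entries, summing
 * lengths, until reaching an entry that stores an end offset. */
static uint32_t child_offset(const uint32_t *entries, int i)
{
    uint32_t off = 0;

    for (int j = i - 1; j >= 0; j--) {
        off += entries[j] & ENTRY_LENMASK;
        if (entries[j] & ENTRY_HAS_OFF)
            break;
    }
    return off;
}
```

A smaller stride shortens the backward walk but stores more end offsets. Offsets grow monotonically and so rarely repeat, whereas lengths often do, which is the compressibility trade-off this thread is about.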

I think we should bite the bullet and break compatibility with the 9.4beta2
format, even if we go with "my patch". In a jsonb object, it makes sense
to store all the keys first, like Tom did, because of cache benefits
and the future possibility of doing smart EXTERNAL access. Also, even if we
can make the on-disk format compatible, it's weird that you can get
different runtime behavior with datums created with a beta version.
It seems clearer to just require a pg_dump + restore.

Tom: You mentioned earlier that your patch fixes some existing bugs.
What were they? There were a bunch of whitespace and comment fixes that
we should apply in any case, but I couldn't see any actual bugs. I think
we should apply those fixes separately, to make sure we don't forget
about them, and to make it easier to review these patches.

- Heikki

