Re: mcvstats serialization code is still shy of a load

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: mcvstats serialization code is still shy of a load
Дата
Msg-id 20190629141312.bo6armpsdf65gy6m@development
обсуждение исходный текст
Ответ на Re: mcvstats serialization code is still shy of a load  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: mcvstats serialization code is still shy of a load  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Thu, Jun 27, 2019 at 01:26:32PM +0200, Tomas Vondra wrote:
>On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote:
>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>OK. Attached is a patch ditching the alignment in serialized data. I've
>>>ditched the macros to access parts of serialized data, and everything
>>>gets copied.
>>
>>I lack energy to actually read this patch right now, and I don't currently
>>have an opinion about whether it's worth another catversion bump to fix
>>this stuff in v12.  But I did test the patch, and I can confirm it gets
>>through the core regression tests on hppa (both gaur's host environment
>>with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).
>>
>
>Thanks for running it through regression tests, that alone is a very
>useful piece of information for me.
>
>As for the catversion bump - I'd probably vote to do it. Not just because
>of this serialization stuff, but to fix the pg_mcv_list_items function.
>It's not something I'm very enthusiastic about (kinda embarassed about it,
>really), but it seems better than shipping something that we'll need to
>rework in PG13.
>

Attached is a slightly improved version of the serialization patch. The
main difference is that when serializing varlena values, the previous
patch version stored

    length (uint32) + full varlena (incl. the header)

which is kinda redundant, because the varlena stores the length too. So
now it only stores the length + data, without the varlena header. I
don't think there's a better way to store varlena values without
enforcing alignment (which is what happens in current master).

There's one additional change I failed to mention before - I had to add
another field to DimensionInfo, tracking how much space will be needed
for deserialized data. This is needed because the deserialization
allocates the whole MCV as a single chunk of memory, to reduce palloc
overhead. It could parse the data twice (first to determine the space,
then to actually parse it), this allows doing just a single pass. Which
seems useful for large MCV lists, but maybe it's not worth it?

Barring objections I'll commit this together with the pg_mcv_list_items
fix, posted in a separate thread. Of course, this requires catversion
bump - an alternative would be to keep enforcing the alignment, but
tweak the macros to work on all platforms without SIGBUS.

Considering how troublesome this serialiation part of the patch turner
out to be, I'm not really sure by anything at this point. So I'd welcome
thoughts about the proposed changes.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: TM format can mix encodings in to_char()
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Avoid full GIN index scan when possible