Re: mcvstats serialization code is still shy of a load
От | Tomas Vondra |
---|---|
Тема | Re: mcvstats serialization code is still shy of a load |
Дата | |
Msg-id | 20190629141312.bo6armpsdf65gy6m@development обсуждение исходный текст |
Ответ на | Re: mcvstats serialization code is still shy of a load (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Ответы |
Re: mcvstats serialization code is still shy of a load
|
Список | pgsql-hackers |
On Thu, Jun 27, 2019 at 01:26:32PM +0200, Tomas Vondra wrote: >On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote: >>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>>OK. Attached is a patch ditching the alignment in serialized data. I've >>>ditched the macros to access parts of serialized data, and everything >>>gets copied. >> >>I lack energy to actually read this patch right now, and I don't currently >>have an opinion about whether it's worth another catversion bump to fix >>this stuff in v12. But I did test the patch, and I can confirm it gets >>through the core regression tests on hppa (both gaur's host environment >>with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1). >> > >Thanks for running it through regression tests, that alone is a very >useful piece of information for me. > >As for the catversion bump - I'd probably vote to do it. Not just because >of this serialization stuff, but to fix the pg_mcv_list_items function. >It's not something I'm very enthusiastic about (kinda embarassed about it, >really), but it seems better than shipping something that we'll need to >rework in PG13. > Attached is a slightly improved version of the serialization patch. The main difference is that when serializing varlena values, the previous patch version stored length (uint32) + full varlena (incl. the header) which is kinda redundant, because the varlena stores the length too. So now it only stores the length + data, without the varlena header. I don't think there's a better way to store varlena values without enforcing alignment (which is what happens in current master). There's one additional change I failed to mention before - I had to add another field to DimensionInfo, tracking how much space will be needed for deserialized data. This is needed because the deserialization allocates the whole MCV as a single chunk of memory, to reduce palloc overhead. It could parse the data twice (first to determine the space, then to actually parse it), this allows doing just a single pass. Which seems useful for large MCV lists, but maybe it's not worth it? Barring objections I'll commit this together with the pg_mcv_list_items fix, posted in a separate thread. Of course, this requires catversion bump - an alternative would be to keep enforcing the alignment, but tweak the macros to work on all platforms without SIGBUS. Considering how troublesome this serialiation part of the patch turner out to be, I'm not really sure by anything at this point. So I'd welcome thoughts about the proposed changes. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Вложения
В списке pgsql-hackers по дате отправления: