Discussion: mcvstats serialization code is still shy of a load


mcvstats serialization code is still shy of a load

From: Tom Lane
Date:
I'm seeing a reproducible bus error here:

#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.
)
    at mcv.c:785
785                     memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));

What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as

#define ITEM_BASE_FREQUENCY(item,ndims)    ((double *) (ITEM_FREQUENCY(item, ndims) + 1))

the compiler is assuming that the first argument to memcpy is
double-aligned, and it is generating code that depends on that being
true, and of course it isn't true and kaboom.

You can *not* cast something to an aligned pointer type if it's not
actually certain to be aligned suitably for that type.  In this example,
even if you wrote "(char *)" in front of this, it wouldn't save you;
the compiler would still be entitled to believe that the intermediate
cast value meant something.  The casts in the underlying macros
ITEM_FREQUENCY and so on are equally unsafe.

(For the record, this is with gcc 4.2.1 on OpenBSD/hppa 6.4.)

            regards, tom lane



Re: mcvstats serialization code is still shy of a load

From: Tomas Vondra
Date:
On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
>I'm seeing a reproducible bus error here:
>
>#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.
>)
>    at mcv.c:785
>785                     memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));
>
>What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as
>
>#define ITEM_BASE_FREQUENCY(item,ndims)    ((double *) (ITEM_FREQUENCY(item, ndims) + 1))
>
>the compiler is assuming that the first argument to memcpy is
>double-aligned, and it is generating code that depends on that being
>true, and of course it isn't true and kaboom.
>
>You can *not* cast something to an aligned pointer type if it's not
>actually certain to be aligned suitably for that type.  In this example,
>even if you wrote "(char *)" in front of this, it wouldn't save you;
>the compiler would still be entitled to believe that the intermediate
>cast value meant something.  The casts in the underlying macros
>ITEM_FREQUENCY and so on are equally unsafe.
>

OK. So the solution is to ditch the casts altogether, and then do plain
pointer arithmetic like this:

#define ITEM_INDEXES(item)            (item)
#define ITEM_NULLS(item,ndims)        (ITEM_INDEXES(item) + (ndims))
#define ITEM_FREQUENCY(item,ndims)    (ITEM_NULLS(item, ndims) + (ndims))
#define ITEM_BASE_FREQUENCY(item,ndims)    (ITEM_FREQUENCY(item, ndims) + sizeof(double))

Or is that still relying on alignment, somehow?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: mcvstats serialization code is still shy of a load

From: Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
>> You can *not* cast something to an aligned pointer type if it's not
>> actually certain to be aligned suitably for that type.

> OK. So the solution is to ditch the casts altogether, and then do plain
> pointer arithmetic like this:

> #define ITEM_INDEXES(item)            (item)
> #define ITEM_NULLS(item,ndims)        (ITEM_INDEXES(item) + (ndims))
> #define ITEM_FREQUENCY(item,ndims)    (ITEM_NULLS(item, ndims) + (ndims))
> #define ITEM_BASE_FREQUENCY(item,ndims)    (ITEM_FREQUENCY(item, ndims) + sizeof(double))

> Or is that still relying on alignment, somehow?

No, constructs like a char* pointer plus n times sizeof(something) should
be safe.

            regards, tom lane



Re: mcvstats serialization code is still shy of a load

From: Tomas Vondra
Date:
On Wed, Jun 26, 2019 at 09:49:46AM +0200, Tomas Vondra wrote:
>On Tue, Jun 25, 2019 at 11:52:28PM -0400, Tom Lane wrote:
>>I'm seeing a reproducible bus error here:
>>
>>#0  0x00417420 in statext_mcv_serialize (mcvlist=0x62223450, stats=Variable "stats" is not available.
>>)
>>   at mcv.c:785
>>785                     memcpy(ITEM_BASE_FREQUENCY(item, ndims), &mcvitem->base_frequency, sizeof(double));
>>
>>What appears to be happening is that since ITEM_BASE_FREQUENCY is defined as
>>
>>#define ITEM_BASE_FREQUENCY(item,ndims)    ((double *) (ITEM_FREQUENCY(item, ndims) + 1))
>>
>>the compiler is assuming that the first argument to memcpy is
>>double-aligned, and it is generating code that depends on that being
>>true, and of course it isn't true and kaboom.
>>
>>You can *not* cast something to an aligned pointer type if it's not
>>actually certain to be aligned suitably for that type.  In this example,
>>even if you wrote "(char *)" in front of this, it wouldn't save you;
>>the compiler would still be entitled to believe that the intermediate
>>cast value meant something.  The casts in the underlying macros
>>ITEM_FREQUENCY and so on are equally unsafe.
>>
>
>OK. So the solution is to ditch the casts altogether, and then do plain
>pointer arithmetic like this:
>
>#define ITEM_INDEXES(item)            (item)
>#define ITEM_NULLS(item,ndims)        (ITEM_INDEXES(item) + (ndims))
>#define ITEM_FREQUENCY(item,ndims)    (ITEM_NULLS(item, ndims) + (ndims))
>#define ITEM_BASE_FREQUENCY(item,ndims)    (ITEM_FREQUENCY(item, ndims) + sizeof(double))
>
>Or is that still relying on alignment, somehow?
>

Attached is a patch that should (hopefully) fix this. It essentially
treats the item as (char *) and does all pointer arithmetic without any
additional casts. So there are no intermediate casts.

I have no way to test this, so I may either wait for you to test this
first, or push and wait. It seems to fail only on a very small number of
buildfarm animals, so having a confirmation would be nice.

The fix keeps the binary format as is, so the serialized MCV items are
max-aligned. That means we can access the uint16 indexes directly, but we
need to copy the rest of the fields (because those may not be aligned). In
hindsight that seems a bit silly, we might as well copy everything, not
care about the alignment and maybe save a few more bytes. But that would
require a catversion bump. OTOH we may need to do that anyway, to fix the
pg_mcv_list_items() signature (as discussed in the other MCV thread).

regards 

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Attachments

Re: mcvstats serialization code is still shy of a load

From: Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Attached is a patch that should (hopefully) fix this. It essentially
> treats the item as (char *) and does all pointer arithmetic without any
> additional casts. So there are no intermediate casts.

This passes the eyeball test, and it also allows my OpenBSD/hppa
installation to get through the core regression tests, so I think
it's good as far as it goes.  Please push.

However ... nosing around in mcv.c, I noticed that the next macro:

/*
 * Used to compute size of serialized MCV list representation.
 */
#define MinSizeOfMCVList        \
    (VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))

#define SizeOfMCVList(ndims,nitems)    \
    (MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
     MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
     MAXALIGN((nitems) * ITEM_SIZE(ndims)))

is both woefully underdocumented and completely at variance with
reality.  It doesn't seem to be accounting for the actual data values.
No doubt this is why it's not used in the places where it'd matter;
the tests that do use it are testing much weaker conditions than they
should.

> The fix keeps the binary format as is, so the serialized MCV items are
> max-aligned. That means we can access the uint16 indexes directly, but we
> need to copy the rest of the fields (because those may not be aligned). In
> hindsight that seems a bit silly, we might as well copy everything, not
> care about the alignment and maybe save a few more bytes.

I think that part of the problem here is that the way this code is
written, "maxaligned" is no such thing.  What you're actually maxaligning
seems to be the offset from the start of the data area of a varlena value,
which is generally going to be a maxaligned palloc result plus 4 bytes.
So "aligned" double values are actually guaranteed to be on odd word
boundaries not even ones.

What's more, it's difficult to convince oneself that the maxaligns done
in different parts of the code are all enforcing the same choices about
which substructures get pseudo-maxaligned and which don't, because the
logic doesn't line up very well.

If we do need another catversion bump before v12, I'd vote for ripping
out Every Single One of the "maxalign" operations in this code, just
on the grounds of code simplicity and bug reduction.

            regards, tom lane



Re: mcvstats serialization code is still shy of a load

From: Tomas Vondra
Date:
On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Attached is a patch that should (hopefully) fix this. It essentially
>> treats the item as (char *) and does all pointer arithmetic without any
>> additional casts. So there are no intermediate casts.
>
>This passes the eyeball test, and it also allows my OpenBSD/hppa
>installation to get through the core regression tests, so I think
>it's good as far as it goes.  Please push.
>
>However ... nosing around in mcv.c, I noticed that the next macro:
>
>/*
> * Used to compute size of serialized MCV list representation.
> */
>#define MinSizeOfMCVList        \
>    (VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))
>
>#define SizeOfMCVList(ndims,nitems)    \
>    (MAXALIGN(MinSizeOfMCVList + sizeof(Oid) * (ndims)) + \
>     MAXALIGN((ndims) * sizeof(DimensionInfo)) + \
>     MAXALIGN((nitems) * ITEM_SIZE(ndims)))
>
>is both woefully underdocumented and completely at variance with
>reality.  It doesn't seem to be accounting for the actual data values.
>No doubt this is why it's not used in the places where it'd matter;
>the tests that do use it are testing much weaker conditions than they
>should.
>

I agree about the macro being underdocumented, but AFAICS it's used
correctly to check the expected length. It can't include the data values
directly, because that's a variable amount of data - and it's encoded in a
not-yet-verified part of the data.

So this only includes parts with known lengths, and then the code does
this:

    for (dim = 0; dim < ndims; dim++)
    {
        ...
        expected_size += MAXALIGN(info[dim].nbytes);
    }

and uses that to check the actual length.

    if (VARSIZE_ANY(data) != expected_size)
        elog(ERROR, ...);

That being said, maybe this is unnecessarily defensive and we should just
trust the values not being corrupted. So if we get a pg_mcv_list value,
we'd simply assume it's OK.

>> The fix keeps the binary format as is, so the serialized MCV items are
>> max-aligned. That means we can access the uint16 indexes directly, but we
>> need to copy the rest of the fields (because those may not be aligned). In
>> hindsight that seems a bit silly, we might as well copy everything, not
>> care about the alignment and maybe save a few more bytes.
>
>I think that part of the problem here is that the way this code is
>written, "maxaligned" is no such thing.  What you're actually maxaligning
>seems to be the offset from the start of the data area of a varlena value,
>which is generally going to be a maxaligned palloc result plus 4 bytes.
>So "aligned" double values are actually guaranteed to be on odd word
>boundaries not even ones.
>

I don't think so. The pointers should be maxaligned with respect to the
whole varlena value, which is what 'raw' points to. At least that was the
intent of code like this:

    raw = palloc0(total_length);

    ...

    /* the header may not be exactly aligned, so make sure it is */
    ptr = raw + MAXALIGN(ptr - raw);

If it's not like that in some place, it's a bug.

>What's more, it's difficult to convince oneself that the maxaligns done
>in different parts of the code are all enforcing the same choices about
>which substructures get pseudo-maxaligned and which don't, because the
>logic doesn't line up very well.
>

Not sure. If there's a way to make it clearer, I'm ready to do the work.
Unfortunately it's hard for me to judge that, because I've spent so much
time on that code that it seems fairly clear to me.

>If we do need another catversion bump before v12, I'd vote for ripping
>out Every Single One of the "maxalign" operations in this code, just
>on the grounds of code simplicity and bug reduction.
>

Hmmm, OK. The original reason to keep the parts aligned was to be able to
reference the parts directly during processing. If we get rid of the
alignment, we'll have to memcpy everything during deserialization. But
if it makes the code simpler, it might be worth it - this part of the
code was clearly the weakest part of the patch.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: mcvstats serialization code is still shy of a load

From: Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>> #define SizeOfMCVList(ndims,nitems)    \
>> is both woefully underdocumented and completely at variance with
>> reality.  It doesn't seem to be accounting for the actual data values.

> I agree about the macro being underdocumented, but AFAICS it's used
> correctly to check the expected length. It can't include the data values
> directly, because that's a variable amount of data - and it's encoded in a
> not-yet-verified part of the data.

Well, it should have some other name then.  Or *at least* a comment.
It's unbelievably misleading as it stands.

> That being said, maybe this is unnecessarily defensive and we should just
> trust the values not being corrupted.

No, I'm on board with checking the lengths.  I just don't like how
hard it is to discern what's being checked.

>> I think that part of the problem here is that the way this code is
>> written, "maxaligned" is no such thing.  What you're actually maxaligning
>> seems to be the offset from the start of the data area of a varlena value,

> I don't think so. The pointers should be maxaligned with respect to the
> whole varlena value, which is what 'raw' points to.

[ squint ... ]  OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
    /* pointer to the data part (skip the varlena header) */
    ptr = VARDATA_ANY(data);
    raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations.  It'd read better as

    /* remember start of datum for maxalign reference */
    raw = (char *) data;

    /* pointer to the data part (skip the varlena header) */
    ptr = VARDATA_ANY(data);

Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did.  That's okay given that the two extant call sites
are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying.  So really what I'd like to see here is

    /* remember start of datum for maxalign reference */
    raw = (char *) data;

    /* alignment logic assumes full-size datum header */
    Assert(VARATT_IS_4B(data));

    /* pointer to the data part (skip the varlena header) */
    ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...

            regards, tom lane



Re: mcvstats serialization code is still shy of a load

From: Tomas Vondra
Date:
On Wed, Jun 26, 2019 at 12:31:13PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>>> #define SizeOfMCVList(ndims,nitems)    \
>>> is both woefully underdocumented and completely at variance with
>>> reality.  It doesn't seem to be accounting for the actual data values.
>
>> I agree about the macro being underdocumented, but AFAICS it's used
>> correctly to check the expected length. It can't include the data values
>> directly, because that's a variable amount of data - and it's encoded in a
>> not-yet-verified part of the data.
>
>Well, it should have some other name then.  Or *at least* a comment.
>It's unbelievably misleading as it stands.
>

True.

>> That being said, maybe this is unnecessarily defensive and we should just
>> trust the values not being corrupted.
>
>No, I'm on board with checking the lengths.  I just don't like how
>hard it is to discern what's being checked.
>

Understood.

>>> I think that part of the problem here is that the way this code is
>>> written, "maxaligned" is no such thing.  What you're actually maxaligning
>>> seems to be the offset from the start of the data area of a varlena value,
>
>> I don't think so. The pointers should be maxaligned with respect to the
>> whole varlena value, which is what 'raw' points to.
>
>[ squint ... ]  OK, I think I misread this:
>
>statext_mcv_deserialize(bytea *data)
>{
>...
>    /* pointer to the data part (skip the varlena header) */
>    ptr = VARDATA_ANY(data);
>    raw = (char *) data;
>
>I think this is confusing in itself --- I read it as "raw = (char *) ptr"
>and I think most other people would assume that too based on the order
>of operations.  It'd read better as
>
>    /* remember start of datum for maxalign reference */
>    raw = (char *) data;
>
>    /* pointer to the data part (skip the varlena header) */
>    ptr = VARDATA_ANY(data);
>

Yeah, that'd have been better.

>Another problem with this code is that it flat doesn't work for
>non-4-byte-header varlenas: it'd do the alignment differently than the
>serialization side did.  That's okay given that the two extant call sites
>are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
>completely misleading appearance of being ready to deal with short-header
>varlenas, and heaven forbid there should be any comment to discourage
>future coders from trying.  So really what I'd like to see here is
>
>    /* remember start of datum for maxalign reference */
>    raw = (char *) data;
>
>    /* alignment logic assumes full-size datum header */
>    Assert(VARATT_IS_4B(data));
>
>    /* pointer to the data part (skip the varlena header) */
>    ptr = VARDATA_ANY(data);
>
>Or, of course, this could all go away if we got rid of the
>bogus maxaligning...
>

OK. Attached is a patch ditching the alignment in serialized data. I've
ditched the macros to access parts of serialized data, and everything
gets copied.

The main complication is with varlena values, which may or may not have
4B headers (for now there's the PG_DETOAST_DATUM call, but as you
mentioned we may want to remove it in the future). So I've stored the
length as a uint32 separately, followed by the full varlena value (which
makes the deserialization simpler). I'm not sure that's the best
solution, though, because this way we store the length twice.

I've kept the alignment in the deserialization code, because there it
allows us to allocate the whole value as a single chunk, which I think
is useful (I admit I don't have any measurements to demonstrate that).
But if we decide to rework this later, we can - it's just in-memory
representation, not on-disk.

Is this roughly what you had in mind?

FWIW I'm sure some of the comments are stale and/or need clarification,
but it's a bit too late over here, so I'll look into that tomorrow.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: mcvstats serialization code is still shy of a load

From: Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> OK. Attached is a patch ditching the alignment in serialized data. I've
> ditched the macros to access parts of serialized data, and everything
> gets copied.

I lack energy to actually read this patch right now, and I don't currently
have an opinion about whether it's worth another catversion bump to fix
this stuff in v12.  But I did test the patch, and I can confirm it gets
through the core regression tests on hppa (both gaur's host environment
with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).

            regards, tom lane



Re: mcvstats serialization code is still shy of a load

From: Tomas Vondra
Date:
On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> OK. Attached is a patch ditching the alignment in serialized data. I've
>> ditched the macros to access parts of serialized data, and everything
>> gets copied.
>
>I lack energy to actually read this patch right now, and I don't currently
>have an opinion about whether it's worth another catversion bump to fix
>this stuff in v12.  But I did test the patch, and I can confirm it gets
>through the core regression tests on hppa (both gaur's host environment
>with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).
>

Thanks for running it through regression tests, that alone is a very
useful piece of information for me.

As for the catversion bump - I'd probably vote to do it. Not just because
of this serialization stuff, but to fix the pg_mcv_list_items function.
It's not something I'm very enthusiastic about (kinda embarrassed about it,
really), but it seems better than shipping something that we'll need to
rework in PG13.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: mcvstats serialization code is still shy of a load

From: Tomas Vondra
Date:
On Thu, Jun 27, 2019 at 01:26:32PM +0200, Tomas Vondra wrote:
>On Thu, Jun 27, 2019 at 12:04:30AM -0400, Tom Lane wrote:
>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>OK. Attached is a patch ditching the alignment in serialized data. I've
>>>ditched the macros to access parts of serialized data, and everything
>>>gets copied.
>>
>>I lack energy to actually read this patch right now, and I don't currently
>>have an opinion about whether it's worth another catversion bump to fix
>>this stuff in v12.  But I did test the patch, and I can confirm it gets
>>through the core regression tests on hppa (both gaur's host environment
>>with gcc 3.4.6, and the OpenBSD installation with gcc 4.2.1).
>>
>
>Thanks for running it through regression tests, that alone is a very
>useful piece of information for me.
>
>As for the catversion bump - I'd probably vote to do it. Not just because
>of this serialization stuff, but to fix the pg_mcv_list_items function.
>It's not something I'm very enthusiastic about (kinda embarassed about it,
>really), but it seems better than shipping something that we'll need to
>rework in PG13.
>

Attached is a slightly improved version of the serialization patch. The
main difference is that when serializing varlena values, the previous
patch version stored

    length (uint32) + full varlena (incl. the header)

which is kinda redundant, because the varlena stores the length too. So
now it only stores the length + data, without the varlena header. I
don't think there's a better way to store varlena values without
enforcing alignment (which is what happens in current master).

There's one additional change I failed to mention before - I had to add
another field to DimensionInfo, tracking how much space will be needed
for deserialized data. This is needed because the deserialization
allocates the whole MCV as a single chunk of memory, to reduce palloc
overhead. It could parse the data twice (first to determine the space,
then to actually parse it); the extra field allows doing it in a single
pass, which seems useful for large MCV lists, but maybe it's not worth it?

Barring objections I'll commit this together with the pg_mcv_list_items
fix, posted in a separate thread. Of course, this requires catversion
bump - an alternative would be to keep enforcing the alignment, but
tweak the macros to work on all platforms without SIGBUS.

Considering how troublesome this serialization part of the patch turned
out to be, I'm not really sure of anything at this point. So I'd welcome
thoughts about the proposed changes.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: mcvstats serialization code is still shy of a load

From: Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Attached is a slightly improved version of the serialization patch.

I reviewed this patch, and tested it on hppa and ppc.  I found one
serious bug: in the deserialization varlena case, you need

-                    dataptr += MAXALIGN(len);
+                    dataptr += MAXALIGN(len + VARHDRSZ);

(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
plus the Assert a few lines further down about dataptr having been
advanced by the correct amount fires.  (On one machine I tested on,
that happened during the core regression tests.  The other machine
got through regression, but trying to do "select * from pg_stats_ext;"
afterwards exhibited the crash.  I didn't investigate closely, but
I suspect the difference has to do with different MAXALIGN values,
4 and 8 respectively.)

The attached patch (a delta atop your v2) corrects that plus some
cosmetic issues.

If we're going to push this, it would be considerably less complicated
to do so before v12 gets branched --- not long after that, there will be
catversion differences to cope with.  I'm planning to make the branch
tomorrow (Monday), probably ~1500 UTC.  Just sayin'.

            regards, tom lane

diff --git a/src/backend/statistics/mcv.c b/src/backend/statistics/mcv.c
index 256728a..cfe7e54 100644
--- a/src/backend/statistics/mcv.c
+++ b/src/backend/statistics/mcv.c
@@ -40,13 +40,14 @@
  * stored in a separate array (deduplicated, to minimize the size), and
  * so the serialized items only store uint16 indexes into that array.
  *
- * Each serialized item store (in this order):
+ * Each serialized item stores (in this order):
  *
  * - indexes to values      (ndim * sizeof(uint16))
  * - null flags              (ndim * sizeof(bool))
  * - frequency              (sizeof(double))
  * - base_frequency          (sizeof(double))
  *
+ * There is no alignment padding within an MCV item.
  * So in total each MCV item requires this many bytes:
  *
  *     ndim * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double)
@@ -61,7 +62,7 @@
     (VARHDRSZ + sizeof(uint32) * 3 + sizeof(AttrNumber))

 /*
- * Size of the serialized MCV list, exluding the space needed for
+ * Size of the serialized MCV list, excluding the space needed for
  * deduplicated per-dimension values. The macro is meant to be used
  * when it's not yet safe to access the serialized info about amount
  * of data for each column.
@@ -619,6 +620,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
         else if (info[dim].typlen == -1)    /* varlena */
         {
             info[dim].nbytes = 0;
+            info[dim].nbytes_aligned = 0;
             for (i = 0; i < info[dim].nvalues; i++)
             {
                 Size        len;
@@ -646,6 +648,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
         else if (info[dim].typlen == -2)    /* cstring */
         {
             info[dim].nbytes = 0;
+            info[dim].nbytes_aligned = 0;
             for (i = 0; i < info[dim].nvalues; i++)
             {
                 Size        len;
@@ -743,7 +746,7 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
                  * assumes little endian behavior.  store_att_byval does
                  * almost what we need, but it requires properly aligned
                  * buffer - the output buffer does not guarantee that. So we
-                 * simply use a static Datum variable (which guarantees proper
+                 * simply use a local Datum variable (which guarantees proper
                  * alignment), and then copy the value from it.
                  */
                 store_att_byval(&tmp, value, info[dim].typlen);
@@ -759,14 +762,14 @@ statext_mcv_serialize(MCVList *mcvlist, VacAttrStats **stats)
             }
             else if (info[dim].typlen == -1)    /* varlena */
             {
-                uint32        len = VARSIZE_ANY_EXHDR(value);
+                uint32        len = VARSIZE_ANY_EXHDR(DatumGetPointer(value));

                 /* copy the length */
                 memcpy(ptr, &len, sizeof(uint32));
                 ptr += sizeof(uint32);

                 /* data from the varlena value (without the header) */
-                memcpy(ptr, VARDATA(DatumGetPointer(value)), len);
+                memcpy(ptr, VARDATA_ANY(DatumGetPointer(value)), len);
                 ptr += len;
             }
             else if (info[dim].typlen == -2)    /* cstring */
@@ -1100,7 +1103,7 @@ statext_mcv_deserialize(bytea *data)
                     map[dim][i] = PointerGetDatum(dataptr);

                     /* skip to place of the next deserialized value */
-                    dataptr += MAXALIGN(len);
+                    dataptr += MAXALIGN(len + VARHDRSZ);
                 }
             }
             else if (info[dim].typlen == -2)
diff --git a/src/include/statistics/extended_stats_internal.h b/src/include/statistics/extended_stats_internal.h
index 6778746..8fc5419 100644
--- a/src/include/statistics/extended_stats_internal.h
+++ b/src/include/statistics/extended_stats_internal.h
@@ -36,7 +36,7 @@ typedef struct DimensionInfo
 {
     int            nvalues;        /* number of deduplicated values */
     int            nbytes;            /* number of bytes (serialized) */
-    int            nbytes_aligned;    /* deserialized data with alignment */
+    int            nbytes_aligned;    /* size of deserialized data with alignment */
     int            typlen;            /* pg_type.typlen */
     bool        typbyval;        /* pg_type.typbyval */
 } DimensionInfo;

Re: mcvstats serialization code is still shy of a load

From: Tomas Vondra
Date:
On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Attached is a slightly improved version of the serialization patch.
>
>I reviewed this patch, and tested it on hppa and ppc.  I found one
>serious bug: in the deserialization varlena case, you need
>
>-                    dataptr += MAXALIGN(len);
>+                    dataptr += MAXALIGN(len + VARHDRSZ);
>
>(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
>plus the Assert a few lines further down about dataptr having been
>advanced by the correct amount fires.  (On one machine I tested on,
>that happened during the core regression tests.  The other machine
>got through regression, but trying to do "select * from pg_stats_ext;"
>afterwards exhibited the crash.  I didn't investigate closely, but
>I suspect the difference has to do with different MAXALIGN values,
>4 and 8 respectively.)
>
>The attached patch (a delta atop your v2) corrects that plus some
>cosmetic issues.
>

Thanks.

>If we're going to push this, it would be considerably less complicated
>to do so before v12 gets branched --- not long after that, there will be
>catversion differences to cope with.  I'm planning to make the branch
>tomorrow (Monday), probably ~1500 UTC.  Just sayin'.
>

Unfortunately, I was travelling on Sunday and was quite busy on Monday, so
I've been unable to push this before the branching :-(

I'll push by the end of this week, once I get home.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: mcvstats serialization code is still shy of a load

From
Tomas Vondra
Date:
On Tue, Jul 02, 2019 at 10:38:29AM +0200, Tomas Vondra wrote:
>On Sun, Jun 30, 2019 at 08:30:33PM -0400, Tom Lane wrote:
>>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>Attached is a slightly improved version of the serialization patch.
>>
>>I reviewed this patch, and tested it on hppa and ppc.  I found one
>>serious bug: in the deserialization varlena case, you need
>>
>>-                    dataptr += MAXALIGN(len);
>>+                    dataptr += MAXALIGN(len + VARHDRSZ);
>>
>>(approx. line 1100 in mcv.c).  Without this, the output data is corrupt,
>>plus the Assert a few lines further down about dataptr having been
>>advanced by the correct amount fires.  (On one machine I tested on,
>>that happened during the core regression tests.  The other machine
>>got through regression, but trying to do "select * from pg_stats_ext;"
>>afterwards exhibited the crash.  I didn't investigate closely, but
>>I suspect the difference has to do with different MAXALIGN values,
>>4 and 8 respectively.)
>>
>>The attached patch (a delta atop your v2) corrects that plus some
>>cosmetic issues.
>>
>
>Thanks.
>
>>If we're going to push this, it would be considerably less complicated
>>to do so before v12 gets branched --- not long after that, there will be
>>catversion differences to cope with.  I'm planning to make the branch
>>tomorrow (Monday), probably ~1500 UTC.  Just sayin'.
>>
>
>Unfortunately, I was travelling on Sunday and was quite busy on Monday, so
>I've been unable to push this before the branching :-(
>
>I'll push by the end of this week, once I get home.
>

I've pushed the fix (along with the pg_mcv_list_item fix) into master,
hopefully the buildfarm won't be upset about it.

I was about to push into REL_12_STABLE, when I realized that maybe we
need to do something about the catversion first. REL_12_STABLE is still
on 201906161, while master got to 201907041 thanks to commit
7b925e12703.  Simply cherry-picking the commits would get us to
201907052 in both branches, but that'd be wrong as the catalogs do
differ. I suppose this is what you meant by "catversion differences to
cope with".

I suppose this is not the first time this happened - how did we deal
with it in the past? I guess we could use some "past" non-conflicting
catversion number in the REL_12_STABLE branch (say, 201907030x) but
maybe that'd be wrong?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: mcvstats serialization code is still shy of a load

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I was about to push into REL_12_STABLE, when I realized that maybe we
> need to do something about the catversion first. REL_12_STABLE is still
> on 201906161, while master got to 201907041 thanks to commit
> 7b925e12703.  Simply cherry-picking the commits would get us to
> 201907052 in both branches, but that'd be wrong as the catalogs do
> differ. I suppose this is what you meant by "catversion differences to
> cope with".

Yeah, exactly.

My recommendation is to use 201907051 on v12 and 201907052
on master (or whatever is $today for you).  They need to be
different now that the branches' catalog histories have diverged,
and it seems to me that the back branch should be "older".

            regards, tom lane



Re: mcvstats serialization code is still shy of a load

From
Tomas Vondra
Date:


On Fri, Jul 5, 2019, 03:28 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I was about to push into REL_12_STABLE, when I realized that maybe we
> need to do something about the catversion first. REL_12_STABLE is still
> on 201906161, while master got to 201907041 thanks to commit
> 7b925e12703.  Simply cherry-picking the commits would get us to
> 201907052 in both branches, but that'd be wrong as the catalogs do
> differ. I suppose this is what you meant by "catversion differences to
> cope with".

Yeah, exactly.

My recommendation is to use 201907051 on v12 and 201907052
on master (or whatever is $today for you).  They need to be
different now that the branches' catalog histories have diverged,
and it seems to me that the back branch should be "older".

                        regards, tom lane

Unfortunately, master is already using both 201907051 and 201907052 (two
of the patches I pushed touched the catalog), so we can't quite do
exactly that. We need to use 201907042 and 201907043, or something
preceding 201907041 (which is the extra catversion on master).

At this point there's no perfect sequence, thanks to the extra commit on
master, so REL_12_STABLE can't be exactly "older" :-(

Barring objections, I'll go ahead with 201907042+201907043 later today,
before someone pushes another catversion-bumping patch.

regards

Re: mcvstats serialization code is still shy of a load

From
Tomas Vondra
Date:
On Fri, Jul 05, 2019 at 10:36:59AM +0200, Tomas Vondra wrote:
>On Fri, Jul 5, 2019, 03:28 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> > I was about to push into REL_12_STABLE, when I realized that maybe we
>> > need to do something about the catversion first. REL_12_STABLE is still
>> > on 201906161, while master got to 201907041 thanks to commit
>> > 7b925e12703.  Simply cherry-picking the commits would get us to
>> > 201907052 in both branches, but that'd be wrong as the catalogs do
>> > differ. I suppose this is what you meant by "catversion differences to
>> > cope with".
>>
>> Yeah, exactly.
>>
>> My recommendation is to use 201907051 on v12 and 201907052
>> on master (or whatever is $today for you).  They need to be
>> different now that the branches' catalog histories have diverged,
>> and it seems to me that the back branch should be "older".
>>
>>                         regards, tom lane
>>
>
>Unfortunately, master is already using both 201907051 and 201907052 (two of
>the patches I pushed touched the catalog), so we can't quite do exactly
>that. We need to use 201907042 and 201907043 or something preceding 201907041
>(which is the extra catversion on master).
>
>At this point there's no perfect sequence, thanks to the extra commit on
>master, so REL_12_STABLE can't be exactly "older" :-(
>
>Barring objections, I'll go ahead with 201907042+201907043 later today,
>before someone pushes another catversion-bumping patch.
>

I've pushed the REL_12_STABLE backpatches too, now. I've ended up using
201907031 and 201907032 - those values precede the first catversion bump
in master (201907041), so the back branch looks "older". And there's a
bit of slack for additional bumps (in the unlikely case we need them).

We might have "fixed" this by backpatching the commit with the extra
catversion bump (7b925e12), but that commit seems a bit too large for
that. It's fairly isolated, though backpatching it still seems like bad
practice.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: mcvstats serialization code is still shy of a load

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I've pushed the REL_12_STABLE backpatches too, now. I've ended up using
> 201907031 and 201907032 - those values precede the first catversion bump
> in master (201907041), so the back branch looks "older". And there's a
> bit of slack for additional bumps (if the unlikely case we need them).

FWIW, I don't think there's a need for every catversion on the back branch
to look older than any catversion on HEAD.  The requirement so far as the
core code is concerned is only for non-equality.  Now, extension code does
often do something like "if catversion >= xxx", but in practice they're
only concerned about numbers used by released versions.  HEAD's catversion
will be strictly greater than v12's soon enough, even if you had made it
not so today.  So I think sticking to today's-date-with-some-N is better
than artificially assigning other dates.

What's done is done, and there's no need to change it, but now you
know what to do next time.

> We might have "fixed" this by backpatching the commit with the extra
> catversion bump (7b925e12) but the commit seems a bit too large for
> that. It's fairly isolated though. But it seems like a bad practice.

Yeah, that approach flies in the face of the notion of feature freeze.

            regards, tom lane



Re: mcvstats serialization code is still shy of a load

From
Alvaro Herrera
Date:
On 2019-Jul-05, Tom Lane wrote:

> FWIW, I don't think there's a need for every catversion on the back branch
> to look older than any catversion on HEAD.  The requirement so far as the
> core code is concerned is only for non-equality.  Now, extension code does
> often do something like "if catversion >= xxx", but in practice they're
> only concerned about numbers used by released versions.

pg_upgrade also uses >= catversion comparison for a couple of things.  I
don't think it affects this case, but it's worth keeping in mind.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services