Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>> #define SizeOfMCVList(ndims,nitems) \
>> is both woefully underdocumented and completely at variance with
>> reality. It doesn't seem to be accounting for the actual data values.
> I agree about the macro being underdocumented, but AFAICS it's used
> correctly to check the expected length. It can't include the data values
> directly, because that's variable amount of data - and it's encoded in not
> yet verified part of the data.
Well, it should have some other name then. Or *at least* a comment.
It's unbelievably misleading as it stands.
> That being said, maybe this is unnecessarily defensive and we should just
> trust the values not being corrupted.
No, I'm on board with checking the lengths. I just don't like how
hard it is to discern what's being checked.
>> I think that part of the problem here is that the way this code is
>> written, "maxaligned" is no such thing. What you're actually maxaligning
>> seems to be the offset from the start of the data area of a varlena value,
> I don't think so. The pointers should be maxaligned with respect to the
> whole varlena value, which is what 'raw' points to.
[ squint ... ] OK, I think I misread this:
statext_mcv_deserialize(bytea *data)
{
...
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
raw = (char *) data;
I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations. It'd read better as
/* remember start of datum for maxalign reference */
raw = (char *) data;
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did. That's okay given that the two extant call sites
are guaranteed to pass detoasted datums. But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying. So really what I'd like to see here is
/* remember start of datum for maxalign reference */
raw = (char *) data;
/* alignment logic assumes full-size datum header */
Assert(VARATT_IS_4B(data));
/* pointer to the data part (skip the varlena header) */
ptr = VARDATA_ANY(data);
Or, of course, this could all go away if we got rid of the
bogus maxaligning...
regards, tom lane