Re: mcvstats serialization code is still shy of a load

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: mcvstats serialization code is still shy of a load
Дата
Msg-id 9201.1561566673@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: mcvstats serialization code is still shy of a load  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: mcvstats serialization code is still shy of a load  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Wed, Jun 26, 2019 at 11:26:21AM -0400, Tom Lane wrote:
>> #define SizeOfMCVList(ndims,nitems)    \
>> is both woefully underdocumented and completely at variance with
>> reality.  It doesn't seem to be accounting for the actual data values.

> I agree about the macro being underdocumented, but AFAICS it's used
> correctly to check the expected length. It can't include the data values
> directly, because that's variable amount of data - and it's encoded in not
> yet verified part of the data.

Well, it should have some other name then.  Or *at least* a comment.
It's unbelievably misleading as it stands.

> That being said, maybe this is unnecessarily defensive and we should just
> trust the values not being corrupted.

No, I'm on board with checking the lengths.  I just don't like how
hard it is to discern what's being checked.

>> I think that part of the problem here is that the way this code is
>> written, "maxaligned" is no such thing.  What you're actually maxaligning
>> seems to be the offset from the start of the data area of a varlena value,

> I don't think so. The pointers should be maxaligned with respect to the
> whole varlena value, which is what 'raw' points to.

[ squint ... ]  OK, I think I misread this:

statext_mcv_deserialize(bytea *data)
{
...
    /* pointer to the data part (skip the varlena header) */
    ptr = VARDATA_ANY(data);
    raw = (char *) data;

I think this is confusing in itself --- I read it as "raw = (char *) ptr"
and I think most other people would assume that too based on the order
of operations.  It'd read better as

    /* remember start of datum for maxalign reference */
    raw = (char *) data;

    /* pointer to the data part (skip the varlena header) */
    ptr = VARDATA_ANY(data);

Another problem with this code is that it flat doesn't work for
non-4-byte-header varlenas: it'd do the alignment differently than the
serialization side did.  That's okay given that the two extant call sites
are guaranteed to pass detoasted datums.  But using VARDATA_ANY gives a
completely misleading appearance of being ready to deal with short-header
varlenas, and heaven forbid there should be any comment to discourage
future coders from trying.  So really what I'd like to see here is

    /* remember start of datum for maxalign reference */
    raw = (char *) data;

    /* alignment logic assumes full-size datum header */
    Assert(VARATT_IS_4B(data));

    /* pointer to the data part (skip the varlena header) */
    ptr = VARDATA_ANY(data);

Or, of course, this could all go away if we got rid of the
bogus maxaligning...

            regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: mcvstats serialization code is still shy of a load
Следующее
От: Tom Lane
Дата:
Сообщение: Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)