Re: [proposal] de-TOAST'ing using a iterator

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: [proposal] de-TOAST'ing using a iterator
Дата
Msg-id 20190903201226.GA16197@alvherre.pgsql
обсуждение исходный текст
Ответ на Re: [proposal] de-TOAST'ing using a iterator  (Binguo Bao <djydewang@gmail.com>)
Ответы Re: [proposal] de-TOAST'ing using a iterator  (Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org>)
Re: [proposal] de-TOAST'ing using a iterator  (Binguo Bao <djydewang@gmail.com>)
Список pgsql-hackers
> +static void
> +init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed)
> +{
> +    buf->buf = (const char *) palloc0(size);

This API is weird -- you always palloc the ToastBuffer first, then call
init_toast_bufer on it.  Why not palloc the ToastBuffer struct in
init_toast_buffer and return it from there instead?  This is
particularly strange since the ToastBuffer itself is freed by the "free"
routine ... so it's not like we're thinking that caller can take
ownership of the struct by embedding it in a larger struct.

Also, this function needs a comment on top explaining what it does and
what the params are.

Why do we need ToastBuffer->buf_size?  Seems unused.

> +    if (iter == NULL)
> +    {
> +        return;
> +    }

Please, no braces around single-statement blocks.  (Many places).

> +/*
> + * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
> + * the field is invalid and need to read the control byte from the
> + * source buffer in the next iteration, see pglz_decompress_iterate().
> + */
> +#define INVALID_CTRLC 8

What does CTRLC stand for?  Also: this comment should explain why the
value 8 is what it is.

> +                /*
> +                 * Now we copy the bytes specified by the tag from OUTPUT to
> +                 * OUTPUT. It is dangerous and platform dependent to use
> +                 * memcpy() here, because the copied areas could overlap
> +                 * extremely!
> +                 */
> +                len = Min(len, destend - dp);
> +                while (len--)
> +                {
> +                    *dp = dp[-off];
> +                    dp++;
> +                }

So why not use memmove?

> +                /*
> +                 * Otherwise it contains the match length minus 3 and the
> +                 * upper 4 bits of the offset. The next following byte
> +                 * contains the lower 8 bits of the offset. If the length is
> +                 * coded as 18, another extension tag byte tells how much
> +                 * longer the match really was (0-255).
> +                 */
> +                int32        len;
> +                int32        off;
> +
> +                len = (sp[0] & 0x0f) + 3;
> +                off = ((sp[0] & 0xf0) << 4) | sp[1];
> +                sp += 2;
> +                if (len == 18)
> +                    len += *sp++;

Starting this para with "Otherwise" makes no sense, since there's no
previous opposite case.  Please reword.  However, I don't recognize this
code from anywhere, and it seems to have a lot of magical numbers.  Is
this code completely new?


Didn't much like FetchDatumIteratorData SnapshotToast struct member
name.  How about just "snapshot"?

> +#define PG_DETOAST_ITERATE(iter, need)                                    \
> +    do {                                                                \
> +        Assert(need >= iter->buf->buf && need <= iter->buf->capacity);    \
> +        while (!iter->done && need >= iter->buf->limit) {                 \
> +            detoast_iterate(iter);                                        \
> +        }                                                                \
> +    } while (0)

This needs parens around each "iter" and "need" in the macro definition.
Also, please add a comment documenting what the arguments are, since
it's not immediately obvious.

> +void free_detoast_iterator(DetoastIterator iter)
> +{
> +    if (iter == NULL)
> +    {
> +        return;
> +    }

If this function is going to do this, why do callers need to check for
NULL also?  Seems pointless.  I'd rather make callers simpler and keep
only the NULL-check inside the function, since this is not perf-critical
anyway.

> +extern void detoast_iterate(DetoastIterator detoast_iter)
> +{

Please, no "extern" in function definitions, only in prototypes in the
.h files.  Also, we indent the function name at the start of line, with
the return type appearing on its own in the previous line.

> +    if (!VARATT_IS_EXTERNAL_ONDISK(attr))
> +        elog(ERROR, "create_fetch_datum_itearator shouldn't be called for non-ondisk datums");

Typo for "iterator".

> +        iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
> +        VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
> +        if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> +        {
> [...]
> +        }
> +        else
> +        {
> +            iter->compressed = false;
> +
> +            /* point the buffer directly at the raw data */
> +            iter->buf = iter->fetch_datum_iterator->buf;
> +        }

This arrangement where there are two ToastBuffers and they sometimes are
the same is cute, but I think we need a better way to know when each
needs to be freed afterwards; the proposed coding is confusing.  And it
certainly it needs more than zero comments about what's going on there.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Proposal: roll pg_stat_statements into core
Следующее
От: Pierre Ducroquet
Дата:
Сообщение: Re: [Patch] Add a reset_computed_values function in pg_stat_statements