Re: [proposal] de-TOAST'ing using a iterator

Поиск
Список
Период
Сортировка
От Binguo Bao
Тема Re: [proposal] de-TOAST'ing using a iterator
Дата
Msg-id CAL-OGkv1NnAQef0aJaXtVoJLdN7M7REZcOgudeiKKydCfcqZug@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [proposal] de-TOAST'ing using a iterator  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: [proposal] de-TOAST'ing using a iterator  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers

Alvaro Herrera <alvherre@2ndquadrant.com> 于2019年9月4日周三 上午4:12写道:
> +static void
> +init_toast_buffer(ToastBuffer *buf, int32 size, bool compressed)
> +{
> +     buf->buf = (const char *) palloc0(size);

This API is weird -- you always palloc the ToastBuffer first, then call
init_toast_bufer on it.  Why not palloc the ToastBuffer struct in
init_toast_buffer and return it from there instead?  This is
particularly strange since the ToastBuffer itself is freed by the "free"
routine ... so it's not like we're thinking that caller can take
ownership of the struct by embedding it in a larger struct.

I agree with you. I also change "init_detoast_iterator" to "create_detoast_iterator"
so the caller doesn't need to manage the memory allocation of the iterator
 
Also, this function needs a comment on top explaining what it does and
what the params are.

Done.
 
Why do we need ToastBuffer->buf_size?  Seems unused.

> +     if (iter == NULL)
> +     {
> +             return;
> +     }
 
Removed.
 
Please, no braces around single-statement blocks.  (Many places).

Done.
 
> +/*
> + * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that
> + * the field is invalid and need to read the control byte from the
> + * source buffer in the next iteration, see pglz_decompress_iterate().
> + */
> +#define INVALID_CTRLC 8

What does CTRLC stand for?  Also: this comment should explain why the
value 8 is what it is.

I've improved the comment.
 

> +                             /*
> +                              * Now we copy the bytes specified by the tag from OUTPUT to
> +                              * OUTPUT. It is dangerous and platform dependent to use
> +                              * memcpy() here, because the copied areas could overlap
> +                              * extremely!
> +                              */
> +                             len = Min(len, destend - dp);
> +                             while (len--)
> +                             {
> +                                     *dp = dp[-off];
> +                                     dp++;
> +                             }

So why not use memmove?

> +                             /*
> +                              * Otherwise it contains the match length minus 3 and the
> +                              * upper 4 bits of the offset. The next following byte
> +                              * contains the lower 8 bits of the offset. If the length is
> +                              * coded as 18, another extension tag byte tells how much
> +                              * longer the match really was (0-255).
> +                              */
> +                             int32           len;
> +                             int32           off;
> +
> +                             len = (sp[0] & 0x0f) + 3;
> +                             off = ((sp[0] & 0xf0) << 4) | sp[1];
> +                             sp += 2;
> +                             if (len == 18)
> +                                     len += *sp++;

Starting this para with "Otherwise" makes no sense, since there's no
previous opposite case.  Please reword.  However, I don't recognize this
code from anywhere, and it seems to have a lot of magical numbers.  Is
this code completely new?

This function is based on pglz_decompress() in src/common/pg_lzcompress.c and I've
mentioned that in the function's comment at the beginning.
 
Didn't much like FetchDatumIteratorData SnapshotToast struct member
name.  How about just "snapshot"?
 
Done. 

> +#define PG_DETOAST_ITERATE(iter, need)                                                                       \
> +     do {                                                                                                                            \
> +             Assert(need >= iter->buf->buf && need <= iter->buf->capacity);  \
> +             while (!iter->done && need >= iter->buf->limit) {                               \
> +                     detoast_iterate(iter);                                                                          \
> +             }                                                                                                                               \
> +     } while (0)

This needs parens around each "iter" and "need" in the macro definition.
Also, please add a comment documenting what the arguments are, since
it's not immediately obvious.
 
Parens makes the macro more reliable. Done. 

> +void free_detoast_iterator(DetoastIterator iter)
> +{
> +     if (iter == NULL)
> +     {
> +             return;
> +     }

If this function is going to do this, why do callers need to check for
NULL also?  Seems pointless.  I'd rather make callers simpler and keep
only the NULL-check inside the function, since this is not perf-critical
anyway.
 
Good catch. Done.

 > +             iter->fetch_datum_iterator = create_fetch_datum_iterator(attr);
> +             VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
> +             if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
> +             {
> [...]
> +             }
> +             else
> +             {
> +                     iter->compressed = false;
> +
> +                     /* point the buffer directly at the raw data */
> +                     iter->buf = iter->fetch_datum_iterator->buf;
> +             }

This arrangement where there are two ToastBuffers and they sometimes are
the same is cute, but I think we need a better way to know when each
needs to be freed afterwards; 

We only need to check the "compressed" field in the iterator to figure out
which buffer should be freed.

--
Best regards,
Binguo Bao
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Specifying attribute slot for storing/reading statistics
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: accounting for memory used for BufFile during hash joins