Re: [proposal] de-TOAST'ing using a iterator

Поиск
Список
Период
Сортировка
От Binguo Bao
Тема Re: [proposal] de-TOAST'ing using a iterator
Дата
Msg-id CAL-OGkstcrQ8bz_+Dt10Qc2SXeoNqpj-GvPmArmjRm4s8-Agrw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [proposal] de-TOAST'ing using a iterator  (John Naylor <john.naylor@2ndquadrant.com>)
Ответы Re: [proposal] de-TOAST'ing using a iterator  (John Naylor <john.naylor@2ndquadrant.com>)
Список pgsql-hackers


John Naylor <john.naylor@2ndquadrant.com> 于2019年8月2日周五 下午3:12写道:
On Tue, Jul 30, 2019 at 8:20 PM Binguo Bao <djydewang@gmail.com> wrote:
>
> John Naylor <john.naylor@2ndquadrant.com> 于2019年7月29日周一 上午11:49写道:
>>
>> 1). For every needle comparison, text_position_next_internal()
>> calculates how much of the value is needed and passes that to
>> detoast_iterate(), which then calculates if it has to do something or
>> not. This is a bit hard to follow. There might also be a performance
>> penalty -- the following is just a theory, but it sounds plausible:
>> The CPU can probably correctly predict that detoast_iterate() will
>> usually return the same value it did last time, but it still has to
>> call the function and make sure, which I imagine is more expensive
>> than advancing the needle. Ideally, we want to call the iterator only
>> if we have to.
>>
>> In the attached patch (applies on top of your v5),
>> text_position_next_internal() simply compares hptr to the detoast
>> buffer limit, and calls detoast_iterate() until it can proceed. I
>> think this is clearer.
>
>
> Yes, I think this is a general scenario where the caller continually
> calls detoast_iterate until gets enough data, so I think such operations can
> be extracted as a macro, as I did in patch v6. In the macro, the detoast_iterate
> function is called only when the data requested by the caller is greater than the
> buffer limit.

I like the use of a macro here. However, I think we can find a better
location for the definition. See the header comment of fmgr.h:
"Definitions for the Postgres function manager and function-call
interface." Maybe tuptoaster.h is as good a place as any?

PG_DETOAST_ITERATE isn't a sample function-call interface,
But I notice that PG_FREE_IF_COPY is also defined in fmgr.h, whose logic is
similar to PG_DETOAST_ITERATE, make condition check first and then
decide whether to call the function. Besides, PG_DETOAST_DATUM, 
PG_DETOAST_DATUM_COPY, PG_DETOAST_DATUM_SLICE, 
PG_DETOAST_DATUM_PACKED are all defined in fmgr.h, it is reasonable
to put all the de-TOAST interface together.

>> 2). detoast_iterate() and fetch_datum_iterate() return a value but we
>> don't check it or do anything with it. Should we do something with it?
>> It's also not yet clear if we should check the iterator state instead
>> of return values. I've added some XXX comments as a reminder. We
>> should also check the return value of pglz_decompress_iterate().
>
>
> IMO, we need to provide users with a simple iterative interface.
> Using the required data pointer to compare with the buffer limit is an easy way.
> And the application scenarios of the iterator are mostly read operations.
> So I think there is no need to return a value, and the iterator needs to throw an
> exception for some wrong calls, such as all the data have been iterated,
> but the user still calls the iterator.

Okay, and see these functions now return void. The orignal
pglz_decompress() returned a value that was check against corruption.
Is there a similar check we can do for the iterative version?

As far as I know, we can just do such check after all compressed data is decompressed.
If we are slicing, we can't do the check.
 

>> 3). Speaking of pglz_decompress_iterate(), I diff'd it with
>> pglz_decompress(), and I have some questions on it:
>>
>> a).
>> + srcend = (const unsigned char *) (source->limit == source->capacity
>> ? source->limit : (source->limit - 4));
>>
>> What does the 4 here mean in this expression?
>
>
> Since we fetch chunks one by one, if we make srcend equals to the source buffer limit,
> In the while loop "while (sp < srcend && dp < destend)", sp may exceed the source buffer limit and read unallocated bytes.

Why is this? That tells me the limit is incorrect. Can the setter not
determine the right value?

There are three statments change `sp` value in the while loop `while (sp < srcend && dp < destend)`:
`ctrl = *sp++;`
`off = ((sp[0]) & 0xf0) << 4) | sp[1]; sp += 2;`
`len += *sp++`
Although we make sure `sp` is less than `srcend` when enter while loop, `sp` is likely to 
go beyond the `srcend` in the loop, and we should ensure that `sp` is always smaller than `buf->limit` to avoid
reading unallocated data. So, `srcend` can't be initialized to `buf->limit`. Only one case is exceptional,
we've fetched all data chunks and 'buf->limit' reaches 'buf->capacity', it's imposisble to read unallocated
data via `sp`.

> Giving a four-byte buffer can prevent sp from exceeding the source buffer limit.

Why 4? That's a magic number. Why not 2, or 27?

As I explained above, `sp` may go beyond the `srcend`in the loop, up to the `srcend + 2`.
In theory, it's ok to set the buffer size to be greater than or equal 2. 
 
> If we have read all the chunks, we don't need to be careful to cross the border,
> just make srcend equal to source buffer limit. I've added comments to explain it in patch v6.

That's a good thing to comment on, but it doesn't explain why.

Yes, the current comment is puzzling. I'll improve it.
 
This
logic seems like a band-aid and I think a committer would want this to
be handled in a more principled way.

I don't want to change pglz_decompress logic too much, the iterator should
pay more attention to saving and restoring the original pglz_decompress state.
 
>> Is it possible it's
>> compensating for this bit in init_toast_buffer()?
>>
>> + buf->limit = VARDATA(buf->buf);
>>
>> It seems the initial limit should also depend on whether the datum is
>> compressed, right? Can we just do this:
>>
>> + buf->limit = buf->position;
>
>
> I'm afraid not. buf->position points to the data portion of the buffer, but the beginning of
> the chunks we read may contain header information. For example, for compressed data chunks,
> the first four bytes record the size of raw data, this means that limit is four bytes ahead of position.
> This initialization doesn't cause errors, although the position is less than the limit in other cases.
> Because we always fetch chunks first, then decompress it.

I see what you mean now. This could use a comment or two to explain
the stated constraints may not actually be satisfied at
initialization.

Done.
 
>> b).
>> - while (sp < srcend && dp < destend)
>> ...
>> + while (sp + 1 < srcend && dp < destend &&
>> ...
>>
>> Why is it here "sp + 1"?
>
>
> Ignore it, I set the inactive state of detoast_iter->ctrl to 8 in patch v6 to
> achieve the purpose of parsing ctrl correctly every time.

Please explain further. Was the "sp + 1" correct behavior (and why),
or only for debugging setting ctrl/c correctly?

In patch v5, If the condition is `sp < srcend`, suppose `sp = srcend - 1` before
entering the loop `while (sp < srcend && dp < destend)`, when entering the loop
and read a control byte(sp equals to `srcend` now), the program can't enter the
loop `for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++)`, then set `iter->ctrlc` to 0, 
exit the first loop and then this iteration is over. At the next iteration, 
the control byte will be reread since `iter->ctrlc` equals to 0, but the previous control byte
is not used. Changing the condition to `sp + 1 < srcend` avoid only one control byte is read
then the iterator is over.
 
Also, I don't think
the new logic for the ctrl/c variables is an improvement:

1. iter->ctrlc is intialized with '8' (even in the uncompressed case,
which is confusing). Any time you initialize with something not 0 or
1, it's a magic number, and here it's far from where the loop variable
is used. This is harder to read.

`iter->ctrlc` is used to record the value of `ctrl` in pglz_decompress at the end of
the last iteration(or loop). In the pglz_decompress, `ctrlc`’s valid value is 0~7,
When `ctrlc` reaches 8,  a control byte is read from the source
buffer to `ctrl` then set `ctrlc` to 0. And a control bytes should be read from the
source buffer to `ctrlc` on the first iteration. So `iter->ctrlc` should be intialized with '8'.
 
2. First time though the loop, iter->ctrlc = 8, which immediately gets
set back to 0.

As I explained above, `iter->ctrlc = 8` make a control byte be read
from the source buffer to `ctrl` on the first iteration. Besides, `iter->ctrlc = 8`
indicates that the valid value of `ctrlc` at the end of the last iteration was not
recorded, Obviously, there are no other iterations before the first iteration.
 
3. At the end of the loop, iter->ctrl/c are unconditionally set. In
v5, there was a condition which would usually avoid this copying of
values through pointers.

Patch v6 just records the value of `ctrlc` at the end of each iteration(or loop)
whether it is valid (0~7) or 8, and initializes `ctrlc` on the next iteration(or loop) correctly.
I think it is more concise in patch v6.
 

>> 4. Note that varlena.c has a static state variable, and a cleanup
>> function that currently does:
>>
>> static void
>> text_position_cleanup(TextPositionState *state)
>> {
>> /* no cleanup needed */
>> }
>>
>> It seems to be the detoast iterator could be embedded in this state
>> variable, and then free-ing can happen here. That has a possible
>> advantage that the iterator struct would be on the same cache line as
>> the state data. That would also remove the need to pass "iter" as a
>> parameter, since these functions already pass "state". I'm not sure if
>> this would be good for other users of the iterator, so maybe we can
>> hold off on that for now.
>
>
> Good idea. I've implemented it in patch v6.

That's better, and I think we can take it a little bit farther.

1. Notice that TextPositionState is allocated on the stack in
text_position(), which passes both the "state" pointer and the "iter"
pointer to text_position_setup(), and only then sets state->iter =
iter. We can easily set this inside text_position(). That would get
rid of the need for other callers to pass NULL iter to
text_position_setup().

Done.
 
2. DetoastIteratorData is fixed size, so I see no reason to allocate
it on the heap. We could allocate it on the stack in text_pos(), and
pass the pointer to create_detoast_iterator() (in this case maybe a
better name is init_detoast_iterator), which would return a bool to
tell text_pos() whether to pass down the pointer or a NULL. The
allocation of other structs (toast buffer and fetch iterator) probably
can't be changed without more work.

Done

If there is anything else that is not explained clearly, please point it out.

--
Best regards,
Binguo Bao
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Fetter
Дата:
Сообщение: Re: [Patch] Adding CORRESPONDING/CORRESPONDING BY to set operation
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Fix XML handling with DOCTYPE