Re: Batch insert in CTAS/MatView code

Поиск
Список
Период
Сортировка
От Paul Guo
Тема Re: Batch insert in CTAS/MatView code
Дата
Msg-id CAEET0ZH2o95dcRkB26a-hizFP4CXTmcLkPekxRD1kCno8nvYMg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Batch insert in CTAS/MatView code  (Asim R P <apraveen@pivotal.io>)
Список pgsql-hackers

Asim Thanks for the review.

On Wed, Sep 25, 2019 at 6:39 PM Asim R P <apraveen@pivotal.io> wrote:



On Mon, Sep 9, 2019 at 4:02 PM Paul Guo <pguo@pivotal.io> wrote:
>
> So in theory
> we should not worry about additional tuple copy overhead now, and then I tried the patch without setting
> multi-insert threshold as attached.
>

I reviewed your patch today.  It looks good overall.  My concern is that the ExecFetchSlotHeapTuple call does not seem appropriate.  In a generic place such as createas.c, we should be using generic tableam API only.  However, I can also see that there is no better alternative.  We need to compute the size of accumulated tuples so far, in order to decide whether to stop accumulating tuples.  There is no convenient way to obtain the length of the tuple, given a slot.  How about making that decision solely based on number of tuples, so that we can avoid ExecFetchSlotHeapTuple call altogether?

For heapam, ExecFetchSlotHeapTuple() will be called again in heap_multi_insert() to prepare the final multi-insert. if we check ExecFetchSlotHeapTuple(), we could find that calling it multiple time just involves very very few overhead for the BufferHeapTuple case. Note for virtual tuple case the 2nd ExecFetchSlotHeapTuple() call still copies slot contents, but we've called ExecCopySlot(batchslot, slot); to copy to a BufferHeap case so no worries for the virtual tuple case (as a source). 

Previously (long ago) I probably understood the code incorrectly so had the concern also. I used sampling to do that (for variable-length tuple), but now apparently we do not need that.

The multi insert copy code deals with index tuples also, which I don't see in the patch.  Don't we need to consider populating indexes?

create table as/create mat view DDL does not involve index creation for the table/matview. The code seems to be able to used in RefreshMatView also, for that we need to consider if we use multi-insert in that code.
 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: range test for hash index?
Следующее
От: Paul Guo
Дата:
Сообщение: Re: Batch insert in CTAS/MatView code