Re: [POC] Fast COPY FROM command for the table with foreign partitions

Поиск
Список
Период
Сортировка
От Etsuro Fujita
Тема Re: [POC] Fast COPY FROM command for the table with foreign partitions
Дата
Msg-id CAPmGK16_dL+JiyqFYseLPWkfox5CTQ0+QRyxf7BvokB0HJx6xw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [POC] Fast COPY FROM command for the table with foreignpartitions  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Ответы Re: [POC] Fast COPY FROM command for the table with foreignpartitions  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Список pgsql-hackers
On Tue, Jun 2, 2020 at 2:51 PM Andrey Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> 02.06.2020 05:02, Etsuro Fujita пишет:
> > I think I also thought something similar to this before [1].  Will take a look.

I'm still reviewing the patch, but let me comment on it.

> > [1] https://www.postgresql.org/message-id/23990375-45a6-5823-b0aa-a6a7a6a957f0%40lab.ntt.co.jp

> I have looked into the thread.
> My first version of the patch was like your idea. But when developing
> the “COPY FROM” code, the following features were discovered:
> 1. Two or more partitions can be placed at the same node. We need to
> finish COPY into one partition before start COPY into another partition
> at the same node.
> 2. On any error we need to send EOF to all started "COPY .. FROM STDIN"
> operations. Otherwise FDW can't cancel operation.
>
> Hiding the COPY code under the buffers management machinery allows us to
> generalize buffers machinery, execute one COPY operation on each buffer
> and simplify error handling.

I'm not sure that it's really a good idea that the bulk-insert API is
designed the way it's tightly coupled with the bulk-insert machinery
in the core, because 1) some FDWs might want to send tuples provided
by the core to the remote, one by one, without storing them in a
buffer, or 2) some other FDWs might want to store the tuples in the
buffer and send them in a lump as postgres_fdw in the proposed patch
but might want to do so independently of MAX_BUFFERED_TUPLES and/or
MAX_BUFFERED_BYTES defined in the bulk-insert machinery.

I agree that we would need special handling for cases you mentioned
above if we design this API based on something like the idea I
proposed in that thread.

> As i understand, main idea of the thread, mentioned by you, is to add
> "COPY FROM" support without changes in FDW API.

I don't think so; I think we should introduce new API for this feature
to keep the ExecForeignInsert() API simple.

> All that I can offer in this place now is to introduce one new
> ExecForeignBulkInsert(buf) routine that will execute single "COPY FROM
> STDIN" operation, send tuples and close the operation. We can use the
> ExecForeignInsert() routine for each buffer tuple if
> ExecForeignBulkInsert() is not supported.

Agreed.

> One of main questions here is to use COPY TO machinery for serializing a
> tuple. It is needed (if you will take a look into the patch) to
> transform the CopyTo() routine to an iterative representation:
> start/next/finish. May it be acceptable?

+1 for the general idea.

> In the attachment there is a patch with the correction of a stupid error.

Thanks for the patch!

Sorry for the delay.

Best regards,
Etsuro Fujita



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexey Kondratov
Дата:
Сообщение: Re: [PATCH] Allow to specify restart_lsn inpg_create_physical_replication_slot()
Следующее
От: Tom Lane
Дата:
Сообщение: Re: More tzdb fun: POSIXRULES is being deprecated upstream