RE: POC: postgres_fdw insert batching

Поиск
Список
Период
Сортировка
От tsunakawa.takay@fujitsu.com
Тема RE: POC: postgres_fdw insert batching
Дата
Msg-id OSBPR01MB2982039EA967F0304CC6A3ECFE0B0@OSBPR01MB2982.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: POC: postgres_fdw insert batching  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: POC: postgres_fdw insert batching  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: POC: postgres_fdw insert batching  (Craig Ringer <craig.ringer@enterprisedb.com>)
Список pgsql-hackers
Hello Tomas san,


Thank you for picking up this.  I'm interested in this topic, too.  (As an aside, we'd like to submit a bulk insert
patchfor ECPG in the near future.) 

As others referred, Andrey-san's fast COPY to foreign partitions is also promising.  But I think your bulk INSERT is a
separatefeature and offers COPY cannot do -- data transformation during loading with INSERT SELECT and CREATE TABLE AS
SELECT.

Is there anything that makes you worry and stops development?  Could I give it a try to implement this (I'm not sure I
can,sorry.  I'm worried if we can change the executor's call chain easily.) 


> 1) Extend the FDW API?

Yes, I think, because FDWs for other DBMSs will benefit from this.  (But it's questionable whether we want users to
transferdata in Postgres database to other DBMSs...) 

MySQL and SQL Server has the same bulk insert syntax as Postgres, i.e., INSERT INTO table VALUES(record1), (record2),
... Oracle doesn't have this syntax, but it can use CTE as follows: 

  INSERT INTO table
  WITH t AS (
    SELECT record1 FROM DUAL UNION ALL
    SELECT record2 FROM DUAL UNION ALL
    ...
  )
  SELECT * FROM t;

And many DBMSs should have CTAS, INSERT SELECT, and INSERT SELECT record1 UNION ALL SELECT record2 ...

The API would simply be:

TupleTableSlot **
ExecForeignMultiInsert(EState *estate,
                  ResultRelInfo *rinfo,
                  TupleTableSlot **slot,
                  TupleTableSlot **planSlot,
                  int numSlots);


> 2) What about the insert results?

I'm wondering if we can report success or failure of each inserted row, because the remote INSERT will fail entirely.
OtherFDWs may be able to do it, so the API can be like above. 

For the same reason, support for RETURNING clause will vary from DBMS to DBMS.


> 3) What about the other DML operations (DELETE/UPDATE)?

I don't think they are necessary for the time being.  If we want them, they will be implemented using the libpq
batch/pipeliningas Andres-san said. 


> 3) Should we do batching for COPY insteads?

I'm thinking of issuing INSERT with multiple records as your patch does, because:

* When the user executed INSERT statements, it would look strange to the user if the remote SQL is displayed as COPY.

* COPY doesn't invoke rules unlike INSERT.  (I don't think the rule is a feature what users care about, though.)  Also,
I'ma bit concerned that there might be, or will be, other differences between INSERT and COPY. 


[1]
Fast COPY FROM command for the table with foreign partitions
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru


Regards
Takayuki Tsunakawa




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Resetting spilled txn statistics in pg_stat_replication
Следующее
От: "tsunakawa.takay@fujitsu.com"
Дата:
Сообщение: RE: [Patch] Optimize dropping of relation buffers using dlist