Re: postgres_fdw: Use COPY to speed up batch inserts
| От | Matheus Alcantara |
|---|---|
| Тема | Re: postgres_fdw: Use COPY to speed up batch inserts |
| Дата | |
| Msg-id | DE20J06SWW40.IN0FOXQSFUEA@gmail.com обсуждение исходный текст |
| Ответ на | Re: postgres_fdw: Use COPY to speed up batch inserts ("Matheus Alcantara" <matheusssilv97@gmail.com>) |
| Список | pgsql-hackers |
On Fri Oct 31, 2025 at 4:02 PM -03, I wrote: > It's showing a bit complicated to decide at runtime if we should use the > COPY or INSERT for batch insert into a foreign table. Perhaps we could > add a new option on CREATE FOREIGN TABLE to enable this usage or not? We > could document the performance improvements and the limitations so the > user can decide if it should enable or not. > Here is v5 that implement this idea. On this version I've introduced a foreign table and foreign server option "use_copy_for_insert" (I'm open for a better name) that enable the use of the COPY as remote command to execute an INSERT into a foreign table. The COPY can be used if the user enable this option on the foreign table or the foreign server and if the original INSERT statement don't have a RETURNING clause. See the benchmark results: pgbench -n -c 10 -j 10 -t 100 -f bench.sql postgres Master (batch_size = 1 with a single row to insert): tps = 16000.768037 Master (batch_size = 1 with 1000 rows to insert): tps = 133.451518 Master (batch_size = 100 with 1000 rows to insert): tps = 1274.096347 ----------------- Patch(batch_size = 1, use_copy_for_insert = false with single row to insert) tps = 15734.155705 Master (batch_size = 1, use_copy_for_insert = false with 1000 rows to insert): tps = 132.644801 Master (batch_size = 100, use_copy_for_insert = false with 1000 rows to insert): tps = 1245.514591 ----------------- Patch(batch_size = 1, use_copy_for_insert = true with single row to insert) tps = 17604.394057 Master (batch_size = 1, use_copy_for_insert = true with 1000 rows to insert): tps = 88.998804 Master (batch_size = 100, use_copy_for_insert = true with 1000 rows to insert): tps = 2406.009249 ----------------- We can see that when batching inserting with the batch_size configured properly we have a very significant performance improvement and when the "use_copy_for_insert" option is disabled the performance are close compared with master. The problem is when the "batch_size" is 1 (default) and "use_copy_for_insert" is enabled. This is because on this scenario we are sending multiple COPY commands with a single row to the foreign server. One way to fix this would to decide at runtime (at execute_foreign_modify()) if the COPY can be used based on the number of rows being insert. I don't think that I like this option because it would make the EXPLAIN output different when the ANALYZE option is used since during planning time we don't have the number of rows being inserted, so if just EXPLAIN(VERBOSE) is executed we would show the INSERT as remote SQL, and if the ANALYZE is included and we have enough rows to enable the COPY usage, the remote SQL would show the COPY command. Since the new "use_copy_for_insert" option is be disabled by default I think that we could document this limitation and mention the performance improvements when used correctly with the batch_size option. Another option would be to use the COPY command only if the "use_copy_for_insert" is true and also if the "batch_size" is > 1. We would still have the performance issue if the user insert a single row but we would close to less scenarios. The attached 0002 implement this idea. Thoughts? -- Matheus Alcantara EDB: http://www.enterprisedb.com
Вложения
В списке pgsql-hackers по дате отправления: