Re: foreign table batch inserts

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: foreign table batch inserts
Дата
Msg-id CAMsr+YFgDUiJ37DEfPRk8WDBuZ58psdAYJd8iNFSaGxtw=wU3g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: foreign table batch inserts  (Craig Ringer <craig@2ndquadrant.com>)
Ответы Re: foreign table batch inserts  (Craig Ringer <craig@2ndquadrant.com>)
Список pgsql-hackers
On 20 May 2016 at 15:35, Craig Ringer <craig@2ndquadrant.com> wrote:
 

You can, however, omit Sync from between messages and send a series of protocol messages, like

Parse/Bind/Execute/Bind/Execute/Bind/Execute/Sync

to avoid round-trip overheads.


I implemented what I think is a pretty solid proof of concept of this for kicks this evening. Attached, including basic test program. Patch attached. The performance difference over higher latency links is huge, see below.

Demo/test program in src/test/examples/testlibpqbatch.c.




I still need to add the logic for handling an error during a batch by discarding all input until the next Sync, but otherwise I think it's pretty reasonable.

The time difference for 10k inserts on the local host over a unix socket shows a solid improvement:

batch insert elapsed:      0.244293s
sequential insert elapsed: 0.375402s

... but over, say, a connection to a random AWS RDS instance fired up for the purpose that lives about 320ms away the difference is huge:

batch insert elapsed:      9.029995s
sequential insert elapsed: (I got bored after 10 minutes; it should take a bit less then an hour based on the latency numbers)

With 500 rows on the remote AWS RDS instance, once the I/O quota is already saturated:

batch insert elapsed:      1.229024s
sequential insert elapsed: 156.962180s

which is an improvement by a factor of over 120 

I didn't compare vs COPY. I'm sure COPY will be faster, but COPY doesn't let you do INSERT ... ON CONFLICT, do UPDATE, do DELETE, etc. Not without temp tables and a bunch of hoop jumping anyway. If COPY solved everything there'd be no point having pipelining.

No docs yet, but if folks think the interface is reasonable I can add them easily since the comments on each of the new functoins should be easy to adapt into the SGML docs.

With a bit of polishing I think this can probably go in the next CF, though I only wrote it as an experiment. Can I get opinions on the API?

The TL;DR API, using the usual async libpq routines, is:


PQbeginBatchMode(conn);

PQsendQueryParams(conn, "BEGIN", 0, NULL, NULL, NULL, NULL, 0);

PQsendPrepare(conn, "my_update", "UPDATE ...");

PQsetnonblocking(conn, 1);

while (!all_responses_received)
{
   select(...)

   if (can-write)
   {
     if (app-has-more-data-to-send)
     {
       PQsendQueryPrepared(conn, "my_update", params-go-here);
     }
     else if (havent-sent-commit-yet)
     {
       PQsendQueryParams(conn, "COMMIT", ...);
     }
     else if (havent-sent-endbatch-yet)
     {
       PqEndBatch(conn);
     }
     PQflush(conn);
   }

   if (can-read)
   {
     PQconsumeInput(conn);
     if (PQisBusy(conn))
       continue;
     res = PQgetResult(conn);
     if (res == NULL)
     {
       PQgetNextQuery(conn);
       continue;
     }
     /* process results in the same order we sent the commands */
     /* client keeps track of that, libpq just supplies the results */
     ...
   }
}

PQendBatch(conn);




Note that:

* PQsendQuery cannot be used as it uses simple query protocol, use PQsendQueryParams instead;
* Batch supports PQsendQueryParams, PQsendPrepare, PQsendQueryPrepared, PQsendDescribePrepared, PQsendDescribePortal;
* You don't call PQgetResult after dispatching each query
* Multiple batches may be pipelined, you don't have to wait for one to end to start another (an advantage over JDBC's API)
* non-blocking mode isn't required, but is strongly advised

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ronan Dunklau
Дата:
Сообщение: Possible regression regarding estimating relation width in FDWs
Следующее
От: Ildar Musin
Дата:
Сообщение: Re: Declarative partitioning