Re: Pipeline mode and PQpipelineSync()

From: Boris Kolpackov
Subject: Re: Pipeline mode and PQpipelineSync()
Date:
Msg-id: boris.20210623100839@codesynthesis.com
In reply to: Re: Pipeline mode and PQpipelineSync()  (Alvaro Herrera <alvaro.herrera@2ndquadrant.com>)
Replies: Re: Pipeline mode and PQpipelineSync()  (Boris Kolpackov <boris@codesynthesis.com>)
         Re: Pipeline mode and PQpipelineSync()  (Alvaro Herrera <alvaro.herrera@2ndquadrant.com>)
         Re: Pipeline mode and PQpipelineSync()  (Alvaro Herrera <alvaro.herrera@2ndquadrant.com>)
List: pgsql-hackers
Alvaro Herrera <alvaro.herrera@2ndquadrant.com> writes:

> > I think always requiring PQpipelineSync() is fine since it also serves
> > as an error recovery boundary. But the fact that the server waits until
> > the sync message to start executing the pipeline is surprising. To me
> > this seems to go contrary to the idea of a "pipeline".
> 
> But does that actually happen? There's a very easy test we can do by
> sending queries that sleep.  If my libpq program sends a "SELECT
> pg_sleep(2)", then PQflush(), then sleep in the client program two more
> seconds without sending the sync; and *then* send the sync, I find that
> the program takes 2 seconds, not four.  This shows that both client and
> server slept in parallel, even though I didn't send the Sync until after
> the client was done sleeping.
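
For reference, I read that test as roughly the following sequence (just a
sketch, assuming conn is an open PGconn already in pipeline mode, with
error handling omitted):

PQsendQueryParams(conn, "SELECT pg_sleep(2)",
                  0, NULL, NULL, NULL, NULL, 0);
PQflush(conn);                        /* query goes out, no Sync yet      */

sleep(2);                             /* client-side sleep, 2 more seconds */

PQpipelineSync(conn);                 /* only now send the Sync           */
PQflush(conn);

PGresult *res = PQgetResult(conn);    /* result of the pg_sleep SELECT    */
PQclear(res);
PQgetResult(conn);                    /* NULL                             */
res = PQgetResult(conn);              /* PGRES_PIPELINE_SYNC              */
PQclear(res);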

Thanks for looking into it. My experiments were with INSERT and I have now
been able to try things with larger pipelines. I can now see that the server
starts sending results after ~400 statements. So I think you are right:
the server does start executing the pipeline before receiving the sync
message, though there is still something strange going on (but probably
on the client side):

I have a pipeline of, say, 500 INSERTs. If I "execute" this pipeline by first
sending all the statements and then reading the results, everything works
as expected. This is the call sequence I am talking about:

PQsendQueryPrepared() # INSERT #1
PQflush()
PQsendQueryPrepared() # INSERT #2
PQflush()
...
PQsendQueryPrepared() # INSERT #500
PQpipelineSync()
PQflush()
PQconsumeInput()
PQgetResult()         # INSERT #1
PQgetResult()         # NULL
PQgetResult()         # INSERT #2
PQgetResult()         # NULL
...
PQgetResult()         # INSERT #500
PQgetResult()         # NULL
PQgetResult()         # PGRES_PIPELINE_SYNC
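
In actual code the first variant corresponds to roughly this (a sketch only:
"ins" is the name of the prepared INSERT, values[]/params are placeholders
for the real parameter data, and error handling plus the select() loop
around PQconsumeInput() are omitted):

for (int i = 0; i < 500; i++)
{
    params[0] = values[i];                  /* placeholder parameter data */
    PQsendQueryPrepared(conn, "ins", 1, params, NULL, NULL, 0);
    PQflush(conn);
}

PQpipelineSync(conn);
PQflush(conn);

PQconsumeInput(conn);

for (int i = 0; i < 500; i++)
{
    PGresult *res = PQgetResult(conn);      /* result of the (i+1)-th INSERT */
    PQclear(res);
    PQgetResult(conn);                      /* NULL                          */
}

PGresult *res = PQgetResult(conn);          /* PGRES_PIPELINE_SYNC           */
PQclear(res);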

If, however, I execute it by checking for results before sending the
next INSERT, I get the following call sequence:

PQsendQueryPrepared() # INSERT #1
PQflush()
PQsendQueryPrepared() # INSERT #2
PQflush()
...
PQsendQueryPrepared() # INSERT #~400
PQflush()
PQconsumeInput()      # At this point select() indicates we can read.
PQgetResult()         # NULL (???)
PQgetResult()         # INSERT #1
PQgetResult()         # NULL
PQgetResult()         # INSERT #2
PQgetResult()         # NULL
...


What's strange here is that the first PQgetResult() call (marked with ???)
returns NULL instead of the result for INSERT #1, as in the first call
sequence. Interestingly, if I skip it, the rest seems to proceed as expected.
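
For completeness, the second variant is structured roughly like this (again
only a sketch: socket_ready() stands in for the actual select() call, and
"ins"/params/values are the same placeholders as above):

for (int i = 0; i < 500; i++)
{
    params[0] = values[i];                  /* placeholder parameter data */
    PQsendQueryPrepared(conn, "ins", 1, params, NULL, NULL, 0);
    PQflush(conn);

    if (socket_ready(PQsocket(conn)))       /* placeholder select() check; first
                                               becomes true around INSERT #~400 */
    {
        /* Note: no PQpipelineSync() has been sent at this point. */
        PQconsumeInput(conn);

        PGresult *res = PQgetResult(conn);  /* expected: result of INSERT #1;
                                               observed: NULL on this very first
                                               call, results from then on       */
        if (res != NULL)
            PQclear(res);
    }
}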

Any idea what might be going on here? My hunch is that there is an issue
with libpq's state machine. In particular, in the second case, PQgetResult()
is called before the sync message is sent. Did you have a chance to test
such a scenario (i.e., a large pipeline where the first result is processed
before the PQpipelineSync() call)? Of course, this could very well be a bug
on my side or me misunderstanding something.


