Обсуждение: pgbnech: allow to cancel queries during benchmark

Поиск

Список

Период

Сортировка

pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

26 июня 2023 г., 16:46:28

Hello,

This attached patch enables pgbench to cancel queries during benchmark.

Formerly, Ctrl+C during benchmark killed pgbench immediately, but backend
processes executing long queries remained for a while. You can simply
reproduce this problem by cancelling the pgbench running a custom script
executing "SELECT pg_sleep(10)".

The patch fixes this so that cancel requests are sent for all connections on
Ctrl+C, and all running queries are cancelled before pgbench exits.

In thread #0, setup_cancel_handler is called before the loop, the
CancelRequested flag is set when Ctrl+C is issued. In the loop, cancel
requests are sent when the flag is set only in thread #0. SIGINT is
blocked in other threads, but the threads will exit after their query
are cancelled. If thread safety is disabled or OS is Windows, the signal
is not blocked because pthread_sigmask cannot be used. 
(I didn't test the patch on WIndows yet, though.)

I choose the design that the signal handler and the query cancel are
performed only in thread #0 because I wanted to make the behavior as
predicable as possible. However, another design that all thread can
received SIGINT and that the first thread that catches the signal is
responsible to sent cancel requests for all connections may also work.

Also, the array of CState that contains all clients state is changed to
a global variable so that all connections can be accessed within a thread.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

pgbench_query_cancel.patch

Re: pgbnech: allow to cancel queries during benchmark

От

Kirk Wolak

Дата:

26 июня 2023 г., 19:59:21

On Mon, Jun 26, 2023 at 9:46 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:

Hello,

This attached patch enables pgbench to cancel queries during benchmark.

Formerly, Ctrl+C during benchmark killed pgbench immediately, but backend
processes executing long queries remained for a while. You can simply
reproduce this problem by cancelling the pgbench running a custom script
executing "SELECT pg_sleep(10)".

The patch fixes this so that cancel requests are sent for all connections on
Ctrl+C, and all running queries are cancelled before pgbench exits.

In thread #0, setup_cancel_handler is called before the loop, the
CancelRequested flag is set when Ctrl+C is issued. In the loop, cancel
requests are sent when the flag is set only in thread #0. SIGINT is
blocked in other threads, but the threads will exit after their query
are cancelled. If thread safety is disabled or OS is Windows, the signal
is not blocked because pthread_sigmask cannot be used.
(I didn't test the patch on WIndows yet, though.)

I choose the design that the signal handler and the query cancel are
performed only in thread #0 because I wanted to make the behavior as
predicable as possible. However, another design that all thread can
received SIGINT and that the first thread that catches the signal is
responsible to sent cancel requests for all connections may also work.

Also, the array of CState that contains all clients state is changed to
a global variable so that all connections can be accessed within a thread.

I also like the thread #0 handling design. I have NOT reviewed/tested this yet.

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

27 июня 2023 г., 10:48:45

On Mon, 26 Jun 2023 12:59:21 -0400
Kirk Wolak <wolakk@gmail.com> wrote:

> On Mon, Jun 26, 2023 at 9:46 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> > Hello,
> >
> > This attached patch enables pgbench to cancel queries during benchmark.
> >
> > Formerly, Ctrl+C during benchmark killed pgbench immediately, but backend
> > processes executing long queries remained for a while. You can simply
> > reproduce this problem by cancelling the pgbench running a custom script
> > executing "SELECT pg_sleep(10)".
> >
> > The patch fixes this so that cancel requests are sent for all connections
> > on
> > Ctrl+C, and all running queries are cancelled before pgbench exits.
> >
> > In thread #0, setup_cancel_handler is called before the loop, the
> > CancelRequested flag is set when Ctrl+C is issued. In the loop, cancel
> > requests are sent when the flag is set only in thread #0. SIGINT is
> > blocked in other threads, but the threads will exit after their query
> > are cancelled. If thread safety is disabled or OS is Windows, the signal
> > is not blocked because pthread_sigmask cannot be used.
> > (I didn't test the patch on WIndows yet, though.)
> >
> > I choose the design that the signal handler and the query cancel are
> > performed only in thread #0 because I wanted to make the behavior as
> > predicable as possible. However, another design that all thread can
> > received SIGINT and that the first thread that catches the signal is
> > responsible to sent cancel requests for all connections may also work.
> >
> > Also, the array of CState that contains all clients state is changed to
> > a global variable so that all connections can be accessed within a thread.
> >
> >
> > +1
>   I also like the thread #0 handling design.  I have NOT reviewed/tested
> this yet.

Thank you for your comment.

As a side note, actually I thought another design to create a special thread
for handling query cancelling in which SIGINT would be catched by sigwait,
I quit the idea because it would add complexity due to code for Windows and
--disabled-thread-safe.

I would appreciate it if you could kindly review and test the patch!

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: pgbnech: allow to cancel queries during benchmark

От

Fabien COELHO

Дата:

03 июля 2023 г., 15:11:36

Hello Yugo-san,

>>> In thread #0, setup_cancel_handler is called before the loop, the
>>> CancelRequested flag is set when Ctrl+C is issued. In the loop, cancel
>>> requests are sent when the flag is set only in thread #0. SIGINT is
>>> blocked in other threads, but the threads will exit after their query
>>> are cancelled. If thread safety is disabled or OS is Windows, the signal
>>> is not blocked because pthread_sigmask cannot be used.
>>> (I didn't test the patch on WIndows yet, though.)
>>>
>>> I choose the design that the signal handler and the query cancel are
>>> performed only in thread #0 because I wanted to make the behavior as
>>> predicable as possible. However, another design that all thread can
>>> received SIGINT and that the first thread that catches the signal is
>>> responsible to sent cancel requests for all connections may also work.
>>>
>>> Also, the array of CState that contains all clients state is changed to
>>> a global variable so that all connections can be accessed within a thread.

> As a side note, actually I thought another design to create a special thread
> for handling query cancelling in which SIGINT would be catched by sigwait,
> I quit the idea because it would add complexity due to code for Windows and
> --disabled-thread-safe.

I agree that the simpler the better.

> I would appreciate it if you could kindly review and test the patch!

I'll try to have a look at it.

-- 
Fabien.

Re: pgbnech: allow to cancel queries during benchmark

От

Fabien COELHO

Дата:

03 июля 2023 г., 21:39:23

Yugo-san,

Some feedback about v1 of this patch.

Patch applies cleanly, compiles.

There are no tests, could there be one? ISTM that one could be done with a 
"SELECT pg_sleep(...)" script??

The global name "all_state" is quite ambiguous, what about "client_states" 
instead? Or maybe it could be avoided, see below.

Instead of renaming "state" to "all_state" (or client_states as suggested 
above), I'd suggest to minimize the patch by letting "state" inside the 
main and adding a "client_states = state;" just after the allocation, or 
another approach, see below.

Should PQfreeCancel be called on deconnections, in finishCon? I think that 
there may be a memory leak with the current implementation??

Maybe it should check that cancel is not NULL before calling PQcancel?

After looking at the code, I'm very unclear whether they may be some 
underlying race conditions, or not, depending on when the cancel is 
triggered. I think that some race conditions are still likely given the 
current thread 0 implementation, and dealing with them with a barrier or 
whatever is not desirable at all.

In order to work around this issue, ISTM that we should go away from the 
simple and straightforward thread 0 approach, and the only clean way is 
that the cancelation should be managed by each thread for its own client.

I'd suggest to have the advanceState to call PQcancel when CancelRequested 
is set and switch to CSTATE_ABORTED to end properly. This means that there 
would be no need for the global client states pointer, so the patch should 
be smaller and simpler. Possibly there would be some shortcuts added here 
and there to avoid lingering after the control-C, in threadRun.

-- 
Fabien.

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

14 июля 2023 г., 14:32:01

Hello Fabien,

Thank you for your review!

On Mon, 3 Jul 2023 20:39:23 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> 
> Yugo-san,
> 
> Some feedback about v1 of this patch.
> 
> Patch applies cleanly, compiles.
> 
> There are no tests, could there be one? ISTM that one could be done with a 
> "SELECT pg_sleep(...)" script??

Agreed. I will add the test.

> The global name "all_state" is quite ambiguous, what about "client_states" 
> instead? Or maybe it could be avoided, see below.
> 
> Instead of renaming "state" to "all_state" (or client_states as suggested 
> above), I'd suggest to minimize the patch by letting "state" inside the 
> main and adding a "client_states = state;" just after the allocation, or 
> another approach, see below.

Ok, I'll fix to add a global variable "client_states" and make this point to
"state" instead of changing "state" to global.
 
> Should PQfreeCancel be called on deconnections, in finishCon? I think that 
> there may be a memory leak with the current implementation??

Agreed. I'll fix.
 
> Maybe it should check that cancel is not NULL before calling PQcancel?

I think this is already checked as below, but am I missing something?

+       if (all_state[i].cancel != NULL)
+           (void) PQcancel(all_state[i].cancel, errbuf, sizeof(errbuf));

> After looking at the code, I'm very unclear whether they may be some 
> underlying race conditions, or not, depending on when the cancel is 
> triggered. I think that some race conditions are still likely given the 
> current thread 0 implementation, and dealing with them with a barrier or 
> whatever is not desirable at all.
> 
> In order to work around this issue, ISTM that we should go away from the 
> simple and straightforward thread 0 approach, and the only clean way is 
> that the cancelation should be managed by each thread for its own client.
> 
> I'd suggest to have the advanceState to call PQcancel when CancelRequested 
> is set and switch to CSTATE_ABORTED to end properly. This means that there 
> would be no need for the global client states pointer, so the patch should 
> be smaller and simpler. Possibly there would be some shortcuts added here 
> and there to avoid lingering after the control-C, in threadRun.

I am not sure this approach is simpler than mine. 

In multi-threads, only one thread can catches the signal and other threads
continue to run. Therefore, if Ctrl+C is pressed while threads are waiting
responses from the backend in wait_on_socket_set, only one thread can be
interrupted and return, but other threads will continue to wait and cannot
check CancelRequested. So, for implementing your suggestion, we need any hack
to make all threads return from wait_on_socket_set when the event occurs, but
I don't have idea to do this in simpler way. 

In my patch, all threads can return from wait_on_socket_set at Ctrl+C
because when thread #0 cancels all connections, the following error is
sent to all sessions:

 ERROR:  canceling statement due to user request

and all threads will receive the response from the backend.

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

02 августа 2023 г., 10:37:53

Hello Fabien,

On Fri, 14 Jul 2023 20:32:01 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

I attached the updated patch.

> Hello Fabien,
> 
> Thank you for your review!
> 
> On Mon, 3 Jul 2023 20:39:23 +0200 (CEST)
> Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> 
> > 
> > Yugo-san,
> > 
> > Some feedback about v1 of this patch.
> > 
> > Patch applies cleanly, compiles.
> > 
> > There are no tests, could there be one? ISTM that one could be done with a 
> > "SELECT pg_sleep(...)" script??
> 
> Agreed. I will add the test.

I added a TAP test.

> 
> > The global name "all_state" is quite ambiguous, what about "client_states" 
> > instead? Or maybe it could be avoided, see below.
> > 
> > Instead of renaming "state" to "all_state" (or client_states as suggested 
> > above), I'd suggest to minimize the patch by letting "state" inside the 
> > main and adding a "client_states = state;" just after the allocation, or 
> > another approach, see below.
> 
> Ok, I'll fix to add a global variable "client_states" and make this point to
> "state" instead of changing "state" to global.

Done.

>  
> > Should PQfreeCancel be called on deconnections, in finishCon? I think that 
> > there may be a memory leak with the current implementation??
> 
> Agreed. I'll fix.

Done.

Regards,
Yugo Nagata

>  
> > Maybe it should check that cancel is not NULL before calling PQcancel?
> 
> I think this is already checked as below, but am I missing something?
> 
> +       if (all_state[i].cancel != NULL)
> +           (void) PQcancel(all_state[i].cancel, errbuf, sizeof(errbuf));
> 
> > After looking at the code, I'm very unclear whether they may be some 
> > underlying race conditions, or not, depending on when the cancel is 
> > triggered. I think that some race conditions are still likely given the 
> > current thread 0 implementation, and dealing with them with a barrier or 
> > whatever is not desirable at all.
> > 
> > In order to work around this issue, ISTM that we should go away from the 
> > simple and straightforward thread 0 approach, and the only clean way is 
> > that the cancelation should be managed by each thread for its own client.
> > 
> > I'd suggest to have the advanceState to call PQcancel when CancelRequested 
> > is set and switch to CSTATE_ABORTED to end properly. This means that there 
> > would be no need for the global client states pointer, so the patch should 
> > be smaller and simpler. Possibly there would be some shortcuts added here 
> > and there to avoid lingering after the control-C, in threadRun.
> 
> I am not sure this approach is simpler than mine. 
> 
> In multi-threads, only one thread can catches the signal and other threads
> continue to run. Therefore, if Ctrl+C is pressed while threads are waiting
> responses from the backend in wait_on_socket_set, only one thread can be
> interrupted and return, but other threads will continue to wait and cannot
> check CancelRequested. So, for implementing your suggestion, we need any hack
> to make all threads return from wait_on_socket_set when the event occurs, but
> I don't have idea to do this in simpler way. 
> 
> In my patch, all threads can return from wait_on_socket_set at Ctrl+C
> because when thread #0 cancels all connections, the following error is
> sent to all sessions:
> 
>  ERROR:  canceling statement due to user request
> 
> and all threads will receive the response from the backend.
> 
> Regards,
> Yugo Nagata
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

02 августа 2023 г., 13:01:40

On Wed, 2 Aug 2023 16:37:53 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Hello Fabien,
> 
> On Fri, 14 Jul 2023 20:32:01 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> 
> I attached the updated patch.

I'm sorry. I forgot to attach the patch.

Regards,
Yugo Nagata

> 
> > Hello Fabien,
> > 
> > Thank you for your review!
> > 
> > On Mon, 3 Jul 2023 20:39:23 +0200 (CEST)
> > Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> > 
> > > 
> > > Yugo-san,
> > > 
> > > Some feedback about v1 of this patch.
> > > 
> > > Patch applies cleanly, compiles.
> > > 
> > > There are no tests, could there be one? ISTM that one could be done with a 
> > > "SELECT pg_sleep(...)" script??
> > 
> > Agreed. I will add the test.
> 
> I added a TAP test.
> 
> > 
> > > The global name "all_state" is quite ambiguous, what about "client_states" 
> > > instead? Or maybe it could be avoided, see below.
> > > 
> > > Instead of renaming "state" to "all_state" (or client_states as suggested 
> > > above), I'd suggest to minimize the patch by letting "state" inside the 
> > > main and adding a "client_states = state;" just after the allocation, or 
> > > another approach, see below.
> > 
> > Ok, I'll fix to add a global variable "client_states" and make this point to
> > "state" instead of changing "state" to global.
> 
> Done.
> 
> >  
> > > Should PQfreeCancel be called on deconnections, in finishCon? I think that 
> > > there may be a memory leak with the current implementation??
> > 
> > Agreed. I'll fix.
> 
> Done.
> 
> Regards,
> Yugo Nagata
> 
> >  
> > > Maybe it should check that cancel is not NULL before calling PQcancel?
> > 
> > I think this is already checked as below, but am I missing something?
> > 
> > +       if (all_state[i].cancel != NULL)
> > +           (void) PQcancel(all_state[i].cancel, errbuf, sizeof(errbuf));
> > 
> > > After looking at the code, I'm very unclear whether they may be some 
> > > underlying race conditions, or not, depending on when the cancel is 
> > > triggered. I think that some race conditions are still likely given the 
> > > current thread 0 implementation, and dealing with them with a barrier or 
> > > whatever is not desirable at all.
> > > 
> > > In order to work around this issue, ISTM that we should go away from the 
> > > simple and straightforward thread 0 approach, and the only clean way is 
> > > that the cancelation should be managed by each thread for its own client.
> > > 
> > > I'd suggest to have the advanceState to call PQcancel when CancelRequested 
> > > is set and switch to CSTATE_ABORTED to end properly. This means that there 
> > > would be no need for the global client states pointer, so the patch should 
> > > be smaller and simpler. Possibly there would be some shortcuts added here 
> > > and there to avoid lingering after the control-C, in threadRun.
> > 
> > I am not sure this approach is simpler than mine. 
> > 
> > In multi-threads, only one thread can catches the signal and other threads
> > continue to run. Therefore, if Ctrl+C is pressed while threads are waiting
> > responses from the backend in wait_on_socket_set, only one thread can be
> > interrupted and return, but other threads will continue to wait and cannot
> > check CancelRequested. So, for implementing your suggestion, we need any hack
> > to make all threads return from wait_on_socket_set when the event occurs, but
> > I don't have idea to do this in simpler way. 
> > 
> > In my patch, all threads can return from wait_on_socket_set at Ctrl+C
> > because when thread #0 cancels all connections, the following error is
> > sent to all sessions:
> > 
> >  ERROR:  canceling statement due to user request
> > 
> > and all threads will receive the response from the backend.
> > 
> > Regards,
> > Yugo Nagata
> > 
> > -- 
> > Yugo NAGATA <nagata@sraoss.co.jp>
> > 
> > 
> 
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

v2_pgbench_query_cancel.patch

Re: pgbnech: allow to cancel queries during benchmark

От

Fabien COELHO

Дата:

09 августа 2023 г., 12:06:24

Hello Yugo-san,

Some feedback about v2.

There is some dead code (&& false) which should be removed.

>>>> Maybe it should check that cancel is not NULL before calling PQcancel?
>>>
>>> I think this is already checked as below, but am I missing something?
>>>
>>> +       if (all_state[i].cancel != NULL)
>>> +           (void) PQcancel(all_state[i].cancel, errbuf, sizeof(errbuf));
>>>
>>>> After looking at the code, I'm very unclear whether they may be some
>>>> underlying race conditions, or not, depending on when the cancel is
>>>> triggered. I think that some race conditions are still likely given the
>>>> current thread 0 implementation, and dealing with them with a barrier or
>>>> whatever is not desirable at all.
>>>>
>>>> In order to work around this issue, ISTM that we should go away from the
>>>> simple and straightforward thread 0 approach, and the only clean way is
>>>> that the cancelation should be managed by each thread for its own client.
>>>>
>>>> I'd suggest to have the advanceState to call PQcancel when CancelRequested
>>>> is set and switch to CSTATE_ABORTED to end properly. This means that there
>>>> would be no need for the global client states pointer, so the patch should
>>>> be smaller and simpler. Possibly there would be some shortcuts added here
>>>> and there to avoid lingering after the control-C, in threadRun.
>>>
>>> I am not sure this approach is simpler than mine.

My argument is more about latent race conditions and inter-thread 
interference than code simplicity.

>>> In multi-threads, only one thread can catches the signal and other threads
>>> continue to run.

Yep. This why I see plenty uncontrolled race conditions if thread 0 
cancels clients which are managed in parallel by other threads and may be 
active. I'm not really motivated to delve into libpq internals to check 
whether there are possibly bad issues or not, but if two threads write 
message at the same time in the same socket, I assume that this can be 
bad if you are unlucky.

ISTM that the rational convention should be that each thread cancels its 
own clients, which ensures that there is no bad interaction between 
threads.

>>> Therefore, if Ctrl+C is pressed while threads are waiting
>>> responses from the backend in wait_on_socket_set, only one thread can be
>>> interrupted and return, but other threads will continue to wait and cannot
>>> check CancelRequested.

>>> So, for implementing your suggestion, we need any hack
>>> to make all threads return from wait_on_socket_set when the event occurs, but
>>> I don't have idea to do this in simpler way.

>>> In my patch, all threads can return from wait_on_socket_set at Ctrl+C
>>> because when thread #0 cancels all connections, the following error is
>>> sent to all sessions:
>>>
>>>  ERROR:  canceling statement due to user request
>>>
>>> and all threads will receive the response from the backend.

Hmmm.

I understand that the underlying issue you are raising is that other 
threads may be stuck while waiting on socket events and that with your 
approach they will be cleared somehow by socket 0.

I'll say that (1) this point does not address potential race condition 
issues with several thread interacting directly on the same client ;
(2) thread 0 may also be stuck waiting for events so the cancel is only 
taken into account when it is woken up.

If we accept that each thread cancels its clients when it is woken up, 
which may imply some (usually small) delay, then it is not so different 
from the current version because the code must wait for 0 to wake up 
anyway, and it solves (1). The current version does not address potential 
thread interactions.

-- 
Fabien.

Re: pgbnech: allow to cancel queries during benchmark

От

Fabien COELHO

Дата:

09 августа 2023 г., 12:18:43

I forgot, about the test:

I think that it should be integrated in the existing 
"001_pgbench_with_server.pl" script, because a TAP script is pretty 
expensive as it creates a cluster, starts it… before running the test.

I'm surprise that IPC::Run does not give any access to the process number. 
Anyway, its object interface seems to allow sending signal:

     $h->signal("...")

So the code could be simplified to use that after a small delay.

-- 
Fabien.

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

10 августа 2023 г., 06:23:11

Hello Fabien,

On Wed, 9 Aug 2023 11:06:24 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> 
> Hello Yugo-san,
> 
> Some feedback about v2.
> 
> There is some dead code (&& false) which should be removed.

I forgot to remove the debug code. I'll remove it.

> >>> In multi-threads, only one thread can catches the signal and other threads
> >>> continue to run.
> 
> Yep. This why I see plenty uncontrolled race conditions if thread 0 
> cancels clients which are managed in parallel by other threads and may be 
> active. I'm not really motivated to delve into libpq internals to check 
> whether there are possibly bad issues or not, but if two threads write 
> message at the same time in the same socket, I assume that this can be 
> bad if you are unlucky.
> 
> ISTM that the rational convention should be that each thread cancels its 
> own clients, which ensures that there is no bad interaction between 
> threads.

Actually, thread #0 and other threads never write message at the same time
in the same socket. When thread #0 sends cancel requests, they are not sent
to sockets that other threads are reading or writing. Rather, new another
socket for cancel is created for each client, and the backend PID and cancel
request are sent to the socket. PostgreSQL establishes a new connection for
the cancel request, and sent a cancel signal to the specified backend.

Therefore, thread #0 and other threads don't access any resources in the same
time except to CancelRequested. Is still there any concern about race condition?

> >>> Therefore, if Ctrl+C is pressed while threads are waiting
> >>> responses from the backend in wait_on_socket_set, only one thread can be
> >>> interrupted and return, but other threads will continue to wait and cannot
> >>> check CancelRequested.
> 
> >>> So, for implementing your suggestion, we need any hack
> >>> to make all threads return from wait_on_socket_set when the event occurs, but
> >>> I don't have idea to do this in simpler way.
> 
> >>> In my patch, all threads can return from wait_on_socket_set at Ctrl+C
> >>> because when thread #0 cancels all connections, the following error is
> >>> sent to all sessions:
> >>>
> >>>  ERROR:  canceling statement due to user request
> >>>
> >>> and all threads will receive the response from the backend.
> 
> Hmmm.
> 
> I understand that the underlying issue you are raising is that other 
> threads may be stuck while waiting on socket events and that with your 
> approach they will be cleared somehow by socket 0.
> 
> I'll say that (1) this point does not address potential race condition 
> issues with several thread interacting directly on the same client ;
> (2) thread 0 may also be stuck waiting for events so the cancel is only 
> taken into account when it is woken up.

I answered to (1) the consern about race condition above.

And, as to (2), the SIGINT signal is handle only in thread #0 because it is
blocked in other threads. So, when SIGINT is delivered, thread #0 will be
interrupted and woken up immediately from waiting on socket, returning EINTR. 
Therefore, thread #0 would not be stuck.
 
Regards,
Yugo Nagata

> If we accept that each thread cancels its clients when it is woken up, 
> which may imply some (usually small) delay, then it is not so different 
> from the current version because the code must wait for 0 to wake up 
> anyway, and it solves (1). The current version does not address potential 
> thread interactions.
> 
> -- 
> Fabien.


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

10 августа 2023 г., 06:32:44

On Wed, 9 Aug 2023 11:18:43 +0200 (CEST)
Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> 
> I forgot, about the test:
> 
> I think that it should be integrated in the existing 
> "001_pgbench_with_server.pl" script, because a TAP script is pretty 
> expensive as it creates a cluster, starts it… before running the test.

Ok. I'll integrate the test into 001.

> I'm surprise that IPC::Run does not give any access to the process number. 
> Anyway, its object interface seems to allow sending signal:
> 
>      $h->signal("...")
> 
> So the code could be simplified to use that after a small delay.

Thank you for your information.

I didn't know $h->signal() and  I mimicked the way of
src/bin/psql/t/020_cancel.pl to send SIGINT to a running program. I don't
know why the psql test doesn't use the interface, I'll investigate whether
this can be used in our purpose, anyway.

Regards,
Yugo Nagata

> 
> -- 
> Fabien.

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

06 сентября 2023 г., 14:13:34

Hi Fabien,

On Thu, 10 Aug 2023 12:32:44 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> On Wed, 9 Aug 2023 11:18:43 +0200 (CEST)
> Fabien COELHO <coelho@cri.ensmp.fr> wrote:
> 
> > 
> > I forgot, about the test:
> > 
> > I think that it should be integrated in the existing 
> > "001_pgbench_with_server.pl" script, because a TAP script is pretty 
> > expensive as it creates a cluster, starts it… before running the test.
> 
> Ok. I'll integrate the test into 001.
>  
> > I'm surprise that IPC::Run does not give any access to the process number. 
> > Anyway, its object interface seems to allow sending signal:
> > 
> >      $h->signal("...")
> > 
> > So the code could be simplified to use that after a small delay.
> 
> Thank you for your information.
> 
> I didn't know $h->signal() and  I mimicked the way of
> src/bin/psql/t/020_cancel.pl to send SIGINT to a running program. I don't
> know why the psql test doesn't use the interface, I'll investigate whether
> this can be used in our purpose, anyway.

I attached the updated patch v3. The changes since the previous
patch includes the following;

I removed the unnecessary condition (&& false) that you
pointed out in [1].

The test was rewritten by using IPC::Run signal() and integrated
to "001_pgbench_with_server.pl". This test is skipped on Windows
because SIGINT causes to terminate the test itself as discussed
in [2] about query cancellation test in psql.

I added some comments to describe how query cancellation is
handled as I explained in [1].

Also, I found the previous patch didn't work on Windows so fixed it.
On non-Windows system, a thread waiting a response of long query can
be interrupted by SIGINT, but on Windows, threads do not return from
waiting until queries they are running are cancelled. This is because,
when the signal is received, the system just creates a new thread to
execute the callback function specified by setup_cancel_handler, and
other thread continue to run[3]. Therefore, the queries have to be
cancelled in the callback function.

[1] https://www.postgresql.org/message-id/a58388ac-5411-4760-ea46-71324d8324cb%40mines-paristech.fr
[2] https://www.postgresql.org/message-id/20230906004524.2fd6ee049f8a6c6f2690b99c%40sraoss.co.jp
[3] https://learn.microsoft.com/en-us/windows/console/handlerroutine

Regards,
Yugo Nagata

> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>
> 
> 

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

v3_pgbench_query_cancel.patch

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

19 сентября 2023 г., 11:30:11

On Wed, 6 Sep 2023 20:13:34 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:
 
> I attached the updated patch v3. The changes since the previous
> patch includes the following;
> 
> I removed the unnecessary condition (&& false) that you
> pointed out in [1].
> 
> The test was rewritten by using IPC::Run signal() and integrated
> to "001_pgbench_with_server.pl". This test is skipped on Windows
> because SIGINT causes to terminate the test itself as discussed
> in [2] about query cancellation test in psql.
> 
> I added some comments to describe how query cancellation is
> handled as I explained in [1].
> 
> Also, I found the previous patch didn't work on Windows so fixed it.
> On non-Windows system, a thread waiting a response of long query can
> be interrupted by SIGINT, but on Windows, threads do not return from
> waiting until queries they are running are cancelled. This is because,
> when the signal is received, the system just creates a new thread to
> execute the callback function specified by setup_cancel_handler, and
> other thread continue to run[3]. Therefore, the queries have to be
> cancelled in the callback function.
> 
> [1] https://www.postgresql.org/message-id/a58388ac-5411-4760-ea46-71324d8324cb%40mines-paristech.fr
> [2] https://www.postgresql.org/message-id/20230906004524.2fd6ee049f8a6c6f2690b99c%40sraoss.co.jp
> [3] https://learn.microsoft.com/en-us/windows/console/handlerroutine

I found that --disable-thread-safety option was removed in 68a4b58eca0329.
So, I removed codes involving ENABLE_THREAD_SAFETY from the patch.

Also, I wrote a commit log draft.

Attached is the updated version, v4.

Regards,
Yugo Nagata


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

v4-0001-Allow-pgbnech-to-cancel-queries-during-benchmark.patch

Re: pgbnech: allow to cancel queries during benchmark

От

Tatsuo Ishii

Дата:

15 января, 10:49:44

> On Wed, 6 Sep 2023 20:13:34 +0900
> Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>  
>> I attached the updated patch v3. The changes since the previous
>> patch includes the following;
>> 
>> I removed the unnecessary condition (&& false) that you
>> pointed out in [1].
>> 
>> The test was rewritten by using IPC::Run signal() and integrated
>> to "001_pgbench_with_server.pl". This test is skipped on Windows
>> because SIGINT causes to terminate the test itself as discussed
>> in [2] about query cancellation test in psql.
>> 
>> I added some comments to describe how query cancellation is
>> handled as I explained in [1].
>> 
>> Also, I found the previous patch didn't work on Windows so fixed it.
>> On non-Windows system, a thread waiting a response of long query can
>> be interrupted by SIGINT, but on Windows, threads do not return from
>> waiting until queries they are running are cancelled. This is because,
>> when the signal is received, the system just creates a new thread to
>> execute the callback function specified by setup_cancel_handler, and
>> other thread continue to run[3]. Therefore, the queries have to be
>> cancelled in the callback function.
>> 
>> [1] https://www.postgresql.org/message-id/a58388ac-5411-4760-ea46-71324d8324cb%40mines-paristech.fr
>> [2] https://www.postgresql.org/message-id/20230906004524.2fd6ee049f8a6c6f2690b99c%40sraoss.co.jp
>> [3] https://learn.microsoft.com/en-us/windows/console/handlerroutine
> 
> I found that --disable-thread-safety option was removed in 68a4b58eca0329.
> So, I removed codes involving ENABLE_THREAD_SAFETY from the patch.
> 
> Also, I wrote a commit log draft.

> Previously, Ctrl+C during benchmark killed pgbench immediately,
> but queries running at that time were not cancelled.

Better to mention explicitely that queries keep on running on the
backend. What about this?

Previously, Ctrl+C during benchmark killed pgbench immediately, but
queries were not canceled and they keep on running on the backend
until they tried to send the result to pgbench.

> The commit
> fixes this so that cancel requests are sent for all connections
> before pgbench exits.

"sent for" -> "sent to"

> Attached is the updated version, v4.

+/* send cancel requests to all connections */
+static void
+cancel_all()
+{
+    for (int i = 0; i < nclients; i++)
+    {
+        char errbuf[1];
+        if (client_states[i].cancel != NULL)
+            (void) PQcancel(client_states[i].cancel, errbuf, sizeof(errbuf));
+    }
+}
+

Why in case of errors from PQCancel the error message is neglected? I
think it's better to print out the error message in case of error.

+     * On non-Windows, any callback function is not set. When SIGINT is
+     * received, CancelRequested is just set, and only thread #0 is interrupted
+     * and returns from waiting input from the backend. After that, the thread
+     * sends cancel requests to all benchmark queries.

The second line is a little bit long according to the coding
standard. Fix like this?

     * On non-Windows, any callback function is not set. When SIGINT is
     * received, CancelRequested is just set, and only thread #0 is
     * interrupted and returns from waiting input from the backend. After
     * that, the thread sends cancel requests to all benchmark queries.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

19 января, 10:51:20

On Mon, 15 Jan 2024 16:49:44 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> > On Wed, 6 Sep 2023 20:13:34 +0900
> > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> >  
> >> I attached the updated patch v3. The changes since the previous
> >> patch includes the following;
> >> 
> >> I removed the unnecessary condition (&& false) that you
> >> pointed out in [1].
> >> 
> >> The test was rewritten by using IPC::Run signal() and integrated
> >> to "001_pgbench_with_server.pl". This test is skipped on Windows
> >> because SIGINT causes to terminate the test itself as discussed
> >> in [2] about query cancellation test in psql.
> >> 
> >> I added some comments to describe how query cancellation is
> >> handled as I explained in [1].
> >> 
> >> Also, I found the previous patch didn't work on Windows so fixed it.
> >> On non-Windows system, a thread waiting a response of long query can
> >> be interrupted by SIGINT, but on Windows, threads do not return from
> >> waiting until queries they are running are cancelled. This is because,
> >> when the signal is received, the system just creates a new thread to
> >> execute the callback function specified by setup_cancel_handler, and
> >> other thread continue to run[3]. Therefore, the queries have to be
> >> cancelled in the callback function.
> >> 
> >> [1] https://www.postgresql.org/message-id/a58388ac-5411-4760-ea46-71324d8324cb%40mines-paristech.fr
> >> [2] https://www.postgresql.org/message-id/20230906004524.2fd6ee049f8a6c6f2690b99c%40sraoss.co.jp
> >> [3] https://learn.microsoft.com/en-us/windows/console/handlerroutine
> > 
> > I found that --disable-thread-safety option was removed in 68a4b58eca0329.
> > So, I removed codes involving ENABLE_THREAD_SAFETY from the patch.
> > 
> > Also, I wrote a commit log draft.
> 
> > Previously, Ctrl+C during benchmark killed pgbench immediately,
> > but queries running at that time were not cancelled.
> 
> Better to mention explicitely that queries keep on running on the
> backend. What about this?
> 
> Previously, Ctrl+C during benchmark killed pgbench immediately, but
> queries were not canceled and they keep on running on the backend
> until they tried to send the result to pgbench.

Thank you for your comments. I agree with you, so I fixed the message
as your suggestion.

> > The commit
> > fixes this so that cancel requests are sent for all connections
> > before pgbench exits.
> 
> "sent for" -> "sent to"

Fixed.

> > Attached is the updated version, v4.
> 
> +/* send cancel requests to all connections */
> +static void
> +cancel_all()
> +{
> +    for (int i = 0; i < nclients; i++)
> +    {
> +        char errbuf[1];
> +        if (client_states[i].cancel != NULL)
> +            (void) PQcancel(client_states[i].cancel, errbuf, sizeof(errbuf));
> +    }
> +}
> +
> 
> Why in case of errors from PQCancel the error message is neglected? I
> think it's better to print out the error message in case of error.

Is the message useful for pgbench users? I saw the error is ignored
in pg_dump, for example in bin/pg_dump/parallel.c

        /*
         * Send QueryCancel to leader connection, if enabled.  Ignore errors,
         * there's not much we can do about them anyway.
         */
        if (signal_info.myAH != NULL && signal_info.myAH->connCancel != NULL)
            (void) PQcancel(signal_info.myAH->connCancel,
                            errbuf, sizeof(errbuf));


> +     * On non-Windows, any callback function is not set. When SIGINT is
> +     * received, CancelRequested is just set, and only thread #0 is interrupted
> +     * and returns from waiting input from the backend. After that, the thread
> +     * sends cancel requests to all benchmark queries.
> 
> The second line is a little bit long according to the coding
> standard. Fix like this?
> 
>      * On non-Windows, any callback function is not set. When SIGINT is
>      * received, CancelRequested is just set, and only thread #0 is
>      * interrupted and returns from waiting input from the backend. After
>      * that, the thread sends cancel requests to all benchmark queries.

Fixed.

The attached is the updated patch, v5.

Regards,
Yugo Nagata

> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

v5-0001-Allow-pgbnech-to-cancel-queries-during-benchmark.patch

Re: pgbnech: allow to cancel queries during benchmark

От

Tatsuo Ishii

Дата:

19 января, 11:46:03

>> +/* send cancel requests to all connections */
>> +static void
>> +cancel_all()
>> +{
>> +    for (int i = 0; i < nclients; i++)
>> +    {
>> +        char errbuf[1];
>> +        if (client_states[i].cancel != NULL)
>> +            (void) PQcancel(client_states[i].cancel, errbuf, sizeof(errbuf));
>> +    }
>> +}
>> +
>> 
>> Why in case of errors from PQCancel the error message is neglected? I
>> think it's better to print out the error message in case of error.
> 
> Is the message useful for pgbench users? I saw the error is ignored
> in pg_dump, for example in bin/pg_dump/parallel.c

I think the situation is different from pg_dump.  Unlike pg_dump, if
PQcancel does not work, users can fix the problem by using
pg_terminate_backend or kill command. In order to make this work, an
appropriate error message is essential.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

24 января, 16:17:44

On Fri, 19 Jan 2024 17:46:03 +0900 (JST)
Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

> >> +/* send cancel requests to all connections */
> >> +static void
> >> +cancel_all()
> >> +{
> >> +    for (int i = 0; i < nclients; i++)
> >> +    {
> >> +        char errbuf[1];
> >> +        if (client_states[i].cancel != NULL)
> >> +            (void) PQcancel(client_states[i].cancel, errbuf, sizeof(errbuf));
> >> +    }
> >> +}
> >> +
> >> 
> >> Why in case of errors from PQCancel the error message is neglected? I
> >> think it's better to print out the error message in case of error.
> > 
> > Is the message useful for pgbench users? I saw the error is ignored
> > in pg_dump, for example in bin/pg_dump/parallel.c
> 
> I think the situation is different from pg_dump.  Unlike pg_dump, if
> PQcancel does not work, users can fix the problem by using
> pg_terminate_backend or kill command. In order to make this work, an
> appropriate error message is essential.

Makes sense. I fixed to emit an error message when PQcancel fails.

Also, I added some comments about the signal handling on Windows
to explain why the different way than non-Windows is required;

+    * On Windows, a callback function is set in which query cancel requests
+    * are sent to all benchmark queries running in the backend. This is
+    * required because all threads running queries continue to run without
+    * interrupted even when the signal is received.
+    *

Attached is the updated patch, v6.

> Best reagards,
> --
> Tatsuo Ishii
> SRA OSS LLC
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вложения

v6-0001-Allow-pgbnech-to-cancel-queries-during-benchmark.patch

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

07 февраля, 04:19:03

On Wed, 24 Jan 2024 22:17:44 +0900
Yugo NAGATA <nagata@sraoss.co.jp> wrote:

> Attached is the updated patch, v6.

Currently, on non-Windows, SIGINT is received only by thread #0. 
CancelRequested is checked during the loop in the thread, and
queries are cancelled if it is set. However, once thread #0 exits
the loop due to some runtime error and starts waiting in pthread_join,
there is no opportunity to cancel queries run by other threads. 

In addition, if -C option is specified, connections are created for
each transaction, so cancel objects (PGcancel) also have to be
recreated at each time in each thread. However, these cancel objects
are used  in a specific thread to perform cancel for all queries,
which is not safe because a thread refers to objects updated by other
threads.

I think the first problem would be addressed by any of followings.

(1a) Perform cancels in the signal handler. The signal handler will be
called even while the thread 0 is blocked in pthread_join. This is safe
because PQcancel is callable from a signal handler.

(1b) Create an additional dedicated thread  that calls sigwait on SIGINT
and performs query cancel. As far as I studied, creating such dedicated
thread calling sigwait is a typical way to handle signal in multi-threaded
programming.

(1c) Each thread performs cancel for queries run by each own, rather than
that thread 0 cancels all queries. For the purpose, pthread_kill might be
used to interrupt threads blocked in wait_on_socket_set. 

The second one would be addressed by any of followings. 

(2a) Use critical section when accessing PGcancel( e.g by using
pthread_mutex (non-Windows) or EnterCriticalSection (Windows)). On
non-Windows, we cannot use this way when calling PQcancel in a signal
handler ((1a) above) because acquiring a lock is not re-entrant.

(2b) Perform query cancel in each thread that has created the connection
(same as (1c) above).

Considering both, possible combination would be either (1b)&(2a) or
(1c)&(2b). I would prefer the former way, because creating the
dedicated　thread handling SIGINT signal and canceling all queries seems
simpler　and safer than calling pthread_kill in the SIGINT signal handler
to　send another signal to other threads. I'll update the patch in
this way soon.

Regards,
Yugo Nagata

> 
> > Best reagards,
> > --
> > Tatsuo Ishii
> > SRA OSS LLC
> > English: http://www.sraoss.co.jp/index_en/
> > Japanese:http://www.sraoss.co.jp
> > 
> > 
> 
> 
> -- 
> Yugo NAGATA <nagata@sraoss.co.jp>

-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Re: pgbnech: allow to cancel queries during benchmark

От

Alvaro Herrera

Дата:

30 марта, 16:35:37

Due to commit 61461a300c1c, this patch needs to be reworked.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)

Re: pgbnech: allow to cancel queries during benchmark

От

Yugo NAGATA

Дата:

31 марта, 16:25:02

On Sat, 30 Mar 2024 14:35:37 +0100
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> Due to commit 61461a300c1c, this patch needs to be reworked.

Thank you for pointing out this.

Although cfbot doesn't report any failure, but PQcancel is now
deprecated and insecure. I'll consider it too while fixing a
problem I found in [1].

[1] https://www.postgresql.org/message-id/20240207101903.b5846c25808f64a91ee2e7a2%40sraoss.co.jp

Regards,
Yugo Nagata

> -- 
> Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
> "La fuerza no está en los medios físicos
> sino que reside en una voluntad indomable" (Gandhi)


-- 
Yugo NAGATA <nagata@sraoss.co.jp>

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: pgbnech: allow to cancel queries during benchmark

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения