Обсуждение: Wait events for delayed checkpoints
Hi, You can't tell if your checkpointer is spending a lot of time waiting around for flags in delayChkptFlags to clear. Trivial patch to add that. I've managed to see it a few times when checkpointing repeatedly with a heavy pgbench workload. I had to stop and think for a moment about whether these events belong under "WaitEventIPC", "waiting for notification from another process" or under "WaitEventTimeout", "waiting for a timeout to expire". I mean, both? It's using sleep-and-poll instead of (say) a CV due to the economics, we want to make the other side as cheap as possible, so we don't care about making the checkpointer take some micro-naps in this case. I feel like the key point here is that it's waiting for another process to do stuff and unblock it.
Вложения
On Wed, Oct 11, 2023 at 9:13 PM Thomas Munro <thomas.munro@gmail.com> wrote: > You can't tell if your checkpointer is spending a lot of time waiting > around for flags in delayChkptFlags to clear. Trivial patch to add > that. I've managed to see it a few times when checkpointing > repeatedly with a heavy pgbench workload. > > I had to stop and think for a moment about whether these events belong > under "WaitEventIPC", "waiting for notification from another process" > or under "WaitEventTimeout", "waiting for a timeout to expire". I > mean, both? It's using sleep-and-poll instead of (say) a CV due to > the economics, we want to make the other side as cheap as possible, so > we don't care about making the checkpointer take some micro-naps in > this case. I feel like the key point here is that it's waiting for > another process to do stuff and unblock it. IPC seems right to me. Yeah, a timeout is being used, but as you say, that's an implementation detail. +1 for the idea, too. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote:
> IPC seems right to me. Yeah, a timeout is being used, but as you say,
> that's an implementation detail.
>
> +1 for the idea, too.
Agreed that timeout makes little sense in this context, and IPC looks
correct.
+ pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
do
{
pg_usleep(10000L); /* wait for 10 msec */
} while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
DELAY_CHKPT_START));
+ pgstat_report_wait_end();
HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire()
which would itself report a wait event for ProcArrayLock, overwriting
this new one, no?
--
Michael
Вложения
On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Oct 12, 2023 at 01:32:29PM -0400, Robert Haas wrote:
> > IPC seems right to me. Yeah, a timeout is being used, but as you say,
> > that's an implementation detail.
> >
> > +1 for the idea, too.
>
> Agreed that timeout makes little sense in this context, and IPC looks
> correct.
>
> + pgstat_report_wait_start(WAIT_EVENT_CHECKPOINT_DELAY_START);
> do
> {
> pg_usleep(10000L); /* wait for 10 msec */
> } while (HaveVirtualXIDsDelayingChkpt(vxids, nvxids,
> DELAY_CHKPT_START));
> + pgstat_report_wait_end();
>
> HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire()
> which would itself report a wait event for ProcArrayLock, overwriting
> this new one, no?
Ah, right: the wait event should be set and cleared around pg_usleep,
not the whole loop.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Oct 13, 2023 at 2:19 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 12, 2023 at 7:09 PM Michael Paquier <michael@paquier.xyz> wrote: > > HaveVirtualXIDsDelayingChkpt() does immediately a LWLockAcquire() > > which would itself report a wait event for ProcArrayLock, overwriting > > this new one, no? > > Ah, right: the wait event should be set and cleared around pg_usleep, > not the whole loop. Duh. Yeah. Pushed like that. Thanks both.