Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

From: Thomas Munro
Subject: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Msg-id: CA+hUKGK3iuXde4N1qHY0z+ZBd8+c0AOMk6g-e0cSjSfBiUEkNg@mail.gmail.com
In reply to: Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Mon, Jan 30, 2023 at 6:36 PM Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-30 15:22:34 +1300, Thomas Munro wrote:
> > On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > out-of-order hazard
> >
> > I've been trying to understand how that could happen, but my CPU-fu is
> > weak.  Let me try to write an argument for why it can't happen, so
> > that later I can look back at how stupid and naive I was.  We have A
> > B, and if the CPU sees no dependency and decides to execute B A
> > (pipelined), shouldn't an interrupt either wait for the whole
> > schemozzle to commit first (if not in a hurry), or nuke it, handle the
> > IPI and restart, or something?
>
> In a core local view, yes, I think so. But I don't think that's how it can
> work on multi-core, and even more so, multi-socket machines. Imagine how it'd
> influence latency if every interrupt on any CPU would prevent all out-of-order
> execution on any CPU.

Good.  Yeah, I was talking only about a single thread/core.

> > After an hour of reviewing random
> > slides from classes on out-of-order execution and reorder buffers and
> > the like, I think the term for making sure that interrupts run with
> > the illusion of in-order execution maintained is called "precise
> > interrupts", and it is expected in all modern architectures, after the
> > early OoO pioneers lost their minds trying to program without it.  I
> > guess generally you want that because it would otherwise run your
> > interrupt handler in a completely uncertain environment, and
> > specifically in this case it would reach our signal handler, which
> > reads A's output (waiting) and writes to B's input (is_set), so the
> > order B, IPI, A surely shouldn't be allowed?
>
> Userspace signals aren't delivered synchronously during hardware interrupts
> afaik - and I don't think they even possibly could be (after all the process
> possibly isn't scheduled).

Yeah, they're not synchronous and the target might not even be
running.  BUT if a suitable thread is running then AFAICT an IPI is
delivered to that sucker to get it running the handler ASAP, at least
on the three OSes I looked at.  (See breadcrumbs below).

> I think what you're talking about with precise interrupts above is purely
> about the single-core view, and mostly about hardware interrupts for faults
> etc. The CPU will unwind state from speculatively executed code etc on
> interrupt, sure - but I think that's separate from guaranteeing that you can't
> have stale cache contents *due to work by another CPU*.

Yeah.  I get the cache problem, a separate issue that does indeed look
pretty dodgy.  I guess I wrote my email out-of-order: at the end I
speculated that cache coherency probably can't explain this failure at
least in THAT bit of the source, because of that funky extra
self-SetLatch().  I just got spooked by the mention of out-of-order
execution and I wanted to chase it down and straighten out my
understanding.
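The cross-CPU hazard under discussion is the classic store-buffering pattern: each side stores its own flag and then loads the other side's flag, and without a full barrier between the store and the load both sides may read a stale value. Here is a minimal C11 sketch of that handshake, assuming a latch can be reduced to two flags. Only the names `waiting` and `is_set` come from the thread; the threads, helper functions and globals are hypothetical, and this is not PostgreSQL's latch.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified two-flag latch (not PostgreSQL's struct Latch). */
static atomic_bool is_set;
static atomic_bool waiting;
static atomic_int go;

/* What each side observed about the other side's flag. */
static bool setter_saw_waiting;
static bool waiter_saw_is_set;

static void *setter(void *arg)
{
    (void) arg;
    while (!atomic_load_explicit(&go, memory_order_acquire))
        ;                       /* spin until both threads are released */
    atomic_store_explicit(&is_set, true, memory_order_relaxed);
    /* Full fence: the store to is_set is ordered before the load of waiting. */
    atomic_thread_fence(memory_order_seq_cst);
    setter_saw_waiting = atomic_load_explicit(&waiting, memory_order_relaxed);
    return NULL;
}

static void *waiter(void *arg)
{
    (void) arg;
    while (!atomic_load_explicit(&go, memory_order_acquire))
        ;
    atomic_store_explicit(&waiting, true, memory_order_relaxed);
    /* Full fence: the store to waiting is ordered before the load of is_set. */
    atomic_thread_fence(memory_order_seq_cst);
    waiter_saw_is_set = atomic_load_explicit(&is_set, memory_order_relaxed);
    return NULL;
}

/* Hypothetical helper: counts trials in which the wakeup would be lost,
 * i.e. the waiter saw !is_set (would sleep) AND the setter saw !waiting
 * (would skip the wakeup). */
static int count_lost_wakeups(int n)
{
    int lost = 0;

    for (int i = 0; i < n; i++) {
        pthread_t a, b;

        atomic_store(&is_set, false);
        atomic_store(&waiting, false);
        atomic_store(&go, 0);
        pthread_create(&a, NULL, setter, NULL);
        pthread_create(&b, NULL, waiter, NULL);
        atomic_store_explicit(&go, 1, memory_order_release);
        pthread_join(a, NULL);  /* join makes the observations visible here */
        pthread_join(b, NULL);
        if (!setter_saw_waiting && !waiter_saw_is_set)
            lost++;
    }
    return lost;
}
```

With both `atomic_thread_fence(memory_order_seq_cst)` calls in place, the C11 memory model forbids the outcome where both loads read `false`, so `count_lost_wakeups()` always returns 0. If the fences are removed, store-load reordering (permitted even on x86) may make the count nonzero, although a short run is not guaranteed to show it.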

> I'm not even sure that userspace signals are generally delivered via an
> immediate hardware interrupt, or whether they're processed at the next
> scheduler tick. After all, we know that multiple signals are coalesced, which
> certainly isn't compatible with synchronous execution. But it could be that
> that just happens when the target of a signal is not currently scheduled.

FreeBSD: By default, they are when possible, e.g. if the process is
currently running a suitable thread.  You can set sysctl
kern.smp.forward_signal_enabled=0 to turn that off, and then it works
more like the way you imagined (checking for pending signals at
various arbitrary times, not sure).  See tdsigwakeup() ->
forward_signal() -> ipi_cpu().

Linux: Well, it certainly smells approximately similar.  See
signal_wake_up_state() -> kick_process() -> smp_send_reschedule() ->
smp_cross_call() -> __ipi_send_mask().  The comment for kick_process()
explains that it's using the scheduler IPI to get signals handled
ASAP.

Darwin: ... -> cpu_signal() -> something that talks about IPIs

Coalescing happens not only at the pending-signal level (an
invention of the OS); for the inter-processor wakeups there is
also interrupt coalescing.  It's latches all the way down.
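The coalescing at the pending-signal level can be seen with a small POSIX C sketch, assuming Linux or FreeBSD behaviour for standard (non-realtime) signals: a blocked signal that is generated several times is recorded as pending only once, so the handler runs a single time after it is unblocked. POSIX leaves this implementation-defined for non-realtime signals, and the helper names below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handler_runs;

static void on_sigusr1(int signo)
{
    (void) signo;
    handler_runs++;
}

/* Hypothetical helper: generate SIGUSR1 n times while it is blocked,
 * then unblock it and report how many times the handler actually ran. */
static int deliveries_after_blocked_raises(int n)
{
    struct sigaction sa;
    sigset_t block, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);

    handler_runs = 0;
    for (int i = 0; i < n; i++)
        raise(SIGUSR1);         /* each call just re-marks SIGUSR1 as pending */

    /* A pending, now-unblocked signal is delivered before sigprocmask returns. */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return handler_runs;
}
```

On those systems, calling `deliveries_after_blocked_raises(3)` reports a single handler run, not three.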