Re: parallel mode and parallel contexts

From Robert Haas
Subject Re: parallel mode and parallel contexts
Date
Msg-id CA+TgmoZCsirzt5Hg=18sbA=joMXS=VaAF0h11=TXSv+z5K=wtg@mail.gmail.com
In response to Re: parallel mode and parallel contexts  (Noah Misch <noah@leadboat.com>)
Responses Re: parallel mode and parallel contexts  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Sat, May 2, 2015 at 12:35 PM, Noah Misch <noah@leadboat.com> wrote:
> Perhaps, rather than model it as master M waiting on worker list W1|W2|W3,
> model it with queue-nonempty and queue-nonfull events, one pair per queue.

Each individual queue has only a single reader and a single writer.
In your example, there would be three queues, not just one.  Of
course, one could design a new shared memory data structure
representing a collection of related queues, but I still don't see
exactly how we get around the problem of that requiring
O(MaxBackends^2) storage space.
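
To make the storage concern concrete, a hypothetical sizing function:
if any backend may end up as the reader or writer of a queue involving
any other backend, a general-purpose table of queue-nonempty/nonfull
events has to reserve a slot for every ordered pair of backends.
QueuePairEvent is an invented type here; mul_size() is the usual
overflow-checked shmem sizing helper.

/*
 * Hypothetical shmem sizing for per-queue event pairs: one slot per
 * ordered (reader, writer) pair of backends, hence O(MaxBackends^2).
 */
Size
QueuePairEventsShmemSize(void)
{
    return mul_size(mul_size(MaxBackends, MaxBackends),
                    sizeof(QueuePairEvent));
}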

> M
> subscribes to queue0-nonempty, and each of W1,W2,W3 publish to it.  M
> publishes to queue0-nonfull, and the workers subscribe.  An edge forms in the
> deadlock detector's graph when all of an event's subscribers wait on it.  (One
> can approximate that in userspace with a pair of advisory locks.)

An edge between which two processes?

>> After thinking about it a bit more, I realized that even if we settle
>> on some solution to that problem, there's another issue: the
>> wait-edges created by this system don't really have the same semantics
>> as regular lock waits.  Suppose A is waiting on a lock held by B and B
>> is waiting for a lock held by A; that's a deadlock.  But if A is
>> waiting for B to write to a tuple queue and B is waiting for A to read
>> from a tuple queue, that's not a deadlock if the queues in question
>> are the same.
>
> I do see a deadlock.  B wants to block until A reads from the queue, so it
> advertises that and sleeps.  Waking up deadlock_timeout later, it notices that
> A is waiting for B to write something.  B will not spontaneously suspend
> waiting to write to the queue, nor will A suspend waiting to read from the
> queue.  Thus, the deadlock is valid.  This assumes the deadlock detector
> reasons from an authoritative picture of pending waits and that we reliably
> wake up a process when the condition it sought has arrived.  We have that for
> heavyweight lock acquisitions.  The assumption may be incompatible with
> Andres's hope, quoted above, of avoiding the full lock acquisition procedure.

I don't understand.  What's typically going to happen here is that the
queue will initially be empty, and A will block.  Suppose B has a
large message to write to the queue, let's say 100x the queue size.
It will write the first 1% of the message into the queue, set A's
latch, and go to sleep.  A will now wake up, drain the queue, set B's
latch, and go to sleep.  B will then wake up, write the next chunk of
the message, set A's latch again, and go to sleep.  They'll go back
and forth like this until the entire message has been transmitted.
There's no deadlock here: everything is working as designed.  But
there may be instants where both A and B are waiting, because (for
example) A may set B's latch and finish going to sleep before B gets
around to waking up.
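
To make the back-and-forth concrete, here is a minimal sketch of B's
side of the exchange, loosely modeled on what shm_mq_send_bytes() in
shm_mq.c actually does; ring_space(), ring_write(), and the mq fields
are placeholders invented for illustration.

while (bytes_sent < msg_len)
{
    Size        avail = ring_space(mq);    /* free space in the queue */

    if (avail > 0)
    {
        Size    chunk = Min(avail, msg_len - bytes_sent);

        ring_write(mq, msg + bytes_sent, chunk);
        bytes_sent += chunk;
        SetLatch(mq->receiver_latch);      /* wake A so it can drain */
    }
    else
    {
        /* Queue full: sleep until A reads something and sets our latch. */
        WaitLatch(MyLatch, WL_LATCH_SET, 0);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }
}

Both processes can briefly be inside WaitLatch() at once, which is
exactly the transient double-wait described above.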

That's probably something that we could patch around, but I think it's
missing the larger point.  When you're dealing with locks, there is
one operation that causes blocking (acquiring a lock) and another
operation that unblocks other processes (releasing a lock).  With
message queues, there are still two operations (reading and writing),
but either of them can either cause you to block yourself, or to
unblock another process.  To put that another way, locks enforce
mutual exclusion; message queues force mutual *inclusion*.  Locks
cause blocking when two processes try to operate on the same object at
the same time; message queues cause blocking *unless* two processes
operate on the same object at the same time.  That difference seems to
me to be quite fundamental.

>> So then
>> you have to wonder why we're not solving problem #1, because the
>> deadlock was just as certain before we generated the maximum possible
>> number of tuples locally as it was afterwards.
>
> The "1." above reads like a benefit, not a problem.  What might you solve
> about it?

Sorry, that wasn't very clear.  Normally, it is a benefit, but in some
cases, it could result in a long delay in reporting an inevitable
deadlock.

What I mean is: suppose we construct a working mechanism where the
deadlock detector knows about tuple queue waits.  Suppose further that
we have a parallel leader and two workers cooperating to do a task
that takes 300 s and can be parallelized with perfect efficiency so
that, working together, those three processes can get the job done in
just 100 s.  If the leader tries to take a lock held in a conflicting
mode by one of the workers, the worker will fill up the tuple queue -
probably quite quickly - and wait.  We now detect a deadlock.  Our
detection is timely, and life is good.  On the other hand, suppose
that one of the *workers* tries to take a lock held in a conflicting
mode by the other worker or by the leader.  There is no immediate
deadlock, because the leader is not waiting.  Instead, we'll finish
the computation with the remaining worker and the leader, which will
take 150 seconds since we only have two processes instead of three,
and *then* the leader waits.  I predict that users will be unhappy
about doing the whole computation - and only then detecting, well
after the time the query would normally have finished, a deadlock that
was inevitable from the beginning.

That problem is actually not too hard to avoid if you're OK with
extremely frequent lock manager manipulations.  It's not a deadlock if
a worker is waiting for a lock (say, a relation extension lock) held
by the leader, because the leader might release that lock quickly.
But if the leader gets to the top of the funnel node's main loop and
one of its workers is still waiting for the lock at that point, that
is almost certainly a deadlock.  Locks might get taken and released
during a single pass through that loop, but any lock held across
iterations is presumably going to be held until end-of-transaction, so
we're hosed.  But checking for deadlock at the top of every main loop
iteration is far too expensive to contemplate.  Even doing some much
lighter-weight manipulation of the lock manager data structures at the
top of every loop iteration is probably going to recreate the same
kind of lock manager contention problems that I tried to solve with
the fast-path locking stuff in 9.2.
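
For illustration, the kind of per-iteration probe being ruled out
might look like this; worker_awaited_lock() and
leader_holds_conflicting_lock() are invented names, and the point is
that each of them would have to inspect shared lock manager state
under its partition LWLocks on every pass through the loop.

/*
 * Hypothetical check at the top of the funnel node's main loop: if any
 * worker is still waiting for a lock the leader holds across
 * iterations, report a deadlock.  Far too expensive to run per
 * iteration, as discussed above.
 */
for (i = 0; i < pcxt->nworkers; ++i)
{
    LOCKTAG     tag;

    if (worker_awaited_lock(&pcxt->worker[i], &tag) &&
        leader_holds_conflicting_lock(&tag))
        ereport(ERROR,
                (errcode(ERRCODE_T_R_DEADLOCK_DETECTED),
                 errmsg("deadlock detected")));
}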

To put all of my cards on the table, I've never really believed in
this approach.  The reason we have a deadlock detector in the first
place is because we can't prevent users from making fundamentally
incompatible locking requests, like locking two tables in incompatible
orderings in two different sessions.  But we *don't* have deadlock
detection for lwlocks because we (rightly) view it as our job to write
the code in such a way that deadlocks don't happen in the first place.
So we take locks in a predictable order, even in cases (like updating
a tuple) where that involves some fancy gymnastics.  We could have
update always lock the old-tuple buffer before the new-tuple buffer
and make it the deadlock detector's job to sort that out, but that's
pretty obviously inferior.  We can expect users to tolerate a deadlock
when it is their own actions that have precipitated the deadlock, but
it's a whole different thing to ask them to accept that what from the
user's point of view is an indivisible action (like an UPDATE, or a
parallel query, or perhaps an UPSERT) occasionally deadlocks for some
reason that they can neither understand nor prevent.  Tuple queues are
an implementation detail.  They shouldn't deadlock for the same reason
that lwlock acquisition shouldn't deadlock, and this whole project
should be entirely unnecessary.

The way we backed into worrying about exposing tuple queues to the
deadlock detector is that, if you let one backend in a parallel group
get and keep a lock on some resource and then another backend in the
same group tries to lock that resource and can't get the lock, you
will eventually get an undetected deadlock.  Andres's view is that we
should fix the deadlock detector to detect and report such cases, but
I think that's wrong.  If a process in a parallel group can take and
retain until end of transaction a lock on some resource without which
other processes in the same parallel group cannot do useful work,
that's a BUG.  We shouldn't need the deadlock detector to report that
problem for the same reason we shouldn't need the deadlock detector to
report lwlock-based deadlocks: if they are ever happening, you need to
fix your code, not the deadlock detector.

I think where this project went off the rails is when we made the
decision as to how to fix the problems in the original group locking
approach.  In the original concept, locks between processes in the
same parallel group just didn't ever conflict; two
otherwise-conflicting locks held by different backends within the same
group were regarded as those processes sharing that lock.  Andres and
possibly others pointed out that for stuff like relation locks, that's
probably not the right behavior.  I agree with that.  It was also
suggested that this would be less scary if we limited ourselves to
sharing the locks held at the beginning of parallelism.  That had the
further advantage of making things like relation extension locks,
which won't be held at the start of parallelism, unshared, while
relation locks would, in most cases, be shared.  That felt pretty
good, so I did it, but I now think that was a mistake, because it
creates edge cases where parallel groups can self-deadlock.  If we
instead regard relation locks between parallel group members as NEVER
conflicting, rather than conflicting only when both locks were
acquired after the start of parallelism, those edge cases go away.
The only downside is that if a worker A manages to do something to a
relation R that makes it unsafe for worker B to access, and worker B
then gets the lock anyway, we get no-holds-barred insanity instead of
a deadlock.  But I am unconvinced that downside amounts to much,
because I believe the only way that A can make it unsafe for B to
access the relation is by doing something that CheckTableNotInUse()
will already prohibit in parallel mode.  If it turns out there is some
oversight there, I don't think it'll be hard to plug.  In contrast,
exposing this tuple queue wait information to the deadlock detector is
looking quite complex.
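
To be concrete about the alternative I am proposing, the conflict test
would look something like this; lockGroupLeader is a hypothetical
per-PGPROC field identifying a process's parallel group, and
modes_conflict() stands in for the usual conflict-table lookup.

/*
 * Sketch of the proposed rule: otherwise-conflicting relation lock
 * requests from members of the same parallel group never conflict,
 * regardless of when the locks were acquired.
 */
static bool
relation_locks_conflict(PGPROC *requestor, LOCKMODE reqmode,
                        PGPROC *holder, LOCKMODE heldmode)
{
    if (!modes_conflict(reqmode, heldmode))
        return false;       /* modes compatible: no conflict anyway */

    if (requestor->lockGroupLeader != NULL &&
        requestor->lockGroupLeader == holder->lockGroupLeader)
        return false;       /* same parallel group: treat as shared */

    return true;            /* ordinary conflict */
}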

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


