Thread: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Robert Haas
Date:
On Sat, Feb 13, 2016 at 1:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <rhaas@postgresql.org> writes:
>> Introduce group locking to prevent parallel processes from deadlocking.
>
> I'm fairly unhappy about this patch, because it has introduced a large
> amount of new complexity into the lock manager with effectively no
> documentation.  Yeah, there's several paragraphs added to lmgr/README,
> but they're just a handwavy apologia for the high-level decision to allow
> exclusive locks taken by parallel workers to not conflict.  I see nothing
> whatever explaining how it works, that is any useful docs about the new
> data structures or invariants.  For example, I can't figure out whether
> you have broken the pg_locks display (by causing relevant lock state to
> not get displayed, or get displayed in a useless fashion because one
> can no longer tell which entries block which other entries).  I can't
> even tell for sure if it works at all: the README says only that locks
> taken by members of the same parallel group don't conflict, but the
> implementation looks more like some operations get applied to the group
> leader rather than to followers, which is something completely different.

OK.  I don't know exactly what new documentation needs to be written,
but I'm happy to try to work to improve it.  Obviously, it was clear
enough to me, but that's not saying a lot.  Let me try to answer some
of your questions here via email, and then we can decide what of that
should go into comments, the lmgr README, or someplace else.

First, the overall concept here is that processes can either be a
member of a lock group or a member of no lock group.  The concept of a
lock group is formally separate from the concept of a parallel group
created by a ParallelContext, but it is not clear that there will ever
be any other context in which a lock group will be a good idea.  It is
not impossible to imagine: for example, suppose you had a group of
backends that all had TCP client connections, and those processes all
wanted to ingest data and stuff it in a table but without allowing any
unrelated process to touch the table, say because it was going to be
inconsistent during the operation and until indexes were afterwards
rebuilt.  I don't have any plans to implement anything like that but I
felt it was better to keep the concept of a lock group - which is a
group of processes that cooperate so closely that their locks need not
conflict - from the concept of a parallel context - which is a leader
process that is most likely connected to a user plus a bunch of
ephemeral background workers that aren't.  That way, if somebody later
wants to try to reuse the lock grouping stuff for something else,
nothing will get in the way of that; if not, no harm done, but keeping
the two things decoupled is at least easier to understand, IMHO.

Second, I can review the data structure changes and the associated
invariants.  Each PGPROC has four new members:
lockGroupLeaderIdentifier, lockGroupLeader, lockGroupMembers, and
lockGroupLink.  The first is simply a safety mechanism.  A newly
started parallel worker has to try to join the leader's lock group,
but it has no guarantee that the group leader is still alive by the
time it gets started.  I tried to ensure that the parallel leader dies
after all workers in normal cases, but also that the system could
survive relatively intact if that somehow fails to happen.  This is
one of the precautions against such a scenario: the leader relays its
PGPROC and also its PID to the worker, and the worker fails to join
the lock group unless the given PGPROC still has the same PID.  I
assume that PIDs are not recycled quickly enough for this interlock to
fail.

lockGroupLeader is NULL for processes not involved in parallel query.
When a process wants to cooperate with parallel workers, it becomes a
lock group leader, which means setting this field back to point to its
own PGPROC.  When a parallel worker starts up, it points this field at
the leader, with the above-mentioned interlock.  The lockGroupMembers
field is only used in the leader; it is a list of the workers.  The
lockGroupLink field is used to link the leader and all workers into
the leader's list.  All of these fields are protected by the lock
manager locks; the lock manager lock that protects the fields in a
given PGPROC is chosen by taking pgprocno modulo the number of lock
manager partitions.  This unusual arrangement has a major advantage:
the deadlock detector can count on the fact that no lockGroupLeader
field can change while the deadlock detector is running, because it
knows that it holds all the lock manager locks.  Earlier versions of
the patch protected these fields with the PGPROC's backendLock, but it
was very hard to be confident that this wouldn't lead to deadlock
where one process acquires a lock manager lock and then a backendLock
and another process does the reverse.  Even if that didn't happen, it
was inconvenient not to be able to access the fields without
additional locking.

Each PROCLOCK now has a new groupLeader flag.  While a PGPROC's
lockGroupLeader may be NULL if that process is not involved in group
locking, a PROCLOCK's groupLeader always has a non-NULL value; it
points back to the PGPROC that acquired the lock.  There's no horribly
principled reason for this different contract; it just turned out to
be more convenient.  PGPROCs need to distinguish whether they are
involved in group locking or not, and using NULL as a sentinel value
for lockGroupLeader is a convenient way to do that; but it turns out
we don't really need to know that for a PROCLOCK, so sending
groupLeader equal to myProc when there's no group locking involved is
more convenient.

Within the deadlock detector, an EDGE now represents a constraint
between two lock groups rather than between two processes; therefore,
the waiter and blocker are the respective group leaders rather than
the processes actually waiting.  A LOCK * is added to the EDGE data
structure for this reason -- different processes in the lock group
might be waiting for different locks, and we need to be sure about
which lock we're constraining.   It's possible for there to be
multiple processes on the same wait queue for the same LOCK; in fact,
this is probably the most common scenario.  Imagine that the leader
acquires a lock, probably AccessShareLock, but then before the workers
start up, someone queues up for an AccessExclusiveLock.  The workers
will all be soft-blocked by that other process, but must go ahead of
it; the leader won't release its lock until some time after the query
finishes, and that won't happen until the workers finish.  TopoSort()
handles this by emitting all processes in the same lock group adjacent
to each other in the output wait ordering.  As the comment explains:
       /*
        * Output everything in the lock group.  There's no point in outputting
        * an ordering where members of the same lock group are not
        * consecutive on the wait queue: if some other waiter is between two
        * requests that belong to the same group, then either it conflicts
        * with both of them and is certainly not a solution; or it conflicts
        * with at most one of them and is thus isomorphic to an ordering
        * where the group members are consecutive.
        */

There are a number of other new comments in TopoSort() which I believe
to be helpful for understanding how it goes about its mission.

With respect to pg_locks - and for that matter also pg_stat_activity -
I think you are right that improvement is needed.  I haven't really
done anything about those during the whole course of developing
parallel query, so the information that they show for parallel workers
is precisely the same information that they would show for any other
background worker.   Now, background workers do show up in both
places, but it's not straightforward to identify which background
workers go with which parallel leader or whether group locking is in
use.  The simplest thing we could do to make that easier is, in
pg_stat_activity, have parallel workers advertise the PID that
launched them in a new field; and in pg_locks, have members of a lock
group advertise the leader's PID in a new field.  I think that would
make it pretty simple to figure out what you want to know.  Although
such an improvement is probably necessary, I have my doubts about
whether it's entirely sufficient - I feel like there might be other
improvements we should also make, but I think somebody (or somebodys)
who is (or are) at arm's length from the project needs to try it out
and provide some input on what would make it more usable.  I don't
think that it's probably technically hard to implement any good design
we might agree on - but figuring out what a good design would look
like may not be so easy.

By the way, I'm pretty sure that you being stupid is not the problem
here.  At the same time, I did make an effort to document the things
that seemed to me to be the key points.  I think I've missed some
things and that should be fixed, but that doesn't mean that I didn't
make an effort.  It just means that code is always easier to
understand when you wrote it yourself, which for you is true of the
old code in this file and false of the new code.  I think that the
preexisting code here is pretty hard to understand as things stand -
it took me over a year of study to figure out exactly what to change
and how.  The README is basically just a cut-and-paste of an email you
posted to pgsql-hackers, and there is a heck of a lot of complexity
there that's not really explained anywhere; you just have to reverse
engineer it from what the code does.  Moreover, this code has been
around for ten years with zero test coverage.  If we get some
buildfarm members set up with force_parallel_mode=regress and
max_parallel_degree>0, the new code will have more test coverage as of
that instant than the existing code has ever had.  It's wholly unclear
how some of the code in deadlock.c has been tested by anyone,
anywhere, ever.  I wouldn't be surprised if the savedList = false case
in TestConfiguration() has been hit zero times outside of somebody
manually modifying the source tree to hit it.  If that's ever
happened, it was probably only on your machine during the development
of this code.  My point is that I'm completely willing to admit that I
have not done everything perfectly here, but I also don't want to
overstate the level of perfection coming in.  This code is
battle-tested and it will be really bad if I've broken that and we
don't find it before release, but the existing state of affairs did
not make avoiding breakage as straightforward as it might have been.

I hope this helps and please let me know what more you think I should do.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Craig Ringer
Date:
On 14 February 2016 at 08:05, Robert Haas <robertmhaas@gmail.com> wrote:
 
First, the overall concept here is that processes can either be a
member of a lock group or a member of no lock group.  The concept of a
lock group is formally separate from the concept of a parallel group
created by a ParallelContext, but it is not clear that there will ever
be any other context in which a lock group will be a good idea.  It is
not impossible to imagine: for example, suppose you had a group of
backends that all had TCP client connections, and those processes all
wanted to ingest data and stuff it in a table but without allowing any
unrelated process to touch the table, say because it was going to be
inconsistent during the operation and until indexes were afterwards
rebuilt.

The case that comes to mind for me is in logical decoding, for decoding prepared xacts. Being able to make the prepared xact a member of a "lock group" along with the decoding session's xact may provide a solution to the locking-related challenges there.

I haven't looked closely at what's involved in the decoding prepared xact locking issues yet, just an idea.

To do this it'd have to be possible to add an existing session/xact to a lock group (or make it the leader of a new lock group then join that group). Do you think that's practical with your design?
 
I don't have any plans to implement anything like that but I
felt it was better to keep the concept of a lock group - which is a
group of processes that cooperate so closely that their locks need not
conflict - from the concept of a parallel context - which is a leader
process that is most likely connected to a user plus a bunch of
ephemeral background workers that aren't.  That way, if somebody later
wants to try to reuse the lock grouping stuff for something else,
nothing will get in the way of that; if not, no harm done, but keeping
the two things decoupled is at least easier to understand, IMHO.

Yeah, strong +1

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Stephen Frost
Date:
* Craig Ringer (craig@2ndquadrant.com) wrote:
> On 14 February 2016 at 08:05, Robert Haas <robertmhaas@gmail.com> wrote:
> > First, the overall concept here is that processes can either be a
> > member of a lock group or a member of no lock group.  The concept of a
> > lock group is formally separate from the concept of a parallel group
> > created by a ParallelContext, but it is not clear that there will ever
> > be any other context in which a lock group will be a good idea.  It is
> > not impossible to imagine: for example, suppose you had a group of
> > backends that all had TCP client connections, and those processes all
> > wanted to ingest data and stuff it in a table but without allowing any
> > unrelated process to touch the table, say because it was going to be
> > inconsistent during the operation and until indexes were afterwards
> > rebuilt.
>
> The case that comes to mind for me is in logical decoding, for decoding
> prepared xacts. Being able to make the prepared xact a member of a "lock
> group" along with the decoding session's xact may provide a solution to the
> locking-related challenges there.

I was thinking this would be the way to address the current issues with
parallel pg_dump and having a shared snapshot.

> I haven't looked closely at what's involved in the decoding prepared xact
> locking issues yet, just an idea.

Yeah, don't have much more than it being an idea to use this for the
pg_dump case.  I do know that's a case which has been brought up a
couple of times before.

> To do this it'd have to be possible to add an existing session/xact to a
> lock group (or make it the leader of a new lock group then join that
> group). Do you think that's practical with your design?

Seems like the same question applies to the pg_dump case.

Thanks!

Stephen

Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Robert Haas
Date:
On Sat, Feb 13, 2016 at 9:20 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> The case that comes to mind for me is in logical decoding, for decoding
> prepared xacts. Being able to make the prepared xact a member of a "lock
> group" along with the decoding session's xact may provide a solution to the
> locking-related challenges there.
>
> I haven't looked closely at what's involved in the decoding prepared xact
> locking issues yet, just an idea.
>
> To do this it'd have to be possible to add an existing session/xact to a
> lock group (or make it the leader of a new lock group then join that group).
> Do you think that's practical with your design?

I doubt it.  It's only safe to join a locking group if you don't yet
hold any heavyweight locks.  I'm not going to say it'd be impossible
to lift that restriction, but it'd be pretty complex, because doing so
could either create or remove deadlocks that didn't exist before.  For
example, suppose A wanting AccessExclusiveLock waits for B wanting
AccessExclusiveLock waits for C holding AccessShareLock.  Then, C
joins A's lock group.  If A's lock request can't be granted
immediately - say D also holds AccessShareLock on the object - this is
a deadlock.   Moreover, C can't detect the deadlock in the normal
course of things because C is not waiting.  Sorting this out does not
sound simple.

It could possibly work if the decoding transaction holds no locks at
all, joins the prepared xact's locking group, does stuff, and then,
when it again reaches a point where it holds no locks, leaves the lock
group.  I wonder, though, what happens if you deadlock.  The decoding
transaction gets killed, but you can't kill the prepared transaction,
so any locks it held would be retained.  Maybe that's OK, but I have a
sneaking suspicion there might be situations where we kill the
decoding transaction without resolving the deadlock.  Sharing locks
with a prepared transaction is not really what this was designed for.

I don't really understand what problem you are trying to solve here,
but I suspect there is a better solution than group locking.  The
thing is, in the normal course of events, heavyweight locking prevents
a lot of bad stuff from happening.  When you become a member of a lock
group, you're on your own recognizance to prevent that bad stuff.  The
parallel code does that (or hopefully does that, anyway) by imposing
severe restrictions on what you can do while in parallel mode; those
restrictions include "no writes whatsoever" and "no DDL".  If you
wanted to allow either of those things, you would need to think very,
very carefully about that, and then if you decided that it was going
to be safe, you'd need to think carefully about it a second time.

As I mentioned to Simon on another thread a while back, Thomas Munro
is working on a hash table that uses dynamic shared memory, and as
part of that work, he is taking the allocator work that I did a year
or two ago and turning that into a full-fledged allocator for dynamic
shared memory.  Once we have a real allocator and a real hash table
for DSM, I believe we'll be able to solve some of the problems that
currently require that parallel query - and probably anything that
uses group locking - be strictly read-only.  For example, we can use
that infrastructure to store a combo CID hash table that can grow
arbitrarily in a data structure that all cooperating processes can
share.  Unfortunately, it does not look like that work will be ready
in time to be included in PostgreSQL 9.6.  I think he will be in a
position to submit it in time for PostgreSQL 9.7, though.

>> I don't have any plans to implement anything like that but I
>> felt it was better to keep the concept of a lock group - which is a
>> group of processes that cooperate so closely that their locks need not
>> conflict - from the concept of a parallel context - which is a leader
>> process that is most likely connected to a user plus a bunch of
>> ephemeral background workers that aren't.  That way, if somebody later
>> wants to try to reuse the lock grouping stuff for something else,
>> nothing will get in the way of that; if not, no harm done, but keeping
>> the two things decoupled is at least easier to understand, IMHO.
>
> Yeah, strong +1

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> First, the overall concept here is that processes can either be a
> member of a lock group or a member of no lock group.

Check.

> Second, I can review the data structure changes and the associated
> invariants.

The data structure additions seemed relatively straightforward, though
I did wonder why you added groupLeader to PROCLOCK.  Why is that not
redundant with tag.myProc->lockGroupLeader?  If it isn't redundant,
it's inadequately documented.  If it is, seems better to lose it.

Also, isn't the comment on this PGPROC field incorrect:
   PGPROC     *lockGroupLeader;    /* lock group leader, if I'm a follower */

ie shouldn't you s/follower/member/ ?

> ... the lock manager lock that protects the fields in a
> given PGPROC is chosen by taking pgprocno modulo the number of lock
> manager partitions.

pgprocno of the specific PGPROC, or of the group leader?  If it's
the former, I'm pretty suspicious that there are deadlock and/or
linked-list-corruption hazards here.  If it's the latter, at least
the comments around this are misleading.

> Each PROCLOCK now has a new groupLeader flag.  While a PGPROC's
> lockGroupLeader may be NULL if that process is not involved in group
> locking, a PROCLOCK's groupLeader always has a non-NULL value; it
> points back to the PGPROC that acquired the lock.

This is not what the comment on it says; and your prose explanation
here implies that it should actually be equal to tag.myProc, or else
you are using some definition of "acquired the lock" that I don't
follow at all.  There could be lots of PGPROCs that have acquired
a lock in one sense or another; which one do you mean?

> With respect to pg_locks - and for that matter also pg_stat_activity -
> I think you are right that improvement is needed.

This is really the core of my concern at the moment.  I think that
isolationtester is probably outright broken for any situation where the
queries-under-test are being parallel executed, and so will be any other
client that's trying to identify who blocks whom from pg_locks.

> The simplest thing we could do to make that easier is, in
> pg_stat_activity, have parallel workers advertise the PID that
> launched them in a new field; and in pg_locks, have members of a lock
> group advertise the leader's PID in a new field.

That would be simple for us, but it would break every existing client-side
query that tries to identify blockers/blockees; and not only would those
queries need work but they would become very substantially more complex
and slower (probably at least 4-way joins not 2-way).  We already know
that isolationtester's query has performance problems in the buildfarm.
I think more thought is needed here, and that this area should be
considered a release blocker.  It's certainly a blocker for any thought
of turning parallel query on by default.

> I hope this helps and please let me know what more you think I should do.

At the least you need to make another pass over the comments and README
addition.  I highlighted above a few critical inaccuracies, and it's not
hard to find a lot of less-critical typos in the comments in that commit.
Transposing some of what you've written here into the README would likely
be a good thing too.
        regards, tom lane



Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Robert Haas
Date:
On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> First, the overall concept here is that processes can either be a
>> member of a lock group or a member of no lock group.
>
> Check.
>
>> Second, I can review the data structure changes and the associated
>> invariants.
>
> The data structure additions seemed relatively straightforward, though
> I did wonder why you added groupLeader to PROCLOCK.  Why is that not
> redundant with tag.myProc->lockGroupLeader?  If it isn't redundant,
> it's inadequately documented.  If it is, seems better to lose it.

That's a good question.  There are locking considerations: the lock
manager lock for a given PROCLOCK isn't necessarily the same one that
protects tag.myProc->lockGroupLeader.  I can go and look back through
the contexts where we use the PROCLOCK field and see if we can get by
without this, but I suspect it's too much of a pain.

> Also, isn't the comment on this PGPROC field incorrect:
>
>     PGPROC     *lockGroupLeader;    /* lock group leader, if I'm a follower */
>
> ie shouldn't you s/follower/member/ ?

Yes, that would be better.

>> ... the lock manager lock that protects the fields in a
>> given PGPROC is chosen by taking pgprocno modulo the number of lock
>> manager partitions.
>
> pgprocno of the specific PGPROC, or of the group leader?  If it's
> the former, I'm pretty suspicious that there are deadlock and/or
> linked-list-corruption hazards here.  If it's the latter, at least
> the comments around this are misleading.

Leader.  The other way would be nuts.

>> Each PROCLOCK now has a new groupLeader flag.  While a PGPROC's
>> lockGroupLeader may be NULL if that process is not involved in group
>> locking, a PROCLOCK's groupLeader always has a non-NULL value; it
>> points back to the PGPROC that acquired the lock.
>
> This is not what the comment on it says; and your prose explanation
> here implies that it should actually be equal to tag.myProc, or else
> you are using some definition of "acquired the lock" that I don't
> follow at all.  There could be lots of PGPROCs that have acquired
> a lock in one sense or another; which one do you mean?

The PROCLOCK's groupLeader will be equal to
tag.myProc->lockGroupLeader unless that is NULL.  In that case it will
be equal to tag.myProc.  Or at least that's the intention, modulo
bugs.

>> With respect to pg_locks - and for that matter also pg_stat_activity -
>> I think you are right that improvement is needed.
>
> This is really the core of my concern at the moment.  I think that
> isolationtester is probably outright broken for any situation where the
> queries-under-test are being parallel executed, and so will be any other
> client that's trying to identify who blocks whom from pg_locks.

OK.

>> The simplest thing we could do to make that easier is, in
>> pg_stat_activity, have parallel workers advertise the PID that
>> launched them in a new field; and in pg_locks, have members of a lock
>> group advertise the leader's PID in a new field.
>
> That would be simple for us, but it would break every existing client-side
> query that tries to identify blockers/blockees; and not only would those
> queries need work but they would become very substantially more complex
> and slower (probably at least 4-way joins not 2-way).  We already know
> that isolationtester's query has performance problems in the buildfarm.
> I think more thought is needed here, and that this area should be
> considered a release blocker.  It's certainly a blocker for any thought
> of turning parallel query on by default.

I don't quite see why this would cause the joins to get more complex.
It's also not really clear what the alternative is.  You could show
all the locks under the leader's PID, but that's horribly confusing
because there might be duplicates, and because if somebody's waiting
you really want to be able to tell which process to target with gdb.
You can show everybody under their own PIDs as now, but then you can't
tell which processes are in groups together.  So I don't see what
there is except to show both values.  Maybe there's some whole-new way
of displaying this information that would be more clear but I don't
really see what it is.

>> I hope this helps and please let me know what more you think I should do.
>
> At the least you need to make another pass over the comments and README
> addition.  I highlighted above a few critical inaccuracies, and it's not
> hard to find a lot of less-critical typos in the comments in that commit.
> Transposing some of what you've written here into the README would likely
> be a good thing too.

I'm getting on a long plane flight shortly, but I'll go back over this
when I can find some time to do that in an appropriately careful way.
Please feel free to make such corrections to comments and so forth as
may seem urgent to you in the meanwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> ... the lock manager lock that protects the fields in a
>>> given PGPROC is chosen by taking pgprocno modulo the number of lock
>>> manager partitions.

>> pgprocno of the specific PGPROC, or of the group leader?  If it's
>> the former, I'm pretty suspicious that there are deadlock and/or
>> linked-list-corruption hazards here.  If it's the latter, at least
>> the comments around this are misleading.

> Leader.  The other way would be nuts.

On closer inspection, that's actually still somewhat problematic,
because that essentially means that no one can inspect another backend's
lockGroupLeader etc fields unless they take *all* the lock partition
LWLocks first.  You can't take just the relevant one because you don't
know which one that is, at least not without dereferencing lockGroupLeader
which is entirely unsafe if you haven't got the appropriate lock.

This isn't a problem for members of the lock group, since they already
know who the leader is and hence which partition lock to use.  And it's
not a problem for the deadlock detector either since it'll take all those
partition locks anyway.  But it makes life difficult for status inspection
code.  I'm finding that the only way to write a lock-group-aware function
that tells whether A is blocked by B is to hold both the ProcArrayLock and
*all* of the lock partition locks throughout inspection of the data
structures.  I'd hoped to be able to restrict it to locking just the
partition holding the lock A is blocked on, but that ain't working.

Maybe this is all okay, but it makes me nervous that the data structures
may have been over-optimized.
        regards, tom lane



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> pgprocno of the specific PGPROC, or of the group leader?  If it's
>> the former, I'm pretty suspicious that there are deadlock and/or
>> linked-list-corruption hazards here.  If it's the latter, at least
>> the comments around this are misleading.

> Leader.  The other way would be nuts.

... and btw, either BecomeLockGroupMember() has got this backwards, or
I'm still fundamentally confused.

Also, after fixing that it would be good to add a comment explaining why
it's not fundamentally unsafe for BecomeLockGroupMember() to examine
leader->pgprocno without having any relevant lock.  AFAICS, that's only
okay because the pgprocno fields are never changed after InitProcGlobal,
even when a PGPROC is recycled.
        regards, tom lane



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Robert Haas
Date:
On Sun, Feb 14, 2016 at 5:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> pgprocno of the specific PGPROC, or of the group leader?  If it's
>>> the former, I'm pretty suspicious that there are deadlock and/or
>>> linked-list-corruption hazards here.  If it's the latter, at least
>>> the comments around this are misleading.
>
>> Leader.  The other way would be nuts.
>
> ... and btw, either BecomeLockGroupMember() has got this backwards, or
> I'm still fundamentally confused.

Yeah, you're right.  Attached is a draft patch that tries to clean up
that and a bunch of other things that you raised.  It hasn't really
been tested yet, so it might be buggy; I wrote it during my long plane
flight.  Fixes include:

1. It removes the groupLeader flag from PGPROC.  You're right: we don't need it.

2. It fixes this problem with BecomeLockGroupMember().  You're right:
the old way was backwards.

3. It fixes TopoSort() to be less inefficient by checking whether the
EDGE is for the correct lock before doing anything else.  I realized
this while updating comments related to EDGE.

4. It adds some text to the lmgr README.

5. It reflows the existing text in the lmgr README to fit within 80 columns.

6. It adjusts some other comments.

Possibly this should be broken up into multiple patches, but I'm just
sending it along all together for now.

> Also, after fixing that it would be good to add a comment explaining why
> it's not fundamentally unsafe for BecomeLockGroupMember() to examine
> leader->pgprocno without having any relevant lock.  AFAICS, that's only
> okay because the pgprocno fields are never changed after InitProcGlobal,
> even when a PGPROC is recycled.

I am sort of disinclined to think that this needs a comment.  Do we
really have a comment every other place that pgprocno is referenced
without a lock?  Or maybe there are none, but it seems like overkill
to me to comment on that at every site of use.  Better to comment on
the pgprocno definition itself and say "never changes".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Feb 14, 2016 at 5:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Also, after fixing that it would be good to add a comment explaining why
>> it's not fundamentally unsafe for BecomeLockGroupMember() to examine
>> leader->pgprocno without having any relevant lock.  AFAICS, that's only
>> okay because the pgprocno fields are never changed after InitProcGlobal,
>> even when a PGPROC is recycled.

> I am sort of disinclined to think that this needs a comment.

I might not either, except that the entire point of that piece of code is
to protect against the possibility that the leader PGPROC vanishes before
we can get this lock.  Since you've got extensive comments addressing that
point, it seems appropriate to explain why this one line doesn't invalidate
the whole design ... because it's right on the hairy edge of doing so.
If we did something like memset a PGPROC to all zeroes when we free it,
which in general would seem like a perfectly reasonable thing to do, this
code would be broken (and not in an easy to detect way; it would indeed
be indistinguishable from the way in which it's broken right now).

> Do we
> really have a comment every other place that pgprocno is referenced
> without a lock?

I suspect strongly that there is no other place that attempts to touch any
part of an invalid (freed) PGPROC.  If there is, more than likely it's
unsafe.

I don't have time right now to read the patch in detail, but will look
tomorrow.  In the meantime, thanks for addressing my concerns.
        regards, tom lane



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Robert Haas
Date:
On Mon, Feb 15, 2016 at 7:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Feb 14, 2016 at 5:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Also, after fixing that it would be good to add a comment explaining why
>>> it's not fundamentally unsafe for BecomeLockGroupMember() to examine
>>> leader->pgprocno without having any relevant lock.  AFAICS, that's only
>>> okay because the pgprocno fields are never changed after InitProcGlobal,
>>> even when a PGPROC is recycled.
>
>> I am sort of disinclined to think that this needs a comment.
>
> I might not either, except that the entire point of that piece of code is
> to protect against the possibility that the leader PGPROC vanishes before
> we can get this lock.  Since you've got extensive comments addressing that
> point, it seems appropriate to explain why this one line doesn't invalidate
> the whole design ... because it's right on the hairy edge of doing so.
> If we did something like memset a PGPROC to all zeroes when we free it,
> which in general would seem like a perfectly reasonable thing to do, this
> code would be broken (and not in an easy to detect way; it would indeed
> be indistinguishable from the way in which it's broken right now).

OK.  Well, I'm happy to have you add a comment in a patch of your own,
or suggest something to include in mine.  I'm less sure I can write
something independently that you'll like.

>> Do we
>> really have a comment every other place that pgprocno is referenced
>> without a lock?
>
> I suspect strongly that there is no other place that attempts to touch any
> part of an invalid (freed) PGPROC.  If there is, more than likely it's
> unsafe.
>
> I don't have time right now to read the patch in detail, but will look
> tomorrow.  In the meantime, thanks for addressing my concerns.

Sure thing.  I appreciate the review.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Yeah, you're right.  Attached is a draft patch that tries to clean up
> that and a bunch of other things that you raised.

I've been reviewing this patch, and I had a question: why do we need to
bother with a lockGroupLeaderIdentifier field?  It seems totally redundant
with the leader's pid field, ie, why doesn't BecomeLockGroupMember simply
compare leader->pid with the PID it's passed?  For some more safety, it
could also verify that leader->lockGroupLeader == leader; but I don't
see what the extra field is buying us.
        regards, tom lane



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Yeah, you're right.  Attached is a draft patch that tries to clean up
> that and a bunch of other things that you raised.

I reviewed this patch.  I don't have any particular comment on the changes
in deadlock.c; I haven't studied the code closely enough to know if those
are right.  I did think that the commentary elsewhere could be improved
some more, so attached is a delta patch on top of yours that shows what
I suggest.

I've not done anything here about removing lockGroupLeaderIdentifier,
although I will be happy to send a patch for that unless you know of
some reason I missed why it's necessary.

            regards, tom lane

diff --git a/src/backend/storage/lmgr/README b/src/backend/storage/lmgr/README
index 8eaa91c..5a62c8f 100644
*** a/src/backend/storage/lmgr/README
--- b/src/backend/storage/lmgr/README
*************** acquire an AccessShareLock while the oth
*** 603,609 ****
  This might seem dangerous and could be in some cases (more on that below), but
  if we didn't do this then parallel query would be extremely prone to
  self-deadlock.  For example, a parallel query against a relation on which the
! leader had already AccessExclusiveLock would hang, because the workers would
  try to lock the same relation and be blocked by the leader; yet the leader
  can't finish until it receives completion indications from all workers.  An
  undetected deadlock results.  This is far from the only scenario where such a
--- 603,609 ----
  This might seem dangerous and could be in some cases (more on that below), but
  if we didn't do this then parallel query would be extremely prone to
  self-deadlock.  For example, a parallel query against a relation on which the
! leader already had AccessExclusiveLock would hang, because the workers would
  try to lock the same relation and be blocked by the leader; yet the leader
  can't finish until it receives completion indications from all workers.  An
  undetected deadlock results.  This is far from the only scenario where such a
*************** quickly enough for this interlock to fai
*** 664,690 ****

  A PGPROC's lockGroupLeader is NULL for processes not involved in parallel
  query. When a process wants to cooperate with parallel workers, it becomes a
! lock group leader, which means setting this field back to point to its own
  PGPROC. When a parallel worker starts up, it points this field at the leader,
  with the above-mentioned interlock. The lockGroupMembers field is only used in
! the leader; it is a list of the workers. The lockGroupLink field is used to
! link the leader and all workers into the leader's list. All of these fields
! are protected by the lock manager locks; the lock manager lock that protects
! the lockGroupLeaderIdentifier, lockGroupLeader, and lockGroupMembers fields in
! a given PGPROC is chosen by taking pgprocno modulo the number of lock manager
! partitions. This unusual arrangement has a major advantage: the deadlock
! detector can count on the fact that no lockGroupLeader field can change while
! the deadlock detector is running, because it knows that it holds all the lock
! manager locks.  A PGPROC's lockGroupLink is protected by the lock manager
! partition lock for the group of which it is a part.  If it's not part of any
! group, this field is unused and can only be examined or modified by the
! process that owns the PGPROC.

  We institute a further coding rule that a process cannot join or leave a lock
  group while owning any PROCLOCK.  Therefore, given a lock manager lock
! sufficient to examine PROCLOCK *proclock, it also safe to examine
! proclock->tag.myProc->lockGroupLeader (but not the other fields mentioned in
! the previous paragraph).

  User Locks (Advisory Locks)
  ---------------------------
--- 664,689 ----

  A PGPROC's lockGroupLeader is NULL for processes not involved in parallel
  query. When a process wants to cooperate with parallel workers, it becomes a
! lock group leader, which means setting this field to point to its own
  PGPROC. When a parallel worker starts up, it points this field at the leader,
  with the above-mentioned interlock. The lockGroupMembers field is only used in
! the leader; it is a list of the member PGPROCs of the lock group (the leader
! and all workers). The lockGroupLink field is the list link for this list.
!
! All four of these fields are considered to be protected by a lock manager
! partition lock.  The partition lock that protects these fields within a given
! lock group is chosen by taking the leader's pgprocno modulo the number of lock
! manager partitions.  This unusual arrangement has a major advantage: the
! deadlock detector can count on the fact that no lockGroupLeader field can
! change while the deadlock detector is running, because it knows that it holds
! all the lock manager locks.  Also, holding this single lock allows safe
! manipulation of the lockGroupMembers list for the lock group.

  We institute a further coding rule that a process cannot join or leave a lock
  group while owning any PROCLOCK.  Therefore, given a lock manager lock
! sufficient to examine a PROCLOCK *proclock, it is also safe to examine
! proclock->tag.myProc->lockGroupLeader (but not the other lock-group-related
! fields of the PGPROC).

  User Locks (Advisory Locks)
  ---------------------------
diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index d81746d..21f92dd 100644
*** a/src/backend/storage/lmgr/deadlock.c
--- b/src/backend/storage/lmgr/deadlock.c
***************
*** 40,47 ****
   * is, it will be the leader rather than any other member of the lock group.
   * The group leaders act as representatives of the whole group even though
   * those particular processes need not be waiting at all.  There will be at
!  * least one member of the group on the wait queue for the given lock, maybe
!  * more.
   */
  typedef struct
  {
--- 40,47 ----
   * is, it will be the leader rather than any other member of the lock group.
   * The group leaders act as representatives of the whole group even though
   * those particular processes need not be waiting at all.  There will be at
!  * least one member of the waiter's lock group on the wait queue for the given
!  * lock, maybe more.
   */
  typedef struct
  {
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index cff8c08..deb88c7 100644
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
*************** ProcKill(int code, Datum arg)
*** 826,835 ****
              leader->lockGroupLeader = NULL;
              if (leader != MyProc)
              {
-                 procgloballist = leader->procgloballist;
-
                  /* Leader exited first; return its PGPROC. */
                  SpinLockAcquire(ProcStructLock);
                  leader->links.next = (SHM_QUEUE *) *procgloballist;
                  *procgloballist = leader;
                  SpinLockRelease(ProcStructLock);
--- 826,834 ----
              leader->lockGroupLeader = NULL;
              if (leader != MyProc)
              {
                  /* Leader exited first; return its PGPROC. */
                  SpinLockAcquire(ProcStructLock);
+                 procgloballist = leader->procgloballist;
                  leader->links.next = (SHM_QUEUE *) *procgloballist;
                  *procgloballist = leader;
                  SpinLockRelease(ProcStructLock);
*************** ProcSleep(LOCALLOCK *locallock, LockMeth
*** 1004,1010 ****
      int            i;

      /*
!      * If group locking is in use, locks held my members of my locking group
       * need to be included in myHeldLocks.
       */
      if (leader != NULL)
--- 1003,1009 ----
      int            i;

      /*
!      * If group locking is in use, locks held by members of my locking group
       * need to be included in myHeldLocks.
       */
      if (leader != NULL)
*************** BecomeLockGroupMember(PGPROC *leader, in
*** 1802,1810 ****
      /* PID must be valid. */
      Assert(pid != 0);

!     /* Try to join the group. */
      leader_lwlock = LockHashPartitionLockByProc(leader);
      LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);
      if (leader->lockGroupLeaderIdentifier == pid)
      {
          ok = true;
--- 1801,1817 ----
      /* PID must be valid. */
      Assert(pid != 0);

!     /*
!      * Get lock protecting the group fields.  Note LockHashPartitionLockByProc
!      * accesses leader->pgprocno in a PGPROC that might be free.  This is safe
!      * because all PGPROCs' pgprocno fields are set during shared memory
!      * initialization and never change thereafter; so we will acquire the
!      * correct lock even if the leader PGPROC is in process of being recycled.
!      */
      leader_lwlock = LockHashPartitionLockByProc(leader);
      LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);
+
+     /* Try to join the group */
      if (leader->lockGroupLeaderIdentifier == pid)
      {
          ok = true;
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index a3e19e1..2053bfe 100644
*** a/src/include/storage/lock.h
--- b/src/include/storage/lock.h
*************** typedef enum
*** 477,486 ****
   * by one of the lock hash partition locks.  Since the deadlock detector
   * acquires all such locks anyway, this makes it safe for it to access these
   * fields without doing anything extra.  To avoid contention as much as
!  * possible, we map different PGPROCs to different partition locks.
   */
! #define LockHashPartitionLockByProc(p) \
!     LockHashPartitionLock((p)->pgprocno)

  /*
   * function prototypes
--- 477,487 ----
   * by one of the lock hash partition locks.  Since the deadlock detector
   * acquires all such locks anyway, this makes it safe for it to access these
   * fields without doing anything extra.  To avoid contention as much as
!  * possible, we map different PGPROCs to different partition locks.  The lock
!  * used for a given lock group is determined by the group leader's pgprocno.
   */
! #define LockHashPartitionLockByProc(leader_pgproc) \
!     LockHashPartitionLock((leader_pgproc)->pgprocno)

  /*
   * function prototypes
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 81bddbb..a9405ce 100644
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
*************** struct PGPROC
*** 152,158 ****
       */
      TransactionId    procArrayGroupMemberXid;

!     /* Per-backend LWLock.  Protects fields below. */
      LWLock        backendLock;

      /* Lock manager data, recording fast-path locks taken by this backend. */
--- 152,158 ----
       */
      TransactionId    procArrayGroupMemberXid;

!     /* Per-backend LWLock.  Protects fields below (but not group fields). */
      LWLock        backendLock;

      /* Lock manager data, recording fast-path locks taken by this backend. */
*************** struct PGPROC
*** 163,173 ****
                                                   * lock */

      /*
!      * Support for lock groups.  Use LockHashPartitionLockByProc to get the
!      * LWLock protecting these fields.
       */
      int            lockGroupLeaderIdentifier;    /* MyProcPid, if I'm a leader */
!     PGPROC       *lockGroupLeader;    /* lock group leader, if I'm a follower */
      dlist_head    lockGroupMembers;    /* list of members, if I'm a leader */
      dlist_node  lockGroupLink;        /* my member link, if I'm a member */
  };
--- 163,173 ----
                                                   * lock */

      /*
!      * Support for lock groups.  Use LockHashPartitionLockByProc on the group
!      * leader to get the LWLock protecting these fields.
       */
      int            lockGroupLeaderIdentifier;    /* MyProcPid, if I'm a leader */
!     PGPROC       *lockGroupLeader;    /* lock group leader, if I'm a member */
      dlist_head    lockGroupMembers;    /* list of members, if I'm a leader */
      dlist_node  lockGroupLink;        /* my member link, if I'm a member */
  };

Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> 1. It removes the groupLeader flag from PGPROC.  You're right: we don't need it.

It occurs to me that this claim is bogus:

> We institute a further coding rule that a process cannot join or leave a lock
> group while owning any PROCLOCK.  Therefore, given a lock manager lock
> sufficient to examine PROCLOCK *proclock, it also safe to examine
> proclock->tag.myProc->lockGroupLeader (but not the other fields mentioned in
> the previous paragraph).

Yes, we can legislate that workers have to join before taking any lock,
and if processes only leave the group at death then that end of it is not
a problem either; but it is patently not the case that a process will hold
no locks when it calls BecomeLockGroupLeader().

We may be able to salvage this simplification anyway, but it will require
a tighter specification of when it's safe to look at
proclock->tag.myProc->lockGroupLeader.

Another possibility is to forget about executor-time
BecomeLockGroupLeader(), and just say that any process that isn't a worker
always becomes its own group leader at startup.  Then the above para is
true as written.  I don't know what downsides this might have; do your
changes in the lock manager try to optimize the null-lockGroupLeader case?
        regards, tom lane



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Craig Ringer
Date:
On 14 February 2016 at 08:05, Robert Haas <robertmhaas@gmail.com> wrote:
 

The concept of a
lock group is formally separate from the concept of a parallel group
created by a ParallelContext, but it is not clear that there will ever
be any other context in which a lock group will be a good idea.

Just coming back to this in terms of what Stephen and I raised: Robert, do you think this design as it stands can handle cases where a normal standalone backend gets promoted to a lock-group leader that others can then join? 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Robert Haas
Date:
On Tue, Feb 16, 2016 at 8:37 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 14 February 2016 at 08:05, Robert Haas <robertmhaas@gmail.com> wrote:
>> The concept of a
>> lock group is formally separate from the concept of a parallel group
>> created by a ParallelContext, but it is not clear that there will ever
>> be any other context in which a lock group will be a good idea.
>
> Just coming back to this in terms of what Stephen and I raised: Robert, do
> you think this design as it stands can handle cases where a normal
> standalone backend gets promoted to a lock-group leader that others can then
> join?

That's the exact intended purpose of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Robert Haas
Date:
On Tue, Feb 16, 2016 at 3:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> 1. It removes the groupLeader flag from PGPROC.  You're right: we don't need it.
>
> It occurs to me that this claim is bogus:
>
>> We institute a further coding rule that a process cannot join or leave a lock
>> group while owning any PROCLOCK.  Therefore, given a lock manager lock
>> sufficient to examine PROCLOCK *proclock, it also safe to examine
>> proclock->tag.myProc->lockGroupLeader (but not the other fields mentioned in
>> the previous paragraph).
>
> Yes, we can legislate that workers have to join before taking any lock,
> and if processes only leave the group at death then that end of it is not
> a problem either; but it is patently not the case that a process will hold
> no locks when it calls BecomeLockGroupLeader().
>
> We may be able to salvage this simplification anyway, but it will require
> a tighter specification of when it's safe to look at
> proclock->tag.myProc->lockGroupLeader.
>
> Another possibility is to forget about executor-time
> BecomeLockGroupLeader(), and just say that any process that isn't a worker
> always becomes its own group leader at startup.  Then the above para is
> true as written.  I don't know what downsides this might have; do your
> changes in the lock manager try to optimize the null-lockGroupLeader case?

Hmm, that's true.  I don't think it actually matters all that much,
because proclock->tag.myProc->lockGroupLeader == NULL has pretty much
the same effect as if proclock->tag.myProc->lockGroupLeader ==
proclock->tag.myProc.  But not completely.  One problem is that we
don't currently assume that 8-byte writes are atomic, so somebody
might see the group leader field half-set, which would be bad.

But, yes, there are some optimizations for that case.  For example, in
LockCheckConflicts:
    /* If no group locking, it's definitely a conflict. */
    if (leader == NULL)
    {
        PROCLOCK_PRINT("LockCheckConflicts:conflicting (simple)",
                       proclock);
        return STATUS_FOUND;
    }

ProcSleep also has a check of this sort.  In theory those
optimizations are quite important, because they can avoid extra passes
over the lock queue which theoretically turn O(N) algorithms into
O(N^2) or O(N^2) into O(N^3) or whatever.  But in practice I'm not
sure that the lock queues are ever long enough for that to become an
issue.  And if they are, the fact that your system is lock-bound is
probably a much bigger problem that a few extra CPU cycles inside the
lock manager to figure that out.  But I don't want to be too flip
about that analysis - there might be some scenario where the extra
cycles do hurt.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Hmm, that's true.  I don't think it actually matters all that much,
> because proclock->tag.myProc->lockGroupLeader == NULL has pretty much
> the same effect as if proclock->tag.myProc->lockGroupLeader ==
> proclock->tag.myProc.  But not completely.  One problem is that we
> don't currently assume that 8-byte writes are atomic, so somebody
> might see the group leader field half-set, which would be bad.

Yes, exactly.  I'm not certain if there are any real platforms where
a pointer-sized write wouldn't be atomic (it sure sounds inefficient
for that to be true), but we have not assumed that to date and I'd
just as soon not start here.
        regards, tom lane



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Robert Haas
Date:
On Wed, Feb 17, 2016 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Hmm, that's true.  I don't think it actually matters all that much,
>> because proclock->tag.myProc->lockGroupLeader == NULL has pretty much
>> the same effect as if proclock->tag.myProc->lockGroupLeader ==
>> proclock->tag.myProc.  But not completely.  One problem is that we
>> don't currently assume that 8-byte writes are atomic, so somebody
>> might see the group leader field half-set, which would be bad.
>
> Yes, exactly.  I'm not certain if there are any real platforms where
> a pointer-sized write wouldn't be atomic (it sure sounds inefficient
> for that to be true), but we have not assumed that to date and I'd
> just as soon not start here.

I guess it's worth pointing out that storing the groupLeader in the
PROCLOCK, as we do currently, avoids this issue entirely.  The
deadlock detector can safely follow lockGroupLeader pointers because
it holds all the lock manager partition locks, and proc.c/lock.c don't
need to do so if groupLeader is present in the PROCLOCK.  Nor does
that field ever need to be updated, because it's defined to always be
non-NULL: if a process becomes a group leader, the value stored in the
PROCLOCKs does not change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Andres Freund
Date:
On 2016-02-16 23:33:47 -0500, Tom Lane wrote:
> Yes, exactly.  I'm not certain if there are any real platforms where
> a pointer-sized write wouldn't be atomic (it sure sounds inefficient
> for that to be true), but we have not assumed that to date and I'd
> just as soon not start here.

FWIW, there's no sizeof(void*) == 8 platform supported by PG where
aligned 8 byte writes aren't atomic. There are a number of supported
platforms with sizeof(void*) == 4 without atomic 8 byte writes though.



Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I guess it's worth pointing out that storing the groupLeader in the
> PROCLOCK, as we do currently, avoids this issue entirely.

Right, that's why I framed it as "can we salvage this simplification"
(referring to removal of that field).  It may be best to just fix the
existing bug and docs deficiencies before returning to that idea.
        regards, tom lane



Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From:
Amit Kapila
Date:
On Mon, Feb 15, 2016 at 12:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
>
> > With respect to pg_locks - and for that matter also pg_stat_activity -
> > I think you are right that improvement is needed.
>
> This is really the core of my concern at the moment.  I think that
> isolationtester is probably outright broken for any situation where the
> queries-under-test are being parallel executed, and so will be any other
> client that's trying to identify who blocks whom from pg_locks.
>
> > The simplest thing we could do to make that easier is, in
> > pg_stat_activity, have parallel workers advertise the PID that
> > launched them in a new field; and in pg_locks, have members of a lock
> > group advertise the leader's PID in a new field.
> >

For a parallel query that uses one parallel worker, with another unrelated
backend trying to acquire an Access Exclusive lock on the table the parallel
query is using, the lock information will be displayed as below by the lock
monitoring query [1]:

 blocked_pid | blocking_pid |            blocked_statement            | current_statement_in_blocking_process
-------------+--------------+-----------------------------------------+---------------------------------------
        5052 |         3464 | Lock table t1 in Access Exclusive Mode; |
        5052 |         4128 | Lock table t1 in Access Exclusive Mode; | select count(*) from t1 where c1 < 10;
(2 rows)

Here, backend 5052 is waiting to acquire an Access Exclusive lock on t1,
which is held in Access Share mode by the master backend 4128 and the
parallel worker 3464.  Now, I think it is tricky for the user to work out
what exactly is going on using the current lock monitoring queries.  Do you
think that by adding an additional leader-pid column we can eliminate the
duplication in the above case, and in all such cases, without an additional
join or making the above query more complex?

Could we consider not showing lock information in pg_locks for a parallel
worker in cases where the worker acquires a lock in the same mode on the
same resource as the leader backend?

Currently, pg_stat_activity displays NULL for both query and state for
parallel workers.  I think we should call pgstat_report_activity() in
ParallelWorkerMain() or ParallelQueryMain() to display the information.
One thing which needs some thought is what query we should display for a
parallel worker.  Currently, while forming the query descriptor in a
parallel worker we use "<parallel query>", so one idea is to just display
that, or to display the query used by the master backend (if it is not
currently available, we might need to pass it from the master backend).

>
> That would be simple for us, but it would break every existing client-side
> query that tries to identify blockers/blockees; and not only would those
> queries need work but they would become very substantially more complex
> and slower (probably at least 4-way joins not 2-way).  We already know
> that isolationtester's query has performance problems in the buildfarm.
> I think more thought is needed here,

Agreed.  I think that before deciding what exactly to add or change in
pg_locks or the lock monitoring queries, it might be helpful to define how
we want to convey the information to the user/DBA when a group of parallel
processes is involved as blockers or blockees.  Some of the cases are:
a) a standalone backend waits for a group of parallel processes holding a
lock in the same mode on the same resource.
b) a group of parallel workers waits for a lock held by a standalone
backend or by some other parallel group.
c) a standalone backend waits for a group of parallel processes holding a
lock in different modes on the same resource.
and other similar cases.

Before jumping into a discussion of the solution, I would like to know:
do you and/or Robert have the above kinds of cases in mind, where you want
a better way to display the information to the user, or something else?



[1] - 
SELECT blocked_locks.pid     AS blocked_pid,
         blocking_locks.pid     AS blocking_pid,
         blocked_activity.query    AS blocked_statement,
         blocking_activity.query   AS current_statement_in_blocking_process
   FROM  pg_catalog.pg_locks         blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks         blocking_locks 
        ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
 
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
   WHERE NOT blocked_locks.GRANTED;

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl

From
Tom Lane
Date:
I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Yeah, you're right.  Attached is a draft patch that tries to clean up
>> that and a bunch of other things that you raised.

> I've been reviewing this patch, and I had a question: why do we need to
> bother with a lockGroupLeaderIdentifier field?  It seems totally redundant
> with the leader's pid field, ie, why doesn't BecomeLockGroupMember simply
> compare leader->pid with the PID it's passed?  For some more safety, it
> could also verify that leader->lockGroupLeader == leader; but I don't
> see what the extra field is buying us.

Here is a proposed patch that gets rid of lockGroupLeaderIdentifier.

I concluded that the "leader->lockGroupLeader == leader" test is
necessary, because we don't ever reset the pid field of a dead PGPROC
until it's re-used.  However, with that check, this seems isomorphic to
the existing code because lockGroupLeader is reset to NULL at all the same
places that lockGroupLeaderIdentifier is reset.  So we do not need both of
those fields.

            regards, tom lane

diff --git a/src/backend/storage/lmgr/README b/src/backend/storage/lmgr/README
index c399618..56b0a12 100644
*** a/src/backend/storage/lmgr/README
--- b/src/backend/storage/lmgr/README
*************** those cases so that they no longer use h
*** 650,676 ****
  (which is not a crazy idea, given that such lock acquisitions are not expected
  to deadlock and that heavyweight lock acquisition is fairly slow anyway).

! Group locking adds four new members to each PGPROC: lockGroupLeaderIdentifier,
! lockGroupLeader, lockGroupMembers, and lockGroupLink. The first is simply a
! safety mechanism. A newly started parallel worker has to try to join the
! leader's lock group, but it has no guarantee that the group leader is still
! alive by the time it gets started. We try to ensure that the parallel leader
! dies after all workers in normal cases, but also that the system could survive
! relatively intact if that somehow fails to happen. This is one of the
! precautions against such a scenario: the leader relays its PGPROC and also its
! PID to the worker, and the worker fails to join the lock group unless the
! given PGPROC still has the same PID. We assume that PIDs are not recycled
! quickly enough for this interlock to fail.
!
! A PGPROC's lockGroupLeader is NULL for processes not involved in parallel
! query. When a process wants to cooperate with parallel workers, it becomes a
! lock group leader, which means setting this field to point to its own
! PGPROC. When a parallel worker starts up, it points this field at the leader,
! with the above-mentioned interlock. The lockGroupMembers field is only used in
  the leader; it is a list of the member PGPROCs of the lock group (the leader
  and all workers). The lockGroupLink field is the list link for this list.

! All four of these fields are considered to be protected by a lock manager
  partition lock.  The partition lock that protects these fields within a given
  lock group is chosen by taking the leader's pgprocno modulo the number of lock
  manager partitions.  This unusual arrangement has a major advantage: the
--- 650,665 ----
  (which is not a crazy idea, given that such lock acquisitions are not expected
  to deadlock and that heavyweight lock acquisition is fairly slow anyway).

! Group locking adds three new members to each PGPROC: lockGroupLeader,
! lockGroupMembers, and lockGroupLink. A PGPROC's lockGroupLeader is NULL for
! processes not involved in parallel query. When a process wants to cooperate
! with parallel workers, it becomes a lock group leader, which means setting
! this field to point to its own PGPROC. When a parallel worker starts up, it
! points this field at the leader. The lockGroupMembers field is only used in
  the leader; it is a list of the member PGPROCs of the lock group (the leader
  and all workers). The lockGroupLink field is the list link for this list.

! All three of these fields are considered to be protected by a lock manager
  partition lock.  The partition lock that protects these fields within a given
  lock group is chosen by taking the leader's pgprocno modulo the number of lock
  manager partitions.  This unusual arrangement has a major advantage: the
*************** change while the deadlock detector is ru
*** 679,684 ****
--- 668,685 ----
  all the lock manager locks.  Also, holding this single lock allows safe
  manipulation of the lockGroupMembers list for the lock group.

+ We need an additional interlock when setting these fields, because a newly
+ started parallel worker has to try to join the leader's lock group, but it
+ has no guarantee that the group leader is still alive by the time it gets
+ started.  We try to ensure that the parallel leader dies after all workers
+ in normal cases, but also that the system could survive relatively intact
+ if that somehow fails to happen.  This is one of the precautions against
+ such a scenario: the leader relays its PGPROC and also its PID to the
+ worker, and the worker fails to join the lock group unless the given PGPROC
+ still has the same PID and is still a lock group leader.  We assume that
+ PIDs are not recycled quickly enough for this interlock to fail.
+
+
  User Locks (Advisory Locks)
  ---------------------------

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 114fba0..00787d4 100644
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
*************** InitProcess(void)
*** 401,407 ****
      pg_atomic_init_u32(&MyProc->procArrayGroupNext, INVALID_PGPROCNO);

      /* Check that group locking fields are in a proper initial state. */
-     Assert(MyProc->lockGroupLeaderIdentifier == 0);
      Assert(MyProc->lockGroupLeader == NULL);
      Assert(dlist_is_empty(&MyProc->lockGroupMembers));

--- 401,406 ----
*************** InitAuxiliaryProcess(void)
*** 565,571 ****
      SwitchToSharedLatch();

      /* Check that group locking fields are in a proper initial state. */
-     Assert(MyProc->lockGroupLeaderIdentifier == 0);
      Assert(MyProc->lockGroupLeader == NULL);
      Assert(dlist_is_empty(&MyProc->lockGroupMembers));

--- 564,569 ----
*************** ProcKill(int code, Datum arg)
*** 822,828 ****
          dlist_delete(&MyProc->lockGroupLink);
          if (dlist_is_empty(&leader->lockGroupMembers))
          {
-             leader->lockGroupLeaderIdentifier = 0;
              leader->lockGroupLeader = NULL;
              if (leader != MyProc)
              {
--- 820,825 ----
*************** BecomeLockGroupLeader(void)
*** 1771,1777 ****
      leader_lwlock = LockHashPartitionLockByProc(MyProc);
      LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);
      MyProc->lockGroupLeader = MyProc;
-     MyProc->lockGroupLeaderIdentifier = MyProcPid;
      dlist_push_head(&MyProc->lockGroupMembers, &MyProc->lockGroupLink);
      LWLockRelease(leader_lwlock);
  }
--- 1768,1773 ----
*************** BecomeLockGroupMember(PGPROC *leader, in
*** 1795,1800 ****
--- 1791,1799 ----
      /* Group leader can't become member of group */
      Assert(MyProc != leader);

+     /* Can't already be a member of a group */
+     Assert(MyProc->lockGroupLeader == NULL);
+
      /* PID must be valid. */
      Assert(pid != 0);

*************** BecomeLockGroupMember(PGPROC *leader, in
*** 1809,1815 ****
      LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);

      /* Try to join the group */
!     if (leader->lockGroupLeaderIdentifier == pid)
      {
          ok = true;
          MyProc->lockGroupLeader = leader;
--- 1808,1814 ----
      LWLockAcquire(leader_lwlock, LW_EXCLUSIVE);

      /* Try to join the group */
!     if (leader->pid == pid && leader->lockGroupLeader == leader)
      {
          ok = true;
          MyProc->lockGroupLeader = leader;
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index a9405ce..dbcdd3f 100644
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
*************** struct PGPROC
*** 166,172 ****
       * Support for lock groups.  Use LockHashPartitionLockByProc on the group
       * leader to get the LWLock protecting these fields.
       */
-     int            lockGroupLeaderIdentifier;    /* MyProcPid, if I'm a leader */
      PGPROC       *lockGroupLeader;    /* lock group leader, if I'm a member */
      dlist_head    lockGroupMembers;    /* list of members, if I'm a leader */
      dlist_node  lockGroupLink;        /* my member link, if I'm a member */
--- 166,171 ----
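
The invariant the patch relies on can be exercised standalone.  Below is a
minimal sketch, not the actual PostgreSQL code: the struct is a two-field
stand-in for PGPROC (field names match proc.h, everything else is omitted),
and `can_join_lock_group` restates the join test from the
BecomeLockGroupMember hunk above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the two PGPROC fields the check reads. */
typedef struct PGPROC PGPROC;
struct PGPROC
{
    int     pid;                /* process ID; not cleared when the proc dies */
    PGPROC *lockGroupLeader;    /* self if leader, NULL if in no lock group */
};

/*
 * The patched join test: the worker may join only if the PGPROC it was
 * handed still runs under the PID the leader relayed *and* that PGPROC is
 * still a lock group leader (lockGroupLeader points at itself).  The second
 * condition is what lets us drop lockGroupLeaderIdentifier: the pid alone
 * is not enough, because a dead PGPROC keeps its pid field until re-use.
 */
static bool
can_join_lock_group(PGPROC *leader, int pid)
{
    return leader->pid == pid && leader->lockGroupLeader == leader;
}
```

This only illustrates the invariant; the real check runs while holding the
leader's lock manager partition lock, as in the proc.c hunk above.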