Re: Idea for fixing parallel pg_dump's lock acquisition problem

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Idea for fixing parallel pg_dump's lock acquisition problem
Дата
Msg-id CA+TgmoYd3Msyfuo-1K52dPUeHMNZu5-sxwKVvqo8K13EsSMpcQ@mail.gmail.com
обсуждение исходный текст
Ответ на Idea for fixing parallel pg_dump's lock acquisition problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Idea for fixing parallel pg_dump's lock acquisition problem  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, Apr 17, 2019 at 11:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While thinking about that, it occurred to me that we could close the
> gap if the server somehow understood that the master was waiting for
> the worker.  And it's actually not that hard to make that happen:
> we could use advisory locks.  Consider a dance like the following:
>
> 1. Master has A.S. lock on table t, and dispatches a dump job for t
> to worker.
>
> 2. Worker chooses some key k, does pg_advisory_lock(k), and sends k
> back to master.
>
> 3. Worker attempts to get A.S. lock on t.  This might block, if some
> other session has a pending lock request on t.
>
> 4. Upon receipt of message from worker, master also does
> pg_advisory_lock(k).  This blocks, but now the server can see that a
> deadlock exists, and after deadlock_timeout elapses it will fix the
> deadlock by letting the worker bypass the other pending lock request.
>
> 5. Once worker has the lock on table t, it does pg_advisory_unlock(k)
> to release the master.
>
> 6. Master also does pg_advisory_unlock(k), and goes on about its business.

Neat idea.

> I've tested that the server side of this works, for either order of steps
> 3 and 4.  It seems like mostly just a small matter of programming to teach
> pg_dump to do this, although there are some questions to resolve, mainly
> how we choose the advisory lock keys.  If there are user applications
> running that also use advisory locks, there could be unwanted
> interference.  One easy improvement is to use pg_try_advisory_lock(k) in
> step 2, and just choose a different k if the lock's in use.  Perhaps,
> since we don't expect that the locks would be held long, that's
> sufficient --- but I suspect that users might wish for some pg_dump
> options to restrict the set of keys it could use.

This seems like a pretty significant wart.  I think we probably need a
better solution, but I'm not sure what it is.  I guess we could define
a new lock space that is specifically intended for this kind of
inter-process coordination, where it's expected that the key is a PID.

> Another point is that the whole dance is unnecessary in the normal
> case, so maybe we should only do this if an initial attempt to get
> the lock on table t fails.  However, LOCK TABLE NOWAIT throws an
> error if it can't get the lock, so this would require using a
> subtransaction or starting a whole new transaction in the worker,
> so maybe that's more trouble than it's worth.  Some performance
> testing might be called for.  There's also the point that the code
> path would go almost entirely untested if it's not exercised always.

Seems like it might make sense just to do it always.

> Lastly, we'd probably want both the master and worker processes to
> run with small values of deadlock_timeout, so as to reduce the
> time wasted before the server breaks the deadlock.  Are there any
> downsides to that?  (If so, again maybe it's not worth the trouble,
> since the typical case is that no wait is needed.)

I think we shouldn't do this part.  It's true that reducing
deadlock_timeout prevents time from being wasted if a deadlock occurs,
but that problem is not confined to this case; that's what
deadlock_timeout does in general.  I can't see why we should
substitute our judgement regarding the proper value of
deadlock_timeout for that of the DBA in this one case.  I'm a little
fuzzy-headed at the moment but it seems to me that no deadlock will
occur unless the worker fails to get the lock, which should be rare,
and even then I wonder if we couldn't somehow jigger things so that
the special case in ProcSleep ("Determine where to add myself in the
wait queue.") rescues us.  Even if not, a 1 second delay in a rare
case doesn't seem like a huge problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: TM format can mix encodings in to_char()
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Idea for fixing parallel pg_dump's lock acquisition problem