Re: Idea for fixing parallel pg_dump's lock acquisition problem
От | Robert Haas |
---|---|
Тема | Re: Idea for fixing parallel pg_dump's lock acquisition problem |
Дата | |
Msg-id | CA+TgmoYd3Msyfuo-1K52dPUeHMNZu5-sxwKVvqo8K13EsSMpcQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Idea for fixing parallel pg_dump's lock acquisition problem (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Idea for fixing parallel pg_dump's lock acquisition problem
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-hackers |
On Wed, Apr 17, 2019 at 11:34 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > While thinking about that, it occurred to me that we could close the > gap if the server somehow understood that the master was waiting for > the worker. And it's actually not that hard to make that happen: > we could use advisory locks. Consider a dance like the following: > > 1. Master has A.S. lock on table t, and dispatches a dump job for t > to worker. > > 2. Worker chooses some key k, does pg_advisory_lock(k), and sends k > back to master. > > 3. Worker attempts to get A.S. lock on t. This might block, if some > other session has a pending lock request on t. > > 4. Upon receipt of message from worker, master also does > pg_advisory_lock(k). This blocks, but now the server can see that a > deadlock exists, and after deadlock_timeout elapses it will fix the > deadlock by letting the worker bypass the other pending lock request. > > 5. Once worker has the lock on table t, it does pg_advisory_unlock(k) > to release the master. > > 6. Master also does pg_advisory_unlock(k), and goes on about its business. Neat idea. > I've tested that the server side of this works, for either order of steps > 3 and 4. It seems like mostly just a small matter of programming to teach > pg_dump to do this, although there are some questions to resolve, mainly > how we choose the advisory lock keys. If there are user applications > running that also use advisory locks, there could be unwanted > interference. One easy improvement is to use pg_try_advisory_lock(k) in > step 2, and just choose a different k if the lock's in use. Perhaps, > since we don't expect that the locks would be held long, that's > sufficient --- but I suspect that users might wish for some pg_dump > options to restrict the set of keys it could use. This seems like a pretty significant wart. I think we probably need a better solution, but I'm not sure what it is. I guess we could define a new lock space that is specifically intended for this kind of inter-process coordination, where it's expected that the key is a PID. > Another point is that the whole dance is unnecessary in the normal > case, so maybe we should only do this if an initial attempt to get > the lock on table t fails. However, LOCK TABLE NOWAIT throws an > error if it can't get the lock, so this would require using a > subtransaction or starting a whole new transaction in the worker, > so maybe that's more trouble than it's worth. Some performance > testing might be called for. There's also the point that the code > path would go almost entirely untested if it's not exercised always. Seems like it might make sense just to do it always. > Lastly, we'd probably want both the master and worker processes to > run with small values of deadlock_timeout, so as to reduce the > time wasted before the server breaks the deadlock. Are there any > downsides to that? (If so, again maybe it's not worth the trouble, > since the typical case is that no wait is needed.) I think we shouldn't do this part. It's true that reducing deadlock_timeout prevents time from being wasted if a deadlock occurs, but that problem is not confined to this case; that's what deadlock_timeout does in general. I can't see why we should substitute our judgement regarding the proper value of deadlock_timeout for that of the DBA in this one case. I'm a little fuzzy-headed at the moment but it seems to me that no deadlock will occur unless the worker fails to get the lock, which should be rare, and even then I wonder if we couldn't somehow jigger things so that the special case in ProcSleep ("Determine where to add myself in the wait queue.") rescues us. Even if not, a 1 second delay in a rare case doesn't seem like a huge problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления:
Следующее
От: Tom LaneДата:
Сообщение: Re: Idea for fixing parallel pg_dump's lock acquisition problem