Thread: Preventing deadlock on parallel backup

Preventing deadlock on parallel backup

From:
Lucas
Date:
<p dir="ltr">People,<p dir="ltr">I made a small modification in pg_dump to prevent parallel backup failures due to
exclusivelock requests made by other tasks. <p dir="ltr">The modification I made take shared locks for each parallel
backupworker at the very beginning of the job. That way, any other job that attempts to acquire exclusive locks will
waitfor the backup to finish.<p dir="ltr">In my case, each server was taking a day to complete the backup, now with
parallelbackup one is taking 3 hours and the others less than a hour.<p dir="ltr">The code below is not very elegant,
butit works for me. My whishlist for the backup is:<p dir="ltr">1) replace plpgsql by c code reading the backup toc and
assemblingthe lock commands.<br /> 2) create an timeout to the locks.<br /> 3) broadcast the end of copy to every
workerin order to release the locks as early as possible;<br /> 4) create a monitor thread that prioritize an copy job
basedon a exclusive lock acquired;<br /> 5) grant the lock for other connection of the same distributed transaction if
itis held by any connection of the same distributed transaction. There is some sideefect I can't see on that?<p
dir="ltr">1to 4 are within my capabilities and I may do it in the future. 4 is to advanced for me and I do not dare to
messwith something so fundamental rights now.<p dir="ltr">Anyone else is working on that?<br /><p dir="ltr">On,
Parallel.c,void RunWorker(...), add:<p dir="ltr">PQExpBuffer query;<br /> PGresult   *res;<p dir="ltr">query =
createPQExpBuffer();           <br /> resetPQExpBuffer(query);<br /> appendPQExpBuffer(query,<br /> "do language
'plpgsql'$$"<br /> " declare "<br /> "    x record;"<br /> " begin"<br /> "    for x in select * from pg_tables where
schemanamenot in ('pg_catalog','information_schema') loop"<br /> "        raise info 'lock table %.%', x.schemaname,
x.tablename;"<br/> "        execute 'LOCK TABLE '||quote_ident(x.schemaname)||'.'||quote_ident(x.tablename)||' IN
ACCESSSHARE MODE NOWAIT';"<br /> "    end loop;"<br /> "end"<br /> "$$" );<p dir="ltr">res = PQexec(AH->connection,
query->data);<pdir="ltr">if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)<br />        
exit_horribly(modulename,"Couldnot lock the tables to begin the work\n\n");<br /> PQclear(res);<br />
destroyPQExpBuffer(query);<br/> 
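
For reference, the command the worker ends up sending is essentially the following; it can be pasted into psql to try the locking by hand (inside an open transaction, since the locks are released when it ends). NOWAIT makes it fail immediately if a conflicting lock is already held or queued, instead of waiting:

    BEGIN;   -- keep the transaction open; the locks are held until it ends
    DO LANGUAGE plpgsql $$
    DECLARE
        x record;
    BEGIN
        FOR x IN SELECT * FROM pg_tables
                 WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
        LOOP
            RAISE INFO 'lock table %.%', x.schemaname, x.tablename;
            EXECUTE 'LOCK TABLE ' || quote_ident(x.schemaname) || '.'
                 || quote_ident(x.tablename) || ' IN ACCESS SHARE MODE NOWAIT';
        END LOOP;
    END
    $$;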

Re: Preventing deadlock on parallel backup

From:
Tom Lane
Date:
Lucas <lucas75@gmail.com> writes:
> I made a small modification in pg_dump to prevent parallel backup failures
> due to exclusive lock requests made by other tasks.

> The modification I made take shared locks for each parallel backup worker
> at the very beginning of the job. That way, any other job that attempts to
> acquire exclusive locks will wait for the backup to finish.

I do not think this would eliminate the problem; all it's doing is making
the window for trouble a bit narrower.  Also, it implies taking out many
locks that would never be used, since no worker process will be touching
all of the tables.
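
A minimal sketch of the remaining trouble window, using a hypothetical table t that a worker still has to lock:

    -- session 1: pg_dump leader, holds its lock for the whole run
    BEGIN;
    LOCK TABLE t IN ACCESS SHARE MODE;

    -- session 2: some other task (e.g. a migration), queues behind session 1
    BEGIN;
    LOCK TABLE t IN ACCESS EXCLUSIVE MODE;

    -- session 3: pg_dump worker, about to COPY the table
    LOCK TABLE t IN ACCESS SHARE MODE NOWAIT;   -- fails: a conflicting request is queued

Without NOWAIT the worker would wait on session 2, session 2 waits on the leader, and the leader will not commit until the worker finishes -- a deadlock the server cannot detect, because the leader is not waiting on any lock.  Pre-locking every table in the worker only helps if it happens before a request like session 2's arrives, which is why the window narrows but does not close.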

I think a real solution involves teaching the backend to allow a worker
process to acquire a lock as long as its master already has the same lock.
There's already queue-jumping logic of that sort in the lock manager, but
it doesn't fire because we don't see that there's a potential deadlock.
What needs to be worked out, mostly, is how we can do that without
creating security hazards (since the backend would have to accept a
command enabling this behavior from the client).  Maybe it's good enough
to insist that leader and follower be same user ID, or maybe not.

There's some related problems in parallel query, which AFAIK we just have
an ugly kluge solution for ATM.  It'd be better if there were a clear
model of when to allow a parallel worker to get a lock out-of-turn.
        regards, tom lane



Re: Preventing deadlock on parallel backup

From:
Lucas
Date:
<p dir="ltr">I agree. It is an ugly hack. <p dir="ltr">But to me, the reduced window for failure is important. And that
wayan failure will happen right away to be submitted to my operators as soon as possible. <p dir="ltr">The queue
jumpinglogic can not use the distributed transaction id? <p dir="ltr">On my logic, if a connection requests a shared
lockthat is already granted to another connection in the same distributed transaction it should be granted right
away...make sense?<div class="gmail_extra"><br /><div class="gmail_quote">Em 08/09/2016 4:15 PM, "Tom Lane" <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>escreveu:<br type="attribution" /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Lucas <<a
href="mailto:lucas75@gmail.com">lucas75@gmail.com</a>>writes:<br /> > I made a small modification in pg_dump to
preventparallel backup failures<br /> > due to exclusive lock requests made by other tasks.<br /><br /> > The
modificationI made take shared locks for each parallel backup worker<br /> > at the very beginning of the job. That
way,any other job that attempts to<br /> > acquire exclusive locks will wait for the backup to finish.<br /><br /> I
donot think this would eliminate the problem; all it's doing is making<br /> the window for trouble a bit narrower. 
Also,it implies taking out many<br /> locks that would never be used, since no worker process will be touching<br />
allof the tables.<br /><br /> I think a real solution involves teaching the backend to allow a worker<br /> process to
acquirea lock as long as its master already has the same lock.<br /> There's already queue-jumping logic of that sort
inthe lock manager, but<br /> it doesn't fire because we don't see that there's a potential deadlock.<br /> What needs
tobe worked out, mostly, is how we can do that without<br /> creating security hazards (since the backend would have to
accepta<br /> command enabling this behavior from the client).  Maybe it's good enough<br /> to insist that leader and
followerbe same user ID, or maybe not.<br /><br /> There's some related problems in parallel query, which AFAIK we just
have<br/> an ugly kluge solution for ATM.  It'd be better if there were a clear<br /> model of when to allow a parallel
workerto get a lock out-of-turn.<br /><br />                         regards, tom lane<br /></blockquote></div></div> 

Re: Preventing deadlock on parallel backup

From:
Tom Lane
Date:
Lucas <lucas75@gmail.com> writes:
> The queue jumping logic can not use the distributed transaction id?

If we had such a thing as a distributed transaction id, maybe the
answer could be yes.  We don't.

I did wonder whether using a shared snapshot might be a workable proxy
for that, but haven't pursued it.
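
For context, the synchronized-snapshot handshake that parallel pg_dump already performs looks roughly like this (the snapshot id is just an example value):

    -- leader connection
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();               -- e.g. returns '00000003-0000001B-1'

    -- each worker connection adopts the leader's snapshot
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000003-0000001B-1';

pg_export_snapshot() and SET TRANSACTION SNAPSHOT are what pg_dump -j uses so that every connection sees the same data, which makes the exported snapshot name a natural "same job" marker.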
        regards, tom lane



Re: Preventing deadlock on parallel backup

From:
Lucas
Date:
<p dir="ltr">Tom,<p dir="ltr">Yes, it is what I mean. Is what pg_dump uses to get things synchronized. It seems to me a
clearmarker that the same task is using more than one connection to accomplish the one job.<br /><div
class="gmail_extra"><br/><div class="gmail_quote">Em 08/09/2016 6:34 PM, "Tom Lane" <<a
href="mailto:tgl@sss.pgh.pa.us">tgl@sss.pgh.pa.us</a>>escreveu:<br type="attribution" /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Lucas <<a
href="mailto:lucas75@gmail.com">lucas75@gmail.com</a>>writes:<br /> > The queue jumping logic can not use the
distributedtransaction id?<br /><br /> If we had such a thing as a distributed transaction id, maybe the<br /> answer
couldbe yes.  We don't.<br /><br /> I did wonder whether using a shared snapshot might be a workable proxy<br /> for
that,but haven't pursued it.<br /><br />                         regards, tom lane<br /></blockquote></div></div>