Обсуждение: Questionable coding in proc.c & lock.c

Поиск
Список
Период
Сортировка

Questionable coding in proc.c & lock.c

От
Tom Lane
Дата:
I spent some time looking around for possible causes of the recently
reported deadlock conditions.  I couldn't find any smoking gun, but
I found a couple of things that look fishy:

1. I notice that ProcReleaseAll does not act quite the same way that
ProcRelease does.  Look in particular at the logic for handling
wakeupNeeded in each one.  There is some useless code in ProcRelease,
but the bottom line is that it *will* call ProcLockWakeup() unless it
finds there are no remaining holders (and deletes the lock instead).
OTOH, ProcReleaseAll will only call ProcLockWakeup() if this test
succeeds for some lockmode:
               if (!wakeupNeeded && xidLook->holders[i] > 0 &&                   lockMethodTable->ctl->conflictTab[i] &
lock->waitMask)                  wakeupNeeded = true;
 

I spent some time trying to see a way that this might fail to wake up
a waiter who needs to be woken up.  I don't see one, but either this
code is broken or ProcRelease is doing unnecessary work.


2. The loop in ProcLockWakeup() has been broken since the day it was
written.  It's trying to scan through the list of waiting processes
and awaken all that can legitimately be awoken.  However, as soon as
it finds one that cannot be woken, it sets last_locktype = proc->token
and then continues the loop *without advancing proc*.  All subsequent
iterations will compare proc->token == last_locktype, find they are
equal, and then continue --- again without advancing proc.  It's not
an infinite loop because there's a list-length counter, but none of
the proc entries beyond the first unawakenable one will be examined.

The net effect is that the behavior is the same as it was in 6.3 or so,
when the loop was just "while (LockResolveConflicts(...) succeeds) do
remove-and-awaken-first-entry".

So the question is, is there ever a case where it is necessary to wake
processes that are in the list beyond one that can't be woken?
I haven't been able to conjure up an example, but I am suspicious.

This error cannot explain any new hangs seen in 7.0, since the effective
behavior of ProcLockWakeup() hasn't changed since Postgre95.  But
perhaps someone made a change elsewhere relying on the assumption that
ProcLockWakeup() would wake up any available waiter.


Comments anyone?
        regards, tom lane


RE: Questionable coding in proc.c & lock.c

От
"Hiroshi Inoue"
Дата:
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Tom Lane
> 
> 2. The loop in ProcLockWakeup() has been broken since the day it was
> written.  It's trying to scan through the list of waiting processes
> and awaken all that can legitimately be awoken.  However, as soon as
> it finds one that cannot be woken, it sets last_locktype = proc->token
> and then continues the loop *without advancing proc*.  All subsequent
> iterations will compare proc->token == last_locktype, find they are

I pointed out it once before 6.5 and it was one of the cause of lock
freezing then. I've known it has not been changed but neglected to tell it,
sorry. However the freezing was resolved by Vadim's change and I've
thought it doesn't cause freezing. Hmm,I don't remember well now how
the freezing occured.

The following is my posting about the freezing.
Hope that helps,

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

> There is no row-level locks: all locks over tables are
> table-level ones, btree & hash use page-level locks, but
> never do page->table level lock escalation.
>
> However, I'm not sure that proposed changes will help in the next case:
>
> session-1 => begin;
> session-1 => insert into tt values (1);    --RowExclusiveLock
>
> session-2 => begin;
> session-2 => insert into tt values (2);    --RowExclusiveLock
>
> session-3 => begin;
> session-3 => lock table tt;            --AccessExclusiveLock
>                 (conflicts with 1 & 2)
>                                 ^
> session-1 => lock table tt in share mode;    --ShareLock
>                 (conflicts with 2 & 3)
>                                     ^
> This is deadlock situation and must be handled by
> DeadLockCheck().
>

It's really a deadlock ?
Certainly end/abort of session-2 doesn't wakeup session-1/session3.
I think it's due to the following code in ProcLockWakeup().
      while ((queue_size--) && (proc))       {
               /*                * This proc will conflict as the previous one did, don't
even                * try.                */               if (proc->token == last_locktype)
continue;
               /*                * This proc conflicts with locks held by others, ignored.                */
  if (LockResolveConflicts(lockmethod,                                                                lock,
 

proc->token,                                                                proc->xid,

(XIDLookupEnt *
) NULL) != STATUS_OK)               {                       last_locktype = proc->token;
continue;              }
 

Once LockResolveConflicts() doesn't return STATUS_OK,proc
is not changed and only queue_size-- is executed(never try
to wakeup other procs).

After inserting the code such asproc = (PROC *) MAKE_PTR(proc->links.prev);
before continue statements,ProcLockWakeup() triggerd
by end/abort of session-2 could try to wakeup session-1.


RE: Questionable coding in proc.c & lock.c

От
"Hiroshi Inoue"
Дата:
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On
> Behalf Of Tom Lane
> 
> I spent some time looking around for possible causes of the recently
> reported deadlock conditions.  I couldn't find any smoking gun, but
> I found a couple of things that look fishy:
> 

Oops,I've forgotten another freezing issue reported by Alfred Perlstein.
We know the cause(db access in abort transaction state) of it.
Seems xact.c has been pretty changed after I examined it 2 months
ago. Have you already fixed it ? If not,I would examine it again and
fix the bug. OK ?

Here's a reproducible example.

Session-1# begin;BEGIN=# lock t;LOCK TABLE

Session-2=# begin;BEGIN=# lock t; [blocked] ^CCancel request sentERROR:  Query cancel requested while waiting
lockreindex=#select * from t;[blocked]
 

Session-1=# commit;COMMIT

Session-2ERROR:  LockRelation: LockAcquire failed=# abort;ROLLBACK=# lock t;[blocked]

Regards.

Hiroshi Inoue 


Re: Questionable coding in proc.c & lock.c

От
Tom Lane
Дата:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Oops,I've forgotten another freezing issue reported by Alfred Perlstein.
> We know the cause(db access in abort transaction state) of it.
> Seems xact.c has been pretty changed after I examined it 2 months
> ago. Have you already fixed it ?

That example seems to work better than before, but the generic problem
is still there: we should avoid running the planner analysis phase and
the rewriter when we are in abort state.

We also need to do something about holding locks on relations clear
through from parsing till execution.  It occurs to me that there is
a problem closely related to the abort problem here: what if there
are transaction boundaries in the query string?  Suppose the query
string is
begin; select * from foo; end; select * from bar;

Currently, even if the parser did grab a lock on bar, it'd get dropped
during execution of the "end".

I think maybe what needs to be done to fix all this is to restructure
postgres.c's interface to the parser/rewriter.  What we want is to
run just the yacc grammar initially to produce a list of raw parse
trees (which is enough to detect begin/commit/rollback, no?)  Then
postgres.c walks down that list, and for each element, if it is
commit/rollback OR we are not in abort state, do parse analysis,
rewrite, planning, and execution.  (Thomas, any comments here?)
        regards, tom lane


Re: Questionable coding in proc.c & lock.c

От
Thomas Lockhart
Дата:
> I think maybe what needs to be done to fix all this is to restructure
> postgres.c's interface to the parser/rewriter.  What we want is to
> run just the yacc grammar initially to produce a list of raw parse
> trees (which is enough to detect begin/commit/rollback, no?)  Then
> postgres.c walks down that list, and for each element, if it is
> commit/rollback OR we are not in abort state, do parse analysis,
> rewrite, planning, and execution.  (Thomas, any comments here?)

Sure, why not (restructure postgres.c that is)? I was just thinking
about how to implement "autocommit" and was considering doing a hack in
analyze.c which just plops a "BEGIN" in front of the existing query. But
restructuring a bit higher up will let us make this a real feature, not
a hack (I hope ;)

btw, even gram.y does touch some of the heap cache (for pg_type) to look
for type existance; don't know if that will be a problem but maybe that
needs to be rethought also...
                  - Thomas


Re: Questionable coding in proc.c & lock.c

От
Bruce Momjian
Дата:
> > I think maybe what needs to be done to fix all this is to restructure
> > postgres.c's interface to the parser/rewriter.  What we want is to
> > run just the yacc grammar initially to produce a list of raw parse
> > trees (which is enough to detect begin/commit/rollback, no?)  Then
> > postgres.c walks down that list, and for each element, if it is
> > commit/rollback OR we are not in abort state, do parse analysis,
> > rewrite, planning, and execution.  (Thomas, any comments here?)
> 
> Sure, why not (restructure postgres.c that is)? I was just thinking
> about how to implement "autocommit" and was considering doing a hack in
> analyze.c which just plops a "BEGIN" in front of the existing query. But

Man, that is something I would do.  :-)

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Questionable coding in proc.c & lock.c

От
Tom Lane
Дата:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> btw, even gram.y does touch some of the heap cache (for pg_type) to look
> for type existance; don't know if that will be a problem but maybe that
> needs to be rethought also...

We'd need to postpone that processing till parse analysis, else we still
have the underlying problem.  Fortunately we are not parsing C ;-) so
it seems to me it shouldn't be necessary to do any table lookups during
initial parsing...

I assume you are looking at the 'setof' processing?  Offhand it seems to
me that this code is broken anyway: use of a relation type should refer
to the tuple type, but should *not* imply SETOF, at least IMHO.
        regards, tom lane


Re: Questionable coding in proc.c & lock.c

От
Thomas Lockhart
Дата:
> I assume you are looking at the 'setof' processing?  Offhand it seems to
> me that this code is broken anyway: use of a relation type should refer
> to the tuple type, but should *not* imply SETOF, at least IMHO.

No, there is another routine (not remembering the name right now) which
is involved, I *think* from within gram.y, which barfs when called with
"opaque" as an argument (among other things). Can look up more info
later (I'm away this weekend).
                   - Thomas