Re: postmaster 8.2 eternally hangs in sempaphore lock acquiring

From Heikki Linnakangas
Subject Re: postmaster 8.2 eternally hangs in sempaphore lock acquiring
Msg-id 460CC2D0.7040806@enterprisedb.com
In response to Re: postmaster 8.2 eternally hangs in sempaphore lock acquiring  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: postmaster 8.2 eternally hangs in sempaphore lock acquiring  (Martin Pitt <martin@piware.de>)
Re: postmaster 8.2 eternally hangs in sempaphore lock acquiring  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Ok, I think I know what's happening. In btbulkdelete we have a
>> PG_TRY-CATCH block. In the try-block, we call _bt_start_vacuum which
>> acquires and releases the BtreeVacuumLock. Under certain error
>> conditions, _bt_start_vacuum calls elog(ERROR) while holding the
>> BtreeVacuumLock. The PG_CATCH block calls _bt_end_vacuum which also
>> tries to acquire BtreeVacuumLock.
>
> This is definitely a bug (I unfortunately didn't see your message until
> after I'd replicated your reasoning...) but the word from Shuttleworth
> is that he doesn't see either of those messages in his postmaster log.
> So it seems we need another theory.  I haven't a clue at the moment though.

The error message never makes it to the log. The deadlock occurs in the
PG_CATCH-block, before rethrowing and printing the error. I added an
unconditional elog(ERROR) in _bt_start_vacuum to test it, and I'm
getting the same hang with no message in the log.

The unsafe pattern of calling elog(ERROR) while holding an lwlock in
_bt_start_vacuum needs to be fixed; patch attached. We still need to
figure out what's causing the error in the first place. With the patch,
we should at least get a proper error message instead of a hang when the
error occurs.
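
To make the hazard concrete, here is a minimal standalone sketch of the pattern, not the actual backend code. It assumes a toy non-reentrant lock and uses setjmp/longjmp in place of PG_TRY/PG_CATCH and elog(ERROR). All names (ToyLock, toy_acquire, toy_error, start_vacuum, end_vacuum, bulkdelete) are hypothetical stand-ins, and toy_acquire reports failure where a real backend would block forever on a lock it already holds.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Toy stand-in for a non-reentrant lwlock; this is NOT the PostgreSQL API. */
typedef struct
{
    bool held;
} ToyLock;

static ToyLock vacuum_lock = { false };
static jmp_buf error_ctx;       /* plays the role of the PG_TRY context */

/* Returns false where a real backend would wait forever on its own lock. */
static bool
toy_acquire(ToyLock *lock)
{
    if (lock->held)
        return false;
    lock->held = true;
    return true;
}

static void
toy_release(ToyLock *lock)
{
    lock->held = false;
}

/* Plays the role of elog(ERROR): jump straight to the catch handler. */
static void
toy_error(void)
{
    longjmp(error_ctx, 1);
}

/* Raises an error while holding the lock unless release_before_error is set. */
static void
start_vacuum(bool release_before_error)
{
    (void) toy_acquire(&vacuum_lock);
    if (release_before_error)
        toy_release(&vacuum_lock);  /* the patched behaviour */
    toy_error();                    /* simulated error condition */
}

/* Cleanup routine that needs the same lock. */
static bool
end_vacuum(void)
{
    if (!toy_acquire(&vacuum_lock))
        return false;               /* would hang: lock is still held by us */
    toy_release(&vacuum_lock);
    return true;
}

/* Returns true if cleanup completed, false if it would have hung. */
static bool
bulkdelete(bool release_before_error)
{
    vacuum_lock.held = false;
    if (setjmp(error_ctx) == 0)
    {
        /* "try" block: always ends in toy_error() in this sketch */
        start_vacuum(release_before_error);
        return true;                /* not reached */
    }
    /* "catch" block: runs before the error would be rethrown and logged */
    return end_vacuum();
}
```

In this sketch, bulkdelete(false) returns false because the cleanup routine finds the lock still held, which corresponds to the hang with no message in the log; bulkdelete(true) returns true because the lock is released before the error is raised.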

Martin: Would it be possible for you to reproduce the problem with a
patched version?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/access/nbtree/nbtutils.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/nbtree/nbtutils.c,v
retrieving revision 1.79
diff -c -r1.79 nbtutils.c
*** src/backend/access/nbtree/nbtutils.c    4 Oct 2006 00:29:49 -0000    1.79
--- src/backend/access/nbtree/nbtutils.c    30 Mar 2007 07:55:36 -0000
***************
*** 998,1016 ****
--- 998,1023 ----
          vac = &btvacinfo->vacuums[i];
          if (vac->relid.relId == rel->rd_lockInfo.lockRelId.relId &&
              vac->relid.dbId == rel->rd_lockInfo.lockRelId.dbId)
+         {
+             LWLockRelease(BtreeVacuumLock);
              elog(ERROR, "multiple active vacuums for index \"%s\"",
                   RelationGetRelationName(rel));
+         }
      }

      /* OK, add an entry */
      if (btvacinfo->num_vacuums >= btvacinfo->max_vacuums)
+     {
+         LWLockRelease(BtreeVacuumLock);
          elog(ERROR, "out of btvacinfo slots");
+     }
      vac = &btvacinfo->vacuums[btvacinfo->num_vacuums];
      vac->relid = rel->rd_lockInfo.lockRelId;
      vac->cycleid = result;
      btvacinfo->num_vacuums++;

      LWLockRelease(BtreeVacuumLock);
+
      return result;
  }

