Re: ALTER TABLE lock strength reduction patch is unsafe

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: ALTER TABLE lock strength reduction patch is unsafe
Дата
Msg-id BANLkTikpkyKBSY41A_cLDW2QE7rbGLqv=g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: ALTER TABLE lock strength reduction patch is unsafe  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: ALTER TABLE lock strength reduction patch is unsafe  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Mon, Jun 20, 2011 at 5:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jun 19, 2011 at 5:13 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> We scan pg_class in two ways: to rebuild a relcache entry based on a
>>> relation's oid (easy fix). We also scan pg_class to resolve the name
>>> to oid mapping. The name to oid mapping is performed *without* a lock
>>> on the relation, since we don't know which relation to lock. So the
>>> name lookup can fail if we are in the middle of a pg_class update.
>>> This is an existing potential bug in Postgres unrelated to my patch.
>
>> If this is a pre-existing bug, then it's not clear to me why we need
>> to do anything about it at all right now.
>
> Yeah.  This behavior has been there since day zero, and there have been
> very few complaints about it.  But note that there's only a risk for
> pg_class updates, not any other catalog, and there is exactly one kind
> of failure with very predictable consequences.  The ALTER TABLE patch
> has greatly expanded the scope of the issue, and that *is* a regression
> compared to prior releases.

It's not entirely clear to me how many additional failure cases we've
bought ourselves with this patch.  The particular one you've
demonstrated seems pretty similar to the on we already had, although
possibly the window for it is wider.  Did you run the same test you
used on 9.1 on 9.0 for comparison?

> BTW, it seems to me that this issue is closely related to what Noah is
> trying to fix here:
> http://archives.postgresql.org/message-id/20110612191843.GF21098@tornado.leadboat.com

I think there are two general problems here.  One, we use SnapshotNow
to scan system catalogs, and SnapshotNow semantics are a mess.  Two,
even if we used an MVCC snapshot to scan the system catalogs, it's not
necessarily safe or correct to latch onto an old row version, because
the row update is combined with other actions that are not MVCC-safe,
like removing files on disk.  Breaking it down a bit more:

A. The problem with using a DDL lock level < AccessExclusiveLock,
AFAICS, is entirely due to SnapshotNow semantics.  Any row version we
can possibly see is OK, but we had better see exactly one.  Or at
least not less than one.

B. The problem with name lookups failing in the middle of a pg_class
update is also entirely due to SnapshotNow semantics.

C. The problem Noah is complaining about is NOT due to SnapshotNow
semantics.  If someone does BEGIN; DROP TABLE foo; CREATE TABLE foo();
COMMIT, it is no longer OK for anyone to be looking at the old foo.
An MVCC snapshot is enough to guarantee that we'll see some row, but
examining the old one won't cut it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: Re: [COMMITTERS] pgsql: Fixed string in German translation that causes segfault.
Следующее
От: Robert Haas
Дата:
Сообщение: Re: ALTER TABLE lock strength reduction patch is unsafe