Re: ALTER TABLE lock strength reduction patch is unsafe

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: ALTER TABLE lock strength reduction patch is unsafe
Дата
Msg-id CA+U5nMJ_noDR-YZMVuXb5NaXf=GeYyMUkhDj+dGhOA1E7XvPjQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: ALTER TABLE lock strength reduction patch is unsafe  (Noah Misch <noah@leadboat.com>)
Ответы Re: ALTER TABLE lock strength reduction patch is unsafe  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
On 1 August 2013 01:53, Noah Misch <noah@leadboat.com> wrote:
> On Mon, Jul 15, 2013 at 05:50:40PM +0100, Simon Riggs wrote:
>> On 15 July 2013 15:06, Robert Haas <robertmhaas@gmail.com> wrote:
>> > Generally, the question on the table is: to what extent do MVCC
>> > catalog scans make the world safe for concurrent DDL, or put
>> > negatively, what hazards remain?
>>
>> On Tom's test I've been unable to find a single problem.
>>
>> > Noah discovered an interesting one recently: apparently, the relcache
>> > machinery has some logic in it that depends on the use of
>> > AccessExclusiveLock in subtle ways.  I'm going to attempt to explain
>> > it, and maybe he can jump in and explain it better.  Essentially, the
>> > problem is that when a relcache reload occurs, certain data structures
>> > (like the tuple descriptor, but there are others) are compared against
>> > the old version of the same data structure.  If there are no changes,
>> > we do nothing; else, we free the old one and install the new one.  The
>> > reason why we don't free the old one and install the new one
>> > unconditionally is because other parts of the backend might have
>> > pointers to the old data structure, so just replacing it all the time
>> > would result in crashes.  Since DDL requires AccessExclusiveLock +
>> > CheckTableNotInUse(), any actual change to the data structure will
>> > happen when we haven't got any pointers anyway.
>
>> If you look at this as a generalised problem you probably can find
>> some issues, but that doesn't mean they apply in the specific cases
>> we're addressing.
>>
>> The lock reductions we are discussing in all cases have nothing at all
>> to do with structure and only relate to various options. Except in the
>> case of constraints, though even there I see no issues as yet.
>
> I was able to distill the above hypothesis into an actual crash with
> reduce_lock_levels.v13.patch.  Test recipe:
>
> 1. Build with --enable-cassert and with -DCATCACHE_FORCE_RELEASE=1.  An
> AcceptInvalidationMessages() will then happen at nearly every syscache lookup,
> making it far easier to hit a problem of this sort.
>
> 2. Run these commands as setup:
>   create table root (c int);
>   create table part (check (c > 0), check (c > 0)) inherits (root);
>
> 3. Attach a debugger to your session and set a breakpoint at plancat.c:660 (as
> of commit 16f38f72ab2b8a3b2d45ba727d213bb31111cea4).
>
> 4. Run this in your session; the breakpoint will trip:
>   select * from root where c = -1;
>
> 5. Start another session and run:
>   alter table part add check (c > 0);
>
> 6. Exit the debugger to release the first session.  It will crash.
>
> plancache.c:657 stashes a pointer to memory belonging to the rd_att of a
> relcache entry.  It then proceeds to call eval_const_expressions(), which
> performs a syscache lookup in its simplify_function() subroutine.  Under
> CATCACHE_FORCE_RELEASE, the syscache lookup reliably prompts an
> AcceptInvalidationMessages().  The ensuing RelationClearRelation() against
> "part" notices the new constraint, decides keep_tupdesc = false, and frees the
> existing tupdesc.  plancache.c is now left holding a pointer into freed
> memory.  The next loop iteration will crash when it dereferences a pointer
> stored within that freed memory at plancat.c:671.
>
>
> A remediation strategy that seemed attractive when I last contemplated this
> problem is to repoint rd_att immediately but arrange to free the obsolete
> TupleDesc in AtEOXact_RelationCache().

v15 to fix the above problem.

Looking at other potential problems around plancache but nothing found as yet.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "MauMau"
Дата:
Сообщение: Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance