Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION

Поиск
Список
Период
Сортировка
От Shulgin, Oleksandr
Тема Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION
Дата
Msg-id CACACo5RdOOZLS2fRELPO77Ob0jK+2k8mhFVJ2UBUS157FTp7rA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION  (Andres Freund <andres@anarazel.de>)
Ответы Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION
Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION
Список pgsql-bugs
On Wed, Feb 24, 2016 at 10:52 PM, Andres Freund <andres@anarazel.de> wrote:

> On 2016-02-24 17:52:37 -0300, Alvaro Herrera wrote:
> > chris.tessels@inergy.nl wrote:
> >
> > >     Core was generated by `postgres: mailinfo_ow mailinfo_ods
> 10.50.6.6(4188'.
> > >     Program terminated with signal 11, Segmentation fault.
> > >
> > >     #0  MinimumActiveBackends (min=50) at procarray.c:2472
> > >     2472                    if (pgxact->xid == InvalidTransactionId)
> >
> > It's not surprising that you're not able to make this crash
> > consistently, because it looks like the problem might be in concurrent
> > modifications to the PGXACT array.  This routine, MinimumActiveBackends,
> > walks the PGPROC array explicitely without locks.  There are comments
> > indicating that this is safe, but evidently something has slipped in
> > there.
> >
> > Apparently this code is trying to dereference an invalid pgxact, but
> > it's not clear to me how this happens.  Those structs are allocated in
> > advance, and they are referenced in the code via array indexes, so even
> > if the pgxact doesn't actually hold data about a valid transaction,
> > dereferencing the XID shouldn't cause a crash.
>
> Well, that code is pretty, uh, questionable. E.g. for
>                 int                     pgprocno =
> arrayP->pgprocnos[index];
>                 volatile PGPROC *proc = &allProcs[pgprocno];
>                 volatile PGXACT *pgxact = &allPgXact[pgprocno];
> there's no guarantee that pgprocno is actually the same index for both
> lookups and the following
>                 if (pgprocno == -1)
>                         continue;                       /* do not count
> deleted entries */
> check.  It's perfectly reasonable for a compiler to reload pgprocno from
> memory, or just always reference it via memory.
>

Hm...  this is against my understanding of what a compiler could (or
should) do.   Do you have a documentation reference or other evidence?

A naive disassemble dump on-my-box(tm) suggests that pgprocno is stored on
stack (with offset -0x1c) and is referenced from there:
   ...
   0x00000000007f8011 <+53>: mov    -0x18(%rbp),%rax
   0x00000000007f8015 <+57>: mov    -0x20(%rbp),%edx
   0x00000000007f8018 <+60>: movslq %edx,%rdx
   0x00000000007f801b <+63>: add    $0x8,%rdx
   0x00000000007f801f <+67>: mov    0x8(%rax,%rdx,4),%eax
   0x00000000007f8023 <+71>: mov    %eax,-0x1c(%rbp)
   0x00000000007f8026 <+74>: mov    0x67b15b(%rip),%rdx        # 0xe73188
<allProcs>
   0x00000000007f802d <+81>: mov    -0x1c(%rbp),%eax    # <== here
   0x00000000007f8030 <+84>: cltq
   0x00000000007f8032 <+86>: imul   $0x338,%rax,%rax
   0x00000000007f8039 <+93>: add    %rdx,%rax
   0x00000000007f803c <+96>: mov    %rax,-0x10(%rbp)
   0x00000000007f8040 <+100>: mov    0x67b149(%rip),%rcx        # 0xe73190
<allPgXact>
   0x00000000007f8047 <+107>: mov    -0x1c(%rbp),%eax    # <== here
   0x00000000007f804a <+110>: movslq %eax,%rdx
   0x00000000007f804d <+113>: mov    %rdx,%rax
   0x00000000007f8050 <+116>: add    %rax,%rax
   0x00000000007f8053 <+119>: add    %rdx,%rax
   0x00000000007f8056 <+122>: shl    $0x2,%rax
   0x00000000007f805a <+126>: add    %rcx,%rax
   0x00000000007f805d <+129>: mov    %rax,-0x8(%rbp)
   0x00000000007f8061 <+133>: cmpl   $0xffffffff,-0x1c(%rbp)    # <== and
here
   0x00000000007f8065 <+137>: je     0x7f80a4 <MinimumActiveBackends+200>
   ...

I presume what happened here is that initially arrayP->pgprocnos[index]
> was -1, but by the time if (pgprocno == -1) is reached, it changed to a
> different value.
>
> It's also really crummy that we're doing the PGPROC/PGXACT lookups
> before checking whether pgprocno is -1.
>

No doubt here.

At the very least ISTM that we have to make pgprocno volatile (or use a
> memory barrier - but we don't have sufficient support for those in the
> older branches), and move the PGPROC/PGXACT lookups after the == -1
> check.
>

Use of volatile doesn't change the resulting code dramatically for me.

The above is when compiling without optimizations.  With -Og I'm getting
similar code for the variant with volatile and if no volatile, pgprocno is
just loaded into a register as one would expect.

--
Alex

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Brian Ghidinelli
Дата:
Сообщение: Re: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid` [WORKAROUND]
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #13985: Segmentation fault on PREPARE TRANSACTION