On 2016-02-24 17:52:37 -0300, Alvaro Herrera wrote:
> chris.tessels@inergy.nl wrote:
>
> > Core was generated by `postgres: mailinfo_ow mailinfo_ods 10.50.6.6(4188'.
> > Program terminated with signal 11, Segmentation fault.
> >
> > #0 MinimumActiveBackends (min=50) at procarray.c:2472
> > 2472 if (pgxact->xid == InvalidTransactionId)
>
> It's not surprising that you're not able to make this crash
> consistently, because it looks like the problem might be in concurrent
> modifications to the PGXACT array. This routine, MinimumActiveBackends,
> walks the PGPROC array explicitly without locks. There are comments
> indicating that this is safe, but evidently something has slipped in
> there.
>
> Apparently this code is trying to dereference an invalid pgxact, but
> it's not clear to me how this happens. Those structs are allocated in
> advance, and they are referenced in the code via array indexes, so even
> if the pgxact doesn't actually hold data about a valid transaction,
> dereferencing the XID shouldn't cause a crash.

Well, that code is pretty, uh, questionable. E.g. for

    int pgprocno = arrayP->pgprocnos[index];
    volatile PGPROC *proc = &allProcs[pgprocno];
    volatile PGXACT *pgxact = &allPgXact[pgprocno];

there's no guarantee that pgprocno is actually the same index for both
lookups and the following

    if (pgprocno == -1)
        continue;    /* do not count deleted entries */

check. It's perfectly reasonable for a compiler to reload pgprocno from
memory, or just always reference it via memory.

I presume what happened here is that initially arrayP->pgprocnos[index]
was -1, but by the time if (pgprocno == -1) is reached, it changed to a
different value.

It's also really crummy that we're doing the PGPROC/PGXACT lookups
before checking whether pgprocno is -1.

At the very least ISTM that we have to make pgprocno volatile (or use a
memory barrier - but we don't have sufficient support for those in the
older branches), and move the PGPROC/PGXACT lookups after the == -1
check.

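
A minimal standalone sketch of that reordering, not the actual
procarray.c code: DemoProc, DemoXact, DEMO_INVALID_XID and
count_active() are made-up stand-ins for illustration, and the sanity
assertion is an addition of the sketch. The slot is read once through a
volatile-qualified pointer, and the struct lookups only happen after the
-1 check.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real PGPROC/PGXACT structs. */
typedef struct { int pid; } DemoProc;
typedef struct { unsigned int xid; } DemoXact;

#define DEMO_INVALID_XID 0u

/*
 * Count entries that have a backend attached and an xid assigned.
 *
 * The slot is read exactly once through a volatile-qualified pointer, so
 * the compiler may not reload it between the -1 check and the lookups.
 */
int
count_active(const int *pgprocnos, int nslots,
             const DemoProc *procs, const DemoXact *xacts, int nprocs)
{
    int count = 0;

    for (int index = 0; index < nslots; index++)
    {
        /* single read of the (possibly concurrently modified) slot */
        int pgprocno = *(const volatile int *) &pgprocnos[index];

        if (pgprocno == -1)
            continue;           /* do not count deleted entries */

        /* sketch-only sanity check on the index we just read */
        assert(pgprocno >= 0 && pgprocno < nprocs);

        /* lookups happen only after the check */
        const volatile DemoProc *proc = &procs[pgprocno];
        const volatile DemoXact *xact = &xacts[pgprocno];

        if (proc->pid == 0)
            continue;           /* no backend attached */
        if (xact->xid == DEMO_INVALID_XID)
            continue;           /* no transaction in progress */

        count++;
    }

    return count;
}
```

With slots {0, -1, 1, 2}, where entry 1 has pid 0 and entry 2 has no
xid, only entry 0 is counted. The volatile read only stops the compiler
from re-fetching the slot; it gives no ordering guarantees between CPUs,
which is what a memory barrier would be for.
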
Andres