Обсуждение: HEAD crashes with assertion and LWLOCK_STATS enabled

Поиск
Список
Период
Сортировка

HEAD crashes with assertion and LWLOCK_STATS enabled

От
Yuto HAYAMIZU
Дата:
Hi hackers,

I found a bug that causes a crash when assertion is enabled and LWLOCK_STATS is defined.
I've tested with Debian 7.5 (3.2.0-4-amd64) on VMware fusion 6, but this bug seems to be platform-independent and
shouldreproduce in other environments. 
A patch to fix the bug is also attached.

## Reproduing a crash

You can reproduce a crash by this way:

    git co a0841ecd2518d4505b96132b764b918ab5d21ad4
    git clean -dfx
    ./configure --enable-cassert CFLAGS='-DLWLOCK_STATS'
    make check

In my environment, the following messages appeared.

    ( omit... )
    ../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check --top-builddir=../../..    --dlpath=.
--schedule=./parallel_schedule
    ============== creating temporary installation        ==============
    ============== initializing database system           ==============

    pg_regress: initdb failed

and initdb.log contained the following messages.

    reating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok
    creating subdirectories ... ok
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting dynamic shared memory implementation ... posix
    creating configuration files ... ok
    creating template1 database in /tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main 142:
shacq0 exacq 1 blk 0 spindelay 0 
    ( omit... )
    PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0
    PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0
    ok
    loading system objects' descriptions ... TRAP: FailedAssertion("!(CritSectionCount == 0 || (context) ==
ErrorContext|| (MyAuxProcType == CheckpointerProcess))", File: "mcxt.c", Line: 594) 
    Aborted (core dumped)
    child process exited with exit code 134
    initdb: data directory "/tmp/pghead/src/test/regress/./tmp_check/data" not removed at user's request

## The cause of crash

The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit
4a170ee9on 2014-04-04. 

In my understanding, the root cause of the assertion failure is on-demand allocation of lwlock_stats entry.  For each
LWLock,a lwlock_stats entry is created at the first invocation of LWLockAcquire using MemoryContextAlloc.  If the first
invocationis in a critical section, the assertion fails. 

For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function was the problem.
I also confirmed the assertion failure on starting postgres on a correctly initialized database. In this case, locking
CheckpointerCommLockin AbsorbFsyncRequests function was the problem. 

## A solution

In order to avoid memory allocation during critical sections, lwlock_stats hash table should be populated at the
initializationof each process. 
The attached patch populate lwlock_stats entries of MainLWLockArray at the end of CreateLWLocks, InitProcess and
InitAuxiliaryProcess.

With this patch, all regression tests can be passed so far, but I think this patch is not perfect because it does not
coverLWLocks outside of MainLWLockArray.  I'm not sure where is the right place to initialize lwlock_stats entries for
thatlocks.  So I feel it needs some refinements by you hackers. 

Вложения

Re: HEAD crashes with assertion and LWLOCK_STATS enabled

От
Robert Haas
Дата:
On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU <y.hayamizu@gmail.com> wrote:
> The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit
4a170ee9on 2014-04-04. 
>
> In my understanding, the root cause of the assertion failure is on-demand allocation of lwlock_stats entry.  For each
LWLock,a lwlock_stats entry is created at the first invocation of LWLockAcquire using MemoryContextAlloc.  If the first
invocationis in a critical section, the assertion fails. 
>
> For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function was the problem.
> I also confirmed the assertion failure on starting postgres on a correctly initialized database. In this case,
lockingCheckpointerCommLock in AbsorbFsyncRequests function was the problem. 
>
> ## A solution
>
> In order to avoid memory allocation during critical sections, lwlock_stats hash table should be populated at the
initializationof each process. 
> The attached patch populate lwlock_stats entries of MainLWLockArray at the end of CreateLWLocks, InitProcess and
InitAuxiliaryProcess.
>
> With this patch, all regression tests can be passed so far, but I think this patch is not perfect because it does not
coverLWLocks outside of MainLWLockArray.  I'm not sure where is the right place to initialize lwlock_stats entries for
thatlocks.  So I feel it needs some refinements by you hackers. 

Prior to my commit ea9df812d8502fff74e7bc37d61bdc7d66d77a7f, which
introduced LWLockTranche, we used to allocate enough storage for all
of the LWLOCK_STATS entries in the system; that commit changed things
so that we allocate entries for particular LWLocks on an as-needed
basis.  Although that wasn't the main point of that patch, I thought
it was a nice idea, since it might save you quite a bit of memory if
you have a lot of backends that don't touch very many LWLocks.  But
maybe we need to give up on that in view of this report.

I don't think we should adopt the approach proposed in this patch,
though, because if we're going to preallocate all of the entries
anyway there's little reason to use a hash table instead of an array.
If we're going to go with the approach of preallocating all the
entries, maybe we should change the definition of LWLockTranche to
include the number of lwlocks in the tranche.  We could then add
another array parallel to LWLockTrancheArray which would point to an
appropriately-sized array of lwlock_stats objects for each tranche.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: HEAD crashes with assertion and LWLOCK_STATS enabled

От
Fujii Masao
Дата:
On Fri, May 23, 2014 at 9:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU <y.hayamizu@gmail.com> wrote:
>> The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit
4a170ee9on 2014-04-04.
 

This issue is still in open item list for 9.4. But the commit which had
caused this issue was reverted by d27d493. So I removed this from the
Open Item list.

Regards,

-- 
Fujii Masao