HEAD crashes with assertion and LWLOCK_STATS enabled

From: Yuto HAYAMIZU
Subject: HEAD crashes with assertion and LWLOCK_STATS enabled
Msg-id: 537B0C2E.6090706@gmail.com
List: pgsql-hackers

Hi hackers,

I found a bug that causes a crash when assertions are enabled and LWLOCK_STATS is defined.
I've tested on Debian 7.5 (3.2.0-4-amd64) on VMware Fusion 6, but the bug seems to be platform-independent and should reproduce in other environments.
A patch to fix the bug is also attached.

## Reproducing the crash

You can reproduce the crash as follows:

    git checkout a0841ecd2518d4505b96132b764b918ab5d21ad4
    git clean -dfx
    ./configure --enable-cassert CFLAGS='-DLWLOCK_STATS'
    make check

In my environment, the following messages appeared.

    ( omit... )
    ../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check --top-builddir=../../..    --dlpath=. --schedule=./parallel_schedule
    ============== creating temporary installation        ==============
    ============== initializing database system           ==============

    pg_regress: initdb failed

and initdb.log contained the following messages.

    creating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok
    creating subdirectories ... ok
    selecting default max_connections ... 100
    selecting default shared_buffers ... 128MB
    selecting dynamic shared memory implementation ... posix
    creating configuration files ... ok
    creating template1 database in /tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main 142: shacq 0 exacq 1 blk 0 spindelay 0
    ( omit... )
    PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0
    PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0
    ok
    loading system objects' descriptions ... TRAP: FailedAssertion("!(CritSectionCount == 0 || (context) == ErrorContext || (MyAuxProcType == CheckpointerProcess))", File: "mcxt.c", Line: 594)
    Aborted (core dumped)
    child process exited with exit code 134
    initdb: data directory "/tmp/pghead/src/test/regress/./tmp_check/data" not removed at user's request

## The cause of the crash

The failing assertion prohibits memory allocation in a critical section; it was introduced by commit 4a170ee9 on 2014-04-04.

As I understand it, the root cause of the assertion failure is the on-demand allocation of lwlock_stats entries. For each LWLock, an lwlock_stats entry is created with MemoryContextAlloc at the first invocation of LWLockAcquire. If that first invocation happens inside a critical section, the assertion fails.

In the 'initdb' case above, the problem was the acquisition of WALWriteLock in the XLogFlush function.
I also confirmed the assertion failure when starting postgres on a correctly initialized database. In that case, the problem was the acquisition of CheckpointerCommLock in the AbsorbFsyncRequests function.
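
The failure mode described in this section can be illustrated with a small toy model. This is a sketch only, not PostgreSQL source: every `toy_*` name is hypothetical, a fixed-size array stands in for the lwlock_stats hash table, and a violation counter stands in for the assertion in mcxt.c so that the effect can be observed without aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model, NOT PostgreSQL source. All toy_* names are hypothetical. */
#define TOY_NUM_LOCKS 4

typedef struct ToyLWLockStats
{
    int exacq;                          /* exclusive acquisitions seen so far */
} ToyLWLockStats;

static int toy_crit_section_count = 0;  /* stands in for CritSectionCount */
static int toy_alloc_violations = 0;    /* times the real assertion would fire */
static ToyLWLockStats *toy_stats[TOY_NUM_LOCKS];

/* Stands in for MemoryContextAlloc: allocating in a critical section is a bug. */
static void *
toy_context_alloc(size_t size)
{
    void *p;

    if (toy_crit_section_count != 0)
        toy_alloc_violations++;         /* an assert-enabled build would trap here */

    p = calloc(1, size);
    if (p == NULL)
        abort();
    return p;
}

/* Stands in for LWLockAcquire with LWLOCK_STATS: stats entry is created lazily. */
static void
toy_lwlock_acquire(int lockid)
{
    assert(lockid >= 0 && lockid < TOY_NUM_LOCKS);

    if (toy_stats[lockid] == NULL)      /* first acquisition of this lock */
        toy_stats[lockid] = toy_context_alloc(sizeof(ToyLWLockStats));

    toy_stats[lockid]->exacq++;
}
```

In this model, setting `toy_crit_section_count` to 1 and then calling `toy_lwlock_acquire` for a lock that has never been acquired bumps `toy_alloc_violations`; a second acquisition of the same lock does not, because its entry already exists.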

## A solution

To avoid memory allocation during critical sections, the lwlock_stats hash table should be populated when each process is initialized.
The attached patch populates the lwlock_stats entries for MainLWLockArray at the end of CreateLWLocks, InitProcess and InitAuxiliaryProcess.
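
The idea of populating the entries up front can be sketched with a self-contained toy model. This is not the attached patch and not PostgreSQL source: every `demo_*` name is hypothetical, a fixed-size array stands in for the lwlock_stats hash table, and a counter stands in for the assertion so that the result can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model, NOT PostgreSQL source. All demo_* names are hypothetical. */
#define DEMO_NUM_LOCKS 4

typedef struct DemoLWLockStats
{
    int exacq;                           /* exclusive acquisitions seen so far */
} DemoLWLockStats;

static int demo_crit_section_count = 0;  /* stands in for CritSectionCount */
static int demo_alloc_violations = 0;    /* allocations made in a critical section */
static DemoLWLockStats *demo_stats[DEMO_NUM_LOCKS];

/* Look up the stats entry for a lock, allocating it if it does not exist yet. */
static DemoLWLockStats *
demo_get_stats(int lockid)
{
    assert(lockid >= 0 && lockid < DEMO_NUM_LOCKS);

    if (demo_stats[lockid] == NULL)
    {
        if (demo_crit_section_count != 0)
            demo_alloc_violations++;     /* the real assertion would fire here */
        demo_stats[lockid] = calloc(1, sizeof(DemoLWLockStats));
        if (demo_stats[lockid] == NULL)
            abort();
    }
    return demo_stats[lockid];
}

/* Run once at process start, before any critical section is entered. */
static void
demo_init_process_stats(void)
{
    assert(demo_crit_section_count == 0);
    for (int i = 0; i < DEMO_NUM_LOCKS; i++)
        (void) demo_get_stats(i);        /* create every entry up front */
}

/* Acquisition now only updates an entry that is already present. */
static void
demo_lwlock_acquire(int lockid)
{
    demo_get_stats(lockid)->exacq++;
}
```

Once `demo_init_process_stats` has run, acquisitions inside a critical section find their entries already present and allocate nothing. Like the patch, the sketch covers only a fixed, known set of locks; locks created later would still take the lazy path.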

With this patch, all regression tests pass, but I don't think the patch is complete because it does not cover LWLocks outside of MainLWLockArray. I'm not sure where the right place is to initialize lwlock_stats entries for those locks, so I think it needs some refinement from you hackers.
