Обсуждение: HEAD crashes with assertion and LWLOCK_STATS enabled
Hi hackers, I found a bug that causes a crash when assertion is enabled and LWLOCK_STATS is defined. I've tested with Debian 7.5 (3.2.0-4-amd64) on VMware fusion 6, but this bug seems to be platform-independent and shouldreproduce in other environments. A patch to fix the bug is also attached. ## Reproduing a crash You can reproduce a crash by this way: git co a0841ecd2518d4505b96132b764b918ab5d21ad4 git clean -dfx ./configure --enable-cassert CFLAGS='-DLWLOCK_STATS' make check In my environment, the following messages appeared. ( omit... ) ../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check --top-builddir=../../.. --dlpath=. --schedule=./parallel_schedule ============== creating temporary installation ============== ============== initializing database system ============== pg_regress: initdb failed and initdb.log contained the following messages. reating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok creating subdirectories ... ok selecting default max_connections ... 100 selecting default shared_buffers ... 128MB selecting dynamic shared memory implementation ... posix creating configuration files ... ok creating template1 database in /tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main 142: shacq0 exacq 1 blk 0 spindelay 0 ( omit... ) PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0 PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0 ok loading system objects' descriptions ... TRAP: FailedAssertion("!(CritSectionCount == 0 || (context) == ErrorContext|| (MyAuxProcType == CheckpointerProcess))", File: "mcxt.c", Line: 594) Aborted (core dumped) child process exited with exit code 134 initdb: data directory "/tmp/pghead/src/test/regress/./tmp_check/data" not removed at user's request ## The cause of crash The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit 4a170ee9on 2014-04-04. In my understanding, the root cause of the assertion failure is on-demand allocation of lwlock_stats entry. For each LWLock,a lwlock_stats entry is created at the first invocation of LWLockAcquire using MemoryContextAlloc. If the first invocationis in a critical section, the assertion fails. For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function was the problem. I also confirmed the assertion failure on starting postgres on a correctly initialized database. In this case, locking CheckpointerCommLockin AbsorbFsyncRequests function was the problem. ## A solution In order to avoid memory allocation during critical sections, lwlock_stats hash table should be populated at the initializationof each process. The attached patch populate lwlock_stats entries of MainLWLockArray at the end of CreateLWLocks, InitProcess and InitAuxiliaryProcess. With this patch, all regression tests can be passed so far, but I think this patch is not perfect because it does not coverLWLocks outside of MainLWLockArray. I'm not sure where is the right place to initialize lwlock_stats entries for thatlocks. So I feel it needs some refinements by you hackers.
Вложения
On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU <y.hayamizu@gmail.com> wrote: > The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit 4a170ee9on 2014-04-04. > > In my understanding, the root cause of the assertion failure is on-demand allocation of lwlock_stats entry. For each LWLock,a lwlock_stats entry is created at the first invocation of LWLockAcquire using MemoryContextAlloc. If the first invocationis in a critical section, the assertion fails. > > For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function was the problem. > I also confirmed the assertion failure on starting postgres on a correctly initialized database. In this case, lockingCheckpointerCommLock in AbsorbFsyncRequests function was the problem. > > ## A solution > > In order to avoid memory allocation during critical sections, lwlock_stats hash table should be populated at the initializationof each process. > The attached patch populate lwlock_stats entries of MainLWLockArray at the end of CreateLWLocks, InitProcess and InitAuxiliaryProcess. > > With this patch, all regression tests can be passed so far, but I think this patch is not perfect because it does not coverLWLocks outside of MainLWLockArray. I'm not sure where is the right place to initialize lwlock_stats entries for thatlocks. So I feel it needs some refinements by you hackers. Prior to my commit ea9df812d8502fff74e7bc37d61bdc7d66d77a7f, which introduced LWLockTranche, we used to allocate enough storage for all of the LWLOCK_STATS entries in the system; that commit changed things so that we allocate entries for particular LWLocks on an as-needed basis. Although that wasn't the main point of that patch, I thought it was a nice idea, since it might save you quite a bit of memory if you have a lot of backends that don't touch very many LWLocks. But maybe we need to give up on that in view of this report. I don't think we should adopt the approach proposed in this patch, though, because if we're going to preallocate all of the entries anyway there's little reason to use a hash table instead of an array. If we're going to go with the approach of preallocating all the entries, maybe we should change the definition of LWLockTranche to include the number of lwlocks in the tranche. We could then add another array parallel to LWLockTrancheArray which would point to an appropriately-sized array of lwlock_stats objects for each tranche. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, May 23, 2014 at 9:38 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU <y.hayamizu@gmail.com> wrote: >> The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit 4a170ee9on 2014-04-04. This issue is still in open item list for 9.4. But the commit which had caused this issue was reverted by d27d493. So I removed this from the Open Item list. Regards, -- Fujii Masao