Re: Checkpointer crashes on slave in 9.4 on windows

From: Robert Haas
Subject: Re: Checkpointer crashes on slave in 9.4 on windows
Msg-id: CA+TgmoZeFqBFtyCZTQSA+gqcyspvQ4KNui1Ggw7ab_bQ32qzdw@mail.gmail.com
In response to: Checkpointer crashes on slave in 9.4 on windows  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Checkpointer crashes on slave in 9.4 on windows  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On Mon, Jul 21, 2014 at 4:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> During internal tests, we observed that the checkpointer
> crashes on the slave on Windows, with the following log
> on the slave:
>
> LOG:  checkpointer process (PID 4040) was terminated by exception 0xC0000005
> HINT:  See C include file "ntstatus.h" for a description of the hexadecimal
> value.
> LOG:  terminating any other active server processes
>
> I debugged it and found that the crash happens when the
> checkpointer tries to update the shared memory config; the
> call stack is below.
>
> > postgres.exe!LWLockAcquireCommon(LWLock * l=0x0000000000000000, LWLockMode mode=LW_EXCLUSIVE, unsigned __int64 * valptr=0x0000000000000020, unsigned __int64 val=18446744073709551615)  Line 579 + 0x14 bytes C
>   postgres.exe!LWLockAcquireWithVar(LWLock * l=0x0000000000000000, unsigned __int64 * valptr=0x0000000000000020, unsigned __int64 val=18446744073709551615)  Line 510 C
>   postgres.exe!WALInsertLockAcquireExclusive()  Line 1627 C
>   postgres.exe!UpdateFullPageWrites()  Line 9037 C
>   postgres.exe!UpdateSharedMemoryConfig()  Line 1364 C
>   postgres.exe!CheckpointerMain()  Line 359 C
>   postgres.exe!AuxiliaryProcessMain(int argc=2, char * * argv=0x00000000007d2180)  Line 427 C
>   postgres.exe!SubPostmasterMain(int argc=4, char * * argv=0x00000000007d2170)  Line 4635 C
>   postgres.exe!main(int argc=4, char * * argv=0x00000000007d2170)  Line 207 C
>
> Basically, the issue is that during startup, when the
> checkpointer tries to acquire the WAL insertion locks to
> update the value of fullPageWrites, it crashes because
> the locks are not yet initialized. They are initialized in
> InitXLOGAccess(), which gets called via RecoveryInProgress()
> before doing the actual checkpoint, in case recovery is in
> progress. However, we try to access them before that point,
> which leads to the crash.
>
> I think the reason it occurs only on Windows is that
> on Linux, fork() ensures that the WAL insertion locks are
> initialized with the same values as in the postmaster.
>
> To fix this issue, we need to ensure that the WAL insertion
> locks are initialized before we use them. One way is to
> call InitXLOGAccess() before calling CheckpointerMain(),
> as I have done in the attached patch; another would be to
> call RecoveryInProgress() much earlier in the code path
> than we do now.
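
The ordering problem described in the quoted report can be sketched
roughly as follows. This is a toy model, not the actual PostgreSQL code:
every demo_* name is hypothetical, and the -1 return stands in for the
point where the real code dereferences a NULL pointer and crashes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one WAL insertion lock. */
typedef struct DemoLock
{
    int held;
} DemoLock;

/* Pretend this array lives in shared memory. */
static DemoLock demo_shared_locks[8];

/* Process-local pointer to the locks; NULL until initialized. */
static DemoLock *demo_locks = NULL;

/* Plays the role of InitXLOGAccess(): sets the process-local pointer. */
static void
demo_init_access(void)
{
    if (demo_locks == NULL)
        demo_locks = demo_shared_locks;
    assert(demo_locks != NULL);
}

/*
 * Plays the role of UpdateFullPageWrites(): it needs the locks.
 * Returns -1 where the real code would dereference NULL.
 */
static int
demo_update_config(void)
{
    if (demo_locks == NULL)
        return -1;
    demo_locks[0].held = 1;     /* acquire */
    demo_locks[0].held = 0;     /* release */
    return 0;
}

/*
 * Plays the role of process startup in a freshly created process
 * (no fork, so no inherited pointer).  init_first selects whether the
 * initialization runs before the first use of the locks.
 */
static int
demo_process_start(int init_first)
{
    demo_locks = NULL;          /* fresh process image */
    if (init_first)
        demo_init_access();
    return demo_update_config();
}
```

Calling demo_process_start(0) models the reported failure, and
demo_process_start(1) models initializing before first use.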

So, this problem was introduced by Heikki's commit
68a2e52bbaf98f136a96b3a0d734ca52ca440a95, which replaced the XLogInsert
slots with regular LWLocks. I think the problem is that the
initialization code really doesn't belong in InitXLOGAccess at
all:

1. I think WALInsertLocks is just another global variable that needs
to be saved and restored in EXEC_BACKEND mode and that it therefore
ought to participate in the save_backend_variables() mechanism instead
of having its own special-purpose mechanism to save and restore the
value.

2. And I think that the LWLockRegisterTranche call belongs in
XLOGShmemInit(), so that it's parallel to the other call in
CreateLWLocks.

I think that would be more robust, because while your fix will
definitely work, we could easily reintroduce a similar
platform-specific bug for some other auxiliary process.  Using the
mechanisms described above will mean that this is set up properly for
everything that's attached to shared memory at all.
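
To illustrate point 1, here is a minimal sketch of the save-and-restore
idea. It is a toy model under stated assumptions, not the real
save_backend_variables() code: all Demo*/demo_* names are hypothetical,
and resetting the pointer to NULL stands in for a child that is started
as a new process image rather than forked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one WAL insertion lock. */
typedef struct DemoLock
{
    int held;
} DemoLock;

/* Process-local pointer into shared memory; NULL in a fresh process. */
static DemoLock *DemoInsertLocks = NULL;

/* Hypothetical analogue of the parameter block handed to each child. */
typedef struct DemoBackendParameters
{
    DemoLock *insert_locks;
} DemoBackendParameters;

/* Parent side: record the pointer so the child can pick it up. */
static void
demo_save_backend_variables(DemoBackendParameters *param)
{
    assert(param != NULL);
    param->insert_locks = DemoInsertLocks;
}

/* Child side: restore the pointer before any code can use it. */
static void
demo_restore_backend_variables(const DemoBackendParameters *param)
{
    assert(param != NULL);
    DemoInsertLocks = param->insert_locks;
}

/* Model a new process image: process-local globals start out unset. */
static void
demo_simulate_exec(void)
{
    DemoInsertLocks = NULL;
}

/* Returns -1 where real code would dereference NULL and crash. */
static int
demo_acquire_first_lock(void)
{
    if (DemoInsertLocks == NULL)
        return -1;
    DemoInsertLocks[0].held = 1;
    return 0;
}
```

Because the restore step runs once at child startup, every process that
goes through it sees a valid pointer, regardless of which code path
touches the locks first.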

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


