Discussion: self-deadlock at FATAL exit of bootstrap process on read error
I encountered a situation where the server can't shut down when a bootstrap
process calls ReadBuffer() but gets a read error. I guess the problem may be
like this: the bootstrap process fails to read at this line:
smgrread(reln->rd_smgr, blockNum, (char *) bufBlock);
So it does a FATAL exit and shmem_exit() is called:
while (--on_shmem_exit_index >= 0)
    (*on_shmem_exit_list[on_shmem_exit_index].function) (code,
        on_shmem_exit_list[on_shmem_exit_index].arg);
where
on_shmem_exit_list[0] = DummyProcKill
on_shmem_exit_list[1] = AtProcExit_Buffers
The callbacks above are called in stack order, so AtProcExit_Buffers() runs
first and calls AbortBufferIO(), which blocks on "io_in_progress_lock", a lock
the process itself still holds. (So the assumption in the comment does not
hold here: "since LWLockReleaseAll has already been called, we're not holding
the buffer's io_in_progress_lock".)
There may be other similar problems for the bootstrap process, so I am
not sure what the best fix is ...
Regards,
Qingqing
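The exit sequence described above can be sketched as a small toy model. This is not PostgreSQL code: only the names on_shmem_exit_list and on_shmem_exit_index come from the message, while MAX_ON_EXITS, every toy_* function, the boolean stand-in for io_in_progress_lock, and simulate_fatal_exit_during_read() are hypothetical names made up for illustration. Where the real process would block forever, the toy only records that it would have blocked, so the program terminates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_ON_EXITS 20			/* assumed capacity, for illustration only */

typedef void (*toy_exit_callback) (int code, void *arg);

struct toy_on_exit_entry
{
	toy_exit_callback function;
	void	   *arg;
};

static struct toy_on_exit_entry on_shmem_exit_list[MAX_ON_EXITS];
static int	on_shmem_exit_index = 0;

/* Toy stand-in for a non-reentrant lock such as io_in_progress_lock. */
static bool toy_io_lock_held = false;
static bool toy_self_deadlock = false;

/* Push a callback; callbacks registered later are run earlier at exit. */
static void
toy_register(toy_exit_callback function, void *arg)
{
	if (on_shmem_exit_index < MAX_ON_EXITS)
	{
		on_shmem_exit_list[on_shmem_exit_index].function = function;
		on_shmem_exit_list[on_shmem_exit_index].arg = arg;
		on_shmem_exit_index++;
	}
}

/* Hypothetical stand-in for DummyProcKill. */
static void
toy_dummy_proc_kill(int code, void *arg)
{
	(void) code;
	(void) arg;
	printf("toy_dummy_proc_kill called\n");
}

/* Hypothetical stand-in for AtProcExit_Buffers -> AbortBufferIO. */
static void
toy_at_proc_exit_buffers(int code, void *arg)
{
	(void) code;
	(void) arg;
	printf("toy_at_proc_exit_buffers called\n");
	if (toy_io_lock_held)
	{
		/* A real process would wait here for a lock it holds itself. */
		toy_self_deadlock = true;
	}
}

/* Same loop shape as the one quoted in the message. */
static void
toy_shmem_exit(int code)
{
	while (--on_shmem_exit_index >= 0)
		(*on_shmem_exit_list[on_shmem_exit_index].function) (code,
								on_shmem_exit_list[on_shmem_exit_index].arg);
	on_shmem_exit_index = 0;
}

/* Returns true if the simulated FATAL exit would have self-deadlocked. */
bool
simulate_fatal_exit_during_read(void)
{
	on_shmem_exit_index = 0;
	toy_self_deadlock = false;

	toy_register(toy_dummy_proc_kill, NULL);		/* slot 0 */
	toy_register(toy_at_proc_exit_buffers, NULL);	/* slot 1 */

	toy_io_lock_held = true;	/* the read is in progress, lock is held */
	toy_shmem_exit(1);			/* the read fails, FATAL exit begins */

	return toy_self_deadlock;
}
```

Calling simulate_fatal_exit_during_read() from a main() walks the list from slot 1 down to slot 0, so the buffer cleanup callback runs while the toy lock is still marked as held, which is the situation the message describes.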
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> I encountered a situation where the server can't shut down when a bootstrap
> process calls ReadBuffer() but gets a read error.
Hm, AtProcExit_Buffers is assuming that we've done AbortTransaction,
but the WAL-replay process doesn't do that because it's not running a
transaction. Seems like we need to stack another on-proc-exit function
to do the appropriate subset of AbortTransaction ... LWLockReleaseAll at
least, not sure what else.
Do you have a test case to reproduce this problem?
regards, tom lane
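The idea of stacking another exit function can be sketched with a toy model as well. This is only an illustration of the ordering argument, not the patch that was eventually applied: all names here (toy_on_exit, toy_run_exit, toy_release_all_cb, toy_buffer_cleanup_cb, TOY_IO_LOCK, exit_blocks) are hypothetical, and the held-lock array is a simplified stand-in for whatever bookkeeping LWLockReleaseAll relies on. Because exit callbacks run in reverse registration order, a release-all callback registered after the buffer cleanup callback runs before it.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLBACKS	8		/* assumed sizes, for illustration only */
#define MAX_HELD_LOCKS	4
#define TOY_IO_LOCK		42		/* hypothetical lock identifier */

typedef void (*toy_exit_fn) (int code, void *arg);

static toy_exit_fn toy_callbacks[MAX_CALLBACKS];
static int	toy_num_callbacks = 0;

/* Toy record of the locks this process currently holds. */
static int	toy_held_locks[MAX_HELD_LOCKS];
static int	toy_num_held_locks = 0;

static bool toy_cleanup_would_block = false;

static void
toy_acquire(int lock_id)
{
	if (toy_num_held_locks < MAX_HELD_LOCKS)
		toy_held_locks[toy_num_held_locks++] = lock_id;
}

static bool
toy_holds(int lock_id)
{
	for (int i = 0; i < toy_num_held_locks; i++)
	{
		if (toy_held_locks[i] == lock_id)
			return true;
	}
	return false;
}

/* Plays the role of "LWLockReleaseAll at least": drop every held lock. */
static void
toy_release_all_cb(int code, void *arg)
{
	(void) code;
	(void) arg;
	toy_num_held_locks = 0;
}

/* Plays the role of the buffer cleanup that needs the I/O lock itself. */
static void
toy_buffer_cleanup_cb(int code, void *arg)
{
	(void) code;
	(void) arg;
	if (toy_holds(TOY_IO_LOCK))
		toy_cleanup_would_block = true;	/* real code would hang here */
}

static void
toy_on_exit(toy_exit_fn fn)
{
	if (toy_num_callbacks < MAX_CALLBACKS)
		toy_callbacks[toy_num_callbacks++] = fn;
}

/* Run callbacks in reverse registration order (last registered first). */
static void
toy_run_exit(int code)
{
	while (--toy_num_callbacks >= 0)
		toy_callbacks[toy_num_callbacks] (code, NULL);
	toy_num_callbacks = 0;
}

/* Returns true if the buffer cleanup would block during the toy exit. */
bool
exit_blocks(bool stack_release_callback)
{
	toy_num_callbacks = 0;
	toy_num_held_locks = 0;
	toy_cleanup_would_block = false;

	toy_on_exit(toy_buffer_cleanup_cb);
	if (stack_release_callback)
		toy_on_exit(toy_release_all_cb);	/* registered last, runs first */

	toy_acquire(TOY_IO_LOCK);	/* a read is in progress when the error hits */
	toy_run_exit(1);

	return toy_cleanup_would_block;
}
```

In this model exit_blocks(false) reports that the cleanup would block, while exit_blocks(true) does not, because the extra callback has already emptied the held-lock list by the time the buffer cleanup runs. What else such a callback would need to do in the real server is left open in the message above.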
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
>
> Do you have a test case to reproduce this problem?
>
According to the error message, the problem happens while reading
pg_database. I just plugged these lines into mdread():
+    /* pretend there is an error reading pg_database */
+    if (reln->smgr_rnode.relNode == 1262)
+    {
+        fprintf(stderr, "Ooops \n");
+        return false;
+    }
     v = _mdfd_getseg(reln, blocknum, false);
And with that, the problem reproduces.
Regards,
Qingqing
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote
>> Do you have a test case to reproduce this problem?
> According to the error message, the problem happens while reading
> pg_database. I just plugged these lines into mdread():
OK, patch applied for this.
regards, tom lane