ITAGAKI Takahiro wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> wrote:
>
>> Now that the CheckpointStartLock starvation has been taken care of, I'm
>> seeing another problem with checkpoints in my test run: mdsync never
>> finishes.
>>
>> My proposed fix is to make a copy of pendingOpsTable before entering the
>> loop. AbsorbFsyncRequest will put new requests into a fresh
>> pendingOpsTable, while the mdsync loop drains the copy. I'll write a
>> patch along those lines if there are no better ideas.
>
> Yeah, I'm also worried about it getting stuck. I wrote a fix that uses a
> copy of pendingOpsTable, as you said, when I implemented the load
> distributed checkpoint patch
> (http://momjian.us/mhonarc/patches/msg00025.html). I would be very happy
> if you could review my patch and check whether my fix is correct.
>
> There was another reason to fix it in my patch. I wanted to fsync each
> file only once, because bgwriter sleeps for each file in my patch.

Ah, I see. I looked at the patch briefly a few days ago, and wondered
why there were so many changes to mdsync. I didn't realize there was a
fix to the "getting stuck" problem in there as well.

I'll take a closer look, and try to write a patch to just fix the
"getting stuck" problem, but in a way that anticipates the load
distributed checkpoint patch so that it doesn't need to be rewritten again.
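
To make the copy-then-drain idea concrete, here is a toy model in Python. It is only a sketch and not PostgreSQL code: the class PendingOps and the methods absorb and sync_all are hypothetical names invented for illustration, a plain set stands in for pendingOpsTable, and appending to a list stands in for the actual fsync. The point it tries to show is that requests arriving during the loop land in a fresh table, so the loop only has a fixed snapshot to drain, and each file in that snapshot is handled once.

```python
class PendingOps:
    """Toy stand-in for the pending fsync request table (hypothetical)."""

    def __init__(self):
        self.pending = set()  # files with an outstanding fsync request
        self.synced = []      # order in which files were "fsynced"

    def absorb(self, filename):
        # Rough analogue of absorbing a request: always goes to the live table.
        self.pending.add(filename)

    def sync_all(self, on_each_file=None):
        # Swap in a fresh table first, then drain only the snapshot.
        snapshot, self.pending = self.pending, set()
        for filename in sorted(snapshot):
            if on_each_file is not None:
                # Hook to simulate requests that arrive in mid-loop.
                on_each_file(filename)
            # A real implementation would fsync the file here.
            self.synced.append(filename)
        return len(snapshot)


ops = PendingOps()
for name in ("a", "b", "a"):
    ops.absorb(name)

# Simulate a backend that dirties file "a" again on every iteration.
first_pass = ops.sync_all(on_each_file=lambda _name: ops.absorb("a"))
second_pass = ops.sync_all()
```

In this model the first pass terminates after the two files in its snapshot even though "a" is requested again on every iteration; the repeated request is left for the next pass instead of extending the current one.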
--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com