On Tue, Jun 15, 2010 at 2:16 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> On 15/06/10 07:47, Fujii Masao wrote:
>>
>> On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>>>
>>> Fujii Masao<masao.fujii@gmail.com> writes:
>>>>
>>>> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
>>>> xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync.
>>>
>>> Wrong. LogwrtResult.Write tracks how far we've written out data,
>>> but it is only (known to be) fsync'd as far as LogwrtResult.Flush.
>>
>> Hmm.. I agree that xlogctl->LogwrtResult.Write indicates the byte position
>> we've written. But in the current XLogWrite() code, it's updated after
>> XLogWrite() calls issue_xlog_fsync(). No?
>
> issue_xlog_fsync() is only called if the caller requested a flush by
> advancing WriteRqst.Flush.

True. The scenario that I'm concerned about is:

1. A transaction commit causes XLogFlush() to write *and* fsync WAL up to
   the commit record.
2. XLogFlush() calls XLogWrite(), and xlogctl->LogwrtResult.Write is
   updated to an LSN greater than or equal to that of the commit record
   only after XLogWrite() has called issue_xlog_fsync().
3. Only then can walsender send WAL up to the commit record.

So in synchronous replication, a transaction commit would have to wait for
the local fsync and then for replication, one after the other. IOW,
walsender cannot send the commit record until it has been fsync'd in
XLogWrite().
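
To make the ordering concrete, here is a toy sketch in C. It is not the
real xlog.c code; every toy_* name is made up for illustration, and it
assumes the shared Write position is published only after the fsync, as in
the scenario above. The fake fsync records how far walsender would be
allowed to send at that moment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef uint64_t ToyLsn;

typedef struct
{
    ToyLsn Write;   /* how far WAL has been written */
    ToyLsn Flush;   /* how far WAL is known to be fsync'd */
} ToyLogwrt;

/* Models the shared copy, i.e. xlogctl->LogwrtResult. */
static ToyLogwrt toy_shared = {0, 0};

/* What walsender was allowed to send while the fsync was in progress. */
static ToyLsn toy_limit_seen_during_fsync = 0;

/* Walsender sends WAL up to the shared Write position. */
static ToyLsn toy_walsender_send_limit(void)
{
    return toy_shared.Write;
}

/* Fake fsync: only records the send limit visible at this moment. */
static void toy_issue_xlog_fsync(void)
{
    toy_limit_seen_during_fsync = toy_walsender_send_limit();
}

/* Write, fsync if a flush was requested, and publish the result last. */
static void toy_xlog_write(ToyLogwrt rqst)
{
    ToyLogwrt local = toy_shared;

    assert(rqst.Flush <= rqst.Write);

    if (local.Write < rqst.Write)
        local.Write = rqst.Write;       /* the write() part */

    if (local.Flush < rqst.Flush)
    {
        toy_issue_xlog_fsync();         /* shared Write is still the old value */
        local.Flush = local.Write;
    }

    toy_shared = local;                 /* only now can walsender see it */
}
```

Calling toy_xlog_write() with Write and Flush both set to the commit
record's LSN models a commit going through XLogFlush(). In this model the
send limit seen during the fsync is still the old position, which is the
serialization I'm worried about.
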

Can this scenario not happen? Am I missing something?

Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center