On Sat, Sep 15, 2018 at 12:29 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> On 2018-Sep-15, Masahiko Sawada wrote:
>
>> On Fri, Sep 14, 2018 at 4:27 PM, 李海龙 <hailong.li@qunar.com> wrote:
>
>> > When I enable the parameter track_commit_timestamp in postgresql.conf of a
>> > Base Backup (making a Base Backup from a standby and the
>> > track_commit_timestamp is off on it),
>>
>> In addition to the above operation, I've reproduced this issue by
>> replaying a commit WAL record that sets the timestamp to a new page
>> during the crash recovery (or from restart).
>>
>> It seems to me that the cause of this is that we could not extend
>> commitTs page since the COMMIT_TS_ZEROPAGE WAL wasn't generated at the
>> standby server whose track_commit_timestamp is off. So during
>> replaying the commit WAL record the startup process fails since the
>> corresponding commitTs page doesn't exist.
>
> Hmm, wow. I wonder if it's possible to detect the config difference
> early enough that the zeropage WAL records are emitted, instead. But
> even this might not work, since some transactions need to have their
> commitTS in pages that will not have been zeroed anyway, because the
> page threshold was crossed in the old primary.
>
>> To fix that maybe we can disable commitTs if
>> controlFile->track_commit_timestamp == false and the
>> track_commit_timestamp == true even in crash recovery.
>
> Hmm, so keep it off while crash recovery runs, and once it's out of that
> then enable it automatically?
Yes. The attached patch changes it to check
controlFile->track_commit_timestamp even the crash recover case. If
track_commit_timestamp is set to true in the config file, it's enabled
at end of the recovery.
> That might work -- by definition we don't
> care about the commit TSs of the transaction replayed during crash
> recovery, since they were executed in the primary that didn't have
> commitTS enable anyway.
>
> It seems like the first thing we need is TAP cases that reproduce these
> two crash scenarios.
I attached TAP test that reproduces this issue. We can reproduce it
even with single server; making postgres replay a commit WAL in the
crash recovery after consumed transactions and enabled
track_commit_timestamp.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center