Hi
> This doesn't test the consequences of the restart being skipped, nor
> does it review on a code level the correctness.
I check more than one thing during review. I look at the code too: the bgworker in checksumhelper.c is registered with:
> bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
It then processes the whole cluster (even if checksumhelper ran before but exited before it completed). Or does
BgWorkerStart_RecoveryFinished not guarantee that the worker starts only after recovery has finished?
Before starting any real work (and after recovery ends), checksumhelper checks the current cluster status again:
> + * If a standby was restarted when in pending state, a background worker
> + * was registered to start. If it's later promoted after the master has
> + * completed enabling checksums, we need to terminate immediately and not
> + * do anything. If the cluster is still in pending state when promoted,
> + * the background worker should start to complete the job.
> What if your replicas are delayed (e.g. recovery_min_apply_delay)?
> What if you later need to do PITR?
If the standby starts after replaying pg_enable_data_checksums but before the work has completed, we plan to start the bgworker when recovery finishes.
If we have replayed the record that checksumhelper finished, we _can_ still start checksumhelper again, and this case is handled during checksumhelper
startup.
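To spell out the logic as I read it, here is a minimal standalone sketch. It is not the patch's code: the enum, its values, and the function name are hypothetical and only model the decision the worker makes once recovery has finished, whatever the replay delay or PITR target was.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cluster-wide checksum state. */
typedef enum
{
    CHECKSUMS_OFF,      /* checksums were never requested */
    CHECKSUMS_PENDING,  /* pg_enable_data_checksums replayed, work unfinished */
    CHECKSUMS_ON        /* completion has been replayed */
} ChecksumState;

/*
 * Hypothetical helper: given the state seen when recovery ends, decide
 * whether the worker has to process the whole cluster or exit at once.
 */
bool
worker_should_process(ChecksumState state_at_recovery_end)
{
    switch (state_at_recovery_end)
    {
        case CHECKSUMS_PENDING:
            /* Promoted while still pending: finish the job. */
            return true;
        case CHECKSUMS_ON:
        case CHECKSUMS_OFF:
            /* Nothing left to do: terminate immediately. */
            return false;
    }
    assert(false && "unknown checksum state");
    return false;
}
```

In this model the result depends only on the state observed at the end of recovery, which is why a registered but unneeded worker looks harmless to me.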
The behavior seems correct to me. Am I missing something badly wrong?
regards, Sergei