On Mon, Jul 22, 2024 at 07:01:41AM +0000, Bertrand Drouvot wrote:
> 1 ===
> Not related with your patch but this comment in the GetRedoRecPtr() function:
>
> * grabbed a WAL insertion lock to read the authoritative value in
> * Insert->RedoRecPtr
>
> sounds weird. Shouldn't that be s/Insert/XLogCtl/?
No, the comment is right. We are retrieving a copy of
Insert->RedoRecPtr here.
> 2 ===
>
> + /* Write the redo LSN, used to cross check the file loaded */
>
> Nit: s/loaded/read/?
WFM.
> 3 ===
>
> + /*
> + * Read the redo LSN stored in the file.
> + */
> + if (!read_chunk_s(fpin, &file_redo) ||
> + file_redo != redo)
> + goto error;
>
> I wonder if it would make sense to have dedicated error messages for
> "file_redo != redo" and for "format_id != PGSTAT_FILE_FORMAT_ID". That would
> ease to diagnose as to why the stat file is discarded.
Yep. This has been itching me quite a bit, and it goes beyond just
the format ID or the redo LSN: it applies to all the read_chunk()
callers. I've taken a shot at this with patch 0001, implemented on
top of the rest. I've also adjusted the redo LSN read to have more
error context; that is now in 0002.
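
To illustrate the idea of dedicated failure reasons, here is a minimal
standalone sketch. It is not the code from 0001 or 0002: every demo_*
name, the DemoLsn type and the DEMO_FORMAT_ID value are hypothetical
stand-ins, and plain stdio replaces the backend's file and error
reporting facilities. It only shows each check filling in its own
message rather than sharing one generic error path.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration, not PostgreSQL definitions. */
#define DEMO_FORMAT_ID 0xDEC0DE01u
typedef uint64_t DemoLsn;

typedef enum
{
	DEMO_READ_OK,
	DEMO_READ_SHORT,			/* file ended before the field was read */
	DEMO_READ_BAD_FORMAT,		/* format ID does not match */
	DEMO_READ_BAD_REDO			/* redo LSN does not match the expected one */
} DemoReadStatus;

/* Read exactly len bytes; returns 1 on success, 0 on a short read. */
int
demo_read_chunk(FILE *fp, void *ptr, size_t len)
{
	return fread(ptr, 1, len, fp) == len;
}

/* Write the two header fields; returns 1 on success, 0 on failure. */
int
demo_write_header(FILE *fp, uint32_t format_id, DemoLsn redo)
{
	return fwrite(&format_id, sizeof(format_id), 1, fp) == 1 &&
		fwrite(&redo, sizeof(redo), 1, fp) == 1;
}

/*
 * Read and validate the header.  On failure, "why" receives a message
 * specific to the check that failed, so the caller can report exactly
 * why the file is being discarded.
 */
DemoReadStatus
demo_read_header(FILE *fp, DemoLsn expected_redo, char *why, size_t whylen)
{
	uint32_t	format_id;
	DemoLsn		file_redo;

	if (!demo_read_chunk(fp, &format_id, sizeof(format_id)))
	{
		snprintf(why, whylen, "could not read format ID");
		return DEMO_READ_SHORT;
	}
	if (format_id != DEMO_FORMAT_ID)
	{
		snprintf(why, whylen, "found format ID %X, expected %X",
				 (unsigned int) format_id, (unsigned int) DEMO_FORMAT_ID);
		return DEMO_READ_BAD_FORMAT;
	}
	if (!demo_read_chunk(fp, &file_redo, sizeof(file_redo)))
	{
		snprintf(why, whylen, "could not read redo LSN");
		return DEMO_READ_SHORT;
	}
	if (file_redo != expected_redo)
	{
		/* Print LSNs as high/low 32-bit halves. */
		snprintf(why, whylen, "found redo LSN %X/%X, expected %X/%X",
				 (unsigned int) (file_redo >> 32),
				 (unsigned int) file_redo,
				 (unsigned int) (expected_redo >> 32),
				 (unsigned int) expected_redo);
		return DEMO_READ_BAD_REDO;
	}

	why[0] = '\0';
	return DEMO_READ_OK;
}
```

A caller would pass a buffer for the reason and log it whenever the
returned status is not DEMO_READ_OK, so a format ID mismatch and a
redo LSN mismatch show up as different messages in the logs.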
> Looking at 0003:
>
> 4 ===
>
> @@ -5638,10 +5634,7 @@ StartupXLOG(void)
> * TODO: With a bit of extra work we could just start with a pgstat file
> * associated with the checkpoint redo location we're starting from.
> */
> - if (didCrash)
> - pgstat_discard_stats();
> - else
> - pgstat_restore_stats(checkPoint.redo);
> + pgstat_restore_stats(checkPoint.redo);
>
> remove the TODO comment?
Pretty sure I've removed that more than once already, and that
this is a rebase accident. Thanks for noticing.
> 5 ===
>
> + * process) if the stats file has a redo LSN that matches with the .
>
> unfinished sentence?
Right, the sentence is missing its reference to the control file.
> 6 ===
>
> - * Should only be called by the startup process or in single user mode.
> + * This is called by the checkpointer or in single-user mode.
> */
> void
> -pgstat_discard_stats(void)
> +pgstat_flush_stats(XLogRecPtr redo)
> {
>
> Would that make sense to add an Assert in pgstat_flush_stats()? (checking what
> the above comment states).
There is already one in pgstat_write_statsfile(); I'm not sure there
is a point in duplicating the assertion in both functions.
Attaching a new v4 series, with all these comments addressed.
--
Michael