On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Mar 30, 2019 at 06:15:11PM +0100, Julien Rouhaud wrote:
> > I'd also have to get more feedback on this. For now, I'll add this
> > thread to the pg12 open items, as a follow up of the initial code
> > drop.
>
> Catching up here... I think that having a completely separate view
> with one row for each database and one row for shared objects makes
> the most sense based on what has been proposed on this thread. Being
> able to track checksum failures for shared catalogs is really
> something I'd like to be able to see easily, and I have seen
> corruption involving such objects from time to time. I think that we
> should have a design which is extensible.
Ok!
> One thing which is not
> proposed on this patch, and I am fine with it as a first draft, is
> that we don't have any information about the broken block number and
> the file involved. My gut tells me that we'd want a separate view,
> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
> blck) to be complete. But that's just for future work.
That could indeed be nice.
> For the progress part, we would most likely have a separate view for
> that as well, as the view should show no rows if there is no operation
> in progress.
Ok.
> The patch looks rather clean to me, I have some comments.
>
> - <application>pg_checksums</application>. The exit status is zero if there
> - are no checksum errors when checking them, and nonzero if at least one
> - checksum failure is detected. If enabling or disabling checksums, the
> - exit status is nonzero if the operation failed.
> + <application>pg_checksums</application>. As a consequence, the
> + <structname>pg_stat_checksums</structnameview won't reflect this activity.
> + The exit status is zero if there are no checksum errors when checking them,
> + and nonzero if at least one checksum failure is detected. If enabling or
> + disabling checksums, the exit status is nonzero if the operation failed.
>
> The docs of pg_checksums already clearly state that the cluster needs
> to be offline, so I am not sure that this addition is necessary.
Agreed, removed.
> @@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid,
> int failurecount)
>
> Please note that there is no need to have the list of arguments in the
> comment block at the top of pgstat_report_checksum_failures_in_db().
Indeed, fixed.
> + if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
> + result = 0;
> + else
> + result = dbentry->last_checksum_failure;
> +
> + if (result == 0)
> + PG_RETURN_NULL();
> + else
> + PG_RETURN_TIMESTAMPTZ(result);
> +}
>
> No need for two ifs here. What about just that?
> if (NULL)
> PG_RETURN_NULL();
> else
> PG_RETURN_TIMESTAMPTZ(last_checksum_failure);
I do agree, but this is done like this everywhere in pgstatfuncs.c, so
I think it's better to keep it as-is for consistency.