Re: pg_stat_database.checksum_failures vs shared relations
| От | Andres Freund | 
|---|---|
| Тема | Re: pg_stat_database.checksum_failures vs shared relations | 
| Дата | |
| Msg-id | 3j6jxcigfavufwkfxbzyauj4fibfwhl37uqh3ttjko26meqxft@p7jelszxn6nz обсуждение исходный текст | 
| Ответ на | Re: pg_stat_database.checksum_failures vs shared relations (Michael Paquier <michael@paquier.xyz>) | 
| Ответы | Re: pg_stat_database.checksum_failures vs shared relations | 
| Список | pgsql-hackers | 
Hi, On 2025-03-28 09:44:58 +0900, Michael Paquier wrote: > On Thu, Mar 27, 2025 at 12:06:45PM -0400, Robert Haas wrote: > > On Thu, Mar 27, 2025 at 11:58 AM Andres Freund <andres@anarazel.de> wrote: > > > So, today we have the weird situation that *some* checksum errors on shared > > > relations get attributed to the current database (if they happen in a backend > > > normally accessing a shared relation), whereas others get reported to the > > > "shared relations" "database" (if they happen during a base backup). That > > > seems ... not optimal. > > > > > > One question is whether we consider this a bug that should be backpatched. > > > > I think it would be defensible if pg_basebackup reported all errors > > with OID 0 and backend connections reported all errors with OID > > MyDatabaseId, but it seems hard to justify having pg_basebackup take > > care to report things using the correct database OID and individual > > backend connections not take care to do the same thing. So I think > > this is a bug. If fixing it in the back-branches is too annoying, I > > think it would be reasonable to fix it only in master, but > > back-patching seems OK too. > > Being able to get a better reporting for shared relations in back > branches would be nice, but that's going to require some invasive > chirurgy, isn't it? Yea, that's what I was worried about too. I think we basically would need a PageIsVerifiedExtended2() that backs the current PageIsVerifiedExtended(), with optional arguments that the "fixed" callers would use. > We don't know currently the OID of the relation whose block is > corrupted with only PageIsVerifiedExtended(). I don't think the relation oid is really the relevant bit, it's the database oid (or alternatively tablespace). But PageIsVerifiedExtended() doesn't know that either, obviously. > There are two callers of PIV_REPORT_STAT on HEAD: > - The checksum reports from RelationCopyStorage() know the > SMgrRelation. > - ReadBuffersOperation() has an optional Relation and a > SMgrRelationData. An SMgrRelationData suffices, via ->smgr_rlocator.locator.dbOid. FWIW, it turns out that there are more cases than just MyDatabaseId and InvalidOid - ScanSourceDatabasePgClass() and RelationCopyStorageUsingBuffer() read buffers in a different database than MyDatabaseId. > We could just refactor PageIsVerifiedExtended() so as it reports a > state about why the verification failed and let the callers report the > checksum failure with a relation OID, splitting the data for shared > and non-shared relations? Yea, I think we basically need a *checksum_failed out argument, and then the callers need to do if (checksum_failure) pgstat_report_checksum_failures_in_db(src->smgr_rlocator.locator.dbOid, 1); Or alternatively we can optionally pass in the rlocator to PageIsVerifiedExtended2(), so it can do the above internally. Btw, it seems somewhat odd that we accumulate stats for checksum failures but not for invalid page headers - the latter seems even worse... Greetings, Andres Freund
В списке pgsql-hackers по дате отправления: