Re: BUG #17212: pg_amcheck fails on checking temporary relations
От | Peter Geoghegan |
---|---|
Тема | Re: BUG #17212: pg_amcheck fails on checking temporary relations |
Дата | |
Msg-id | CAH2-Wzk=vLs8qapiK155uwmzWn6dcK2=3XRE=988V02mmkxJVg@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: BUG #17212: pg_amcheck fails on checking temporary relations (Mark Dilger <mark.dilger@enterprisedb.com>) |
Ответы |
Re: BUG #17212: pg_amcheck fails on checking temporary relations
(Mark Dilger <mark.dilger@enterprisedb.com>)
|
Список | pgsql-hackers |
On Mon, Oct 4, 2021 at 5:32 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > Sorry, I realized after hitting <send> that you might take it that way, but I mean the logs generally, not just postgreslogs. If somebody runs "reindex concurrently" on all tables at midnight every night, and they see one morning inthe (non-postgres) logs from the midnight hour weird error messages about their RAID controller, they may well want tocheck all their indexes to see if any of them were corrupted. I don't see what the point of this example is. Why is the REINDEX CONCURRENTLY index special here? Presumably the user is using pg_amcheck with its -i option in this scenario, since you've scoped it that way. Where did they get that index name from? Presumably it's just the original familiar index name? How did an error message that's not from the Postgres logs (or something similar) contain any index name? If the overnight rebuild completed successfully then we'll verify the newly rebuilt smgr relfilenode for the index. It if failed then we'll just verify the original. In other words, if we treat the validity of indexes as a "visibility concern", everything works out just fine. If there is an orphaned index (because of the implementation issue with CONCURRENTLY) then it is *definitely* "corrupt" -- but not in any sense that pg_amcheck ought to concern itself with. Such an orphaned index can never actually be used by anybody. (We should fix this wart in the CONCURRENTLY implementation some day.) > To get back on track, let me say that I'm not taking the position that what pg_amcheck currently does is necessarily correct,but just that I'd like to be careful about what we change, if anything. There are three things that seem to irritatepeople: > > 1) A non-zero exit code means "not everything was checked and passed" rather than "at least one thing is definitely corrupt". Right. > 2) Skipping of indexes is reported to the user with the word 'ERROR' in the report rather than, say, 'NOTICE'. Right -- but it's also the specifics of the error. These are errors that only make sense when there was specific human error. Which is clearly not the case at all, except perhaps in the narrow -i case. > 3) Deadlocks can occur Right. > I have resisted changing #1 on the theory that `pg_amcheck --all && ./post_all_checks_pass.sh` should only run the post_all_checks_pass.shif indeed all checks have passed, and I'm interpreting skipping an index check as being contrary tothat. You're also interpreting it as "skipping". This is a subjective interpretation. Which is fair enough - I can see why you'd put it that way. But that's not how I see it. Again, consider that pg_dump cares about the "indisready" status of indexes, for a variety of reasons. Now, the pg_dump example doesn't necessarily mean that pg_amcheck *must* do the same thing (though it certainly suggests as much). To me the important point is that we are perfectly entitled to define "the indexes that pg_amcheck can see" in whatever way seems to make most sense overall, based on practical considerations. > But the user running pg_amcheck might also *know* that they aren't running any such DDL, and therefore expect --all toactually result in everything being checked. The user would also have to know precisely how the system catalogs work during DDL. They'd have to know that the pg_class entry might become visible very early on, rather than at the end, in some cases. They'd know all that, but still be surprised by the current pg_amcheck behavior. Which is itself not consistent with pg_dump. > I find it strange that I should do anything about #2 in pg_amcheck, since it's the function in verify_nbtree that phrasesthe situation as an error. I don't find it strange. It does that because it *is* an error. There is simply no alternative. The solution for amcheck is the same as it has always been: just write the SQL query in a way that avoids it entirely. > I'm genuinely not trying to give you grief here -- I simply don't like that pg_amcheck is adding commentary atop what thechecking functions are doing. pg_amcheck would not be adding commentary if this was addressed in the way that I have in mind. It would merely be dealing with the issue in the way that the amcheck docs have recommended, for years. The problem here (as I see it) is that pg_amcheck is already adding commentary, by not just doing that. > I also find it strange that #3 is being attributed to pg_amcheck's choice of how to call the checking function, becauseI can't think of any other function where we require the SQL caller to do anything like what you are requiring herein order to prevent deadlocks, and also because the docs for the functions don't say that a deadlock is possible, merelythat the function may return with an error. I will need to study the deadlock issue separately. -- Peter Geoghegan
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Fujii MasaoДата:
Сообщение: Re: (LOCK TABLE options) “ONLY” and “NOWAIT” are not yet implemented