Re: Offline enabling/disabling of data checksums
От | Michael Paquier |
---|---|
Тема | Re: Offline enabling/disabling of data checksums |
Дата | |
Msg-id | 20190311045323.GE1818@paquier.xyz обсуждение исходный текст |
Ответ на | Re: Offline enabling/disabling of data checksums (Fabien COELHO <coelho@cri.ensmp.fr>) |
Ответы |
Re: Offline enabling/disabling of data checksums
Re: Offline enabling/disabling of data checksums Re: Offline enabling/disabling of data checksums |
Список | pgsql-hackers |
On Wed, Feb 27, 2019 at 07:59:31AM +0100, Fabien COELHO wrote: > Hallo Michael, Okay, let's move on with these patches! > The renaming implies quite a few changes (eg in the documentation, > makefiles, tests) which warrants a review, so it should be a patch. Also, > ISTM that the renaming only make sense when adding the enable/disable > feature, so I'd say that it belongs to this patch. Opinions? I have worked on the last v4 sent by Michael B, finishing with the attached after review and addressed the last points raised by Fabien. The thing is that I have been rather unhappy with a couple of things in what was proposed, so I have finished by modifying quite a couple of areas. The patch set is now splitted as I think is suited for commit, with the refactoring and renaming being separated from the actual feature: - 0001 if a patch to refactor the routine for the control file update. I have made it backend-aware, and we ought to be careful with error handling, use of fds and such, something that v4 was not very careful about. - 0002 renames pg_verify_checksums to pg_checksums with a straight-forward switch. Docs as well as all references to pg_verify_checksums are updated. - 0003 adds the new options --check, --enable and --disable, with --check being the default as discussed. - 0004 adds a -N/--no-sync which I think is nice for consistency with other tools. That's also useful for the tests, and was not discussed until now on this thread. > Patch applies cleanly, compiles, global & local "make check" are ok. > > Doc: "enable, disable or verifies" -> "checks, enables or disables"? > Spurious space: "> ." -> ">.". > > As checksum are now checked, the doc could use "check" instead of "verify", > especially if there is a rename and the "verify" word disappears. That makes sense. I have fixed these, and simplified the docs a bit to have a more simple page. > I'd be less terse when documenting actions, eg: "Disable checksums" -> > "Disable checksums on cluster." The former is fine in my opinion. > Doc should state that checking is the default action, eg "Check checksums on > cluster. This is the default action." Check. > Help string could say that -c is the default action. There is a spurious > help line remaining from the previous "--action" implementation. Yeah, we should. Added. > open: I'm positively unsure about ?: priority over |, and probably not the > only one, so I'd add parentheses around the former. Yes, I agree that the current code is hard to decrypt. So reworked with a separate variable in scan_file, and added extra parenthesis in the part which updates the control file. > I'm at odds with the "postmaster.pid" check, which would not prevent an > issue if a cluster is started with "postmaster". I still think that the > enabling-in-progress should be stored in the cluster state. > > ISTM that the cluster read/update cycle should lock somehow the control file > being modified. However other commands do not seem to do something about it. I am still not on board for adding more complexity in this area, at least not for this stuff and for the purpose of this thread, because this can happen at various degrees for various configurations for ages and not only for checksums. Also, I think that we still have to see users complain about that. Here are some scenarios where this can happen: - A base backup partially written. pg_basebackup limits this risk but it could still be possible to see a case where partially-written data folder. And base backups are around for many years. - pg_rewind, and the tool is in the tree since 9.5, the tool is actually available on github since 9.3. > I do not think that enabling if already enabled or disabling or already > disable should exit(1), I think it is a no-op and should simply exit(0). We already issue exit(1) when attempting to verify checksums on a cluster where they are disabled. So I agree with Michael B's point of Issuing an error in such cases. I think also that this makes handling for operators easier. > About tests: I'd run a check on a disabled cluster to check that the command > fails because disabled. Makes sense. Added. We need a test also for the case of successive runs with --enable. Here are also some notes from my side. - There was no need to complicate the synopsis of the docs. - usage() included still references to --action and indentation was a bit too long at the top. - There were no tests for disabling checksums, so I have added some. - We should check that the combination of --enable and -r fails. - Tests use only long options, that's better for readability. - Improved comments in tests. - Better to check for "data_checksum_version > 0" if --enable is used. That's more portable long-term if more checksum versions are added. - The check on postmaster.pid is actually not necessary as we already know that the cluster has been shutdown cleanly per the state of the control file. I agree that there could be a small race condition here, and we could discuss that in a different thread if need be as such things could be improved for other frontend tools as well. For now I am taking the most simple approach. (Still need to indent the patches before commit, but that's a nit.) -- Michael
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Amit LangoteДата:
Сообщение: Re: Should we add GUCs to allow partition pruning to be disabled?
Следующее
От: Etsuro FujitaДата:
Сообщение: Re: Oddity with parallel safety test for scan/join target in grouping_planner