Re: Changing the state of data checksums in a running cluster
From: Tomas Vondra
Subject: Re: Changing the state of data checksums in a running cluster
Date:
Msg-id: e812165e-a4a5-4f17-bf43-9c6173de153c@vondra.me
In reply to: Re: Changing the state of data checksums in a running cluster (Tomas Vondra <tomas@vondra.me>)
Responses: Re: Changing the state of data checksums in a running cluster
List: pgsql-hackers
Here's a rebased version of the patch series, addressing the issues I pointed out in the last round of reviews. I've kept the changes in separate patches for clarity, but they should be squashed into a single patch in the end.

1) v20250309-0001-Online-enabling-and-disabling-of-data-chec.patch
------------------------------------------------------------------

Original patch, rebased, with merge conflicts resolved.

2) v20250309-0002-simple-post-rebase-fixes.patch
------------------------------------------------

A couple of minor fixes, addressing test failures due to stuff committed since the previous patch version. Mostly mechanical; the main change is that I don't think the pgstat_bestart() call is necessary. Or is it?

3) v20250309-0003-sync-the-data_checksums-GUC-with-the-local.patch
------------------------------------------------------------------

This is the main change, fixing failures in 002_actions.pl. The short version is that the test does "-C data_checksums", but AFAICS that does not quite work, because it does not call show_data_checksums() that early and instead just grabs the variable backing the GUC. That variable may be out of sync, so this patch fixes the issue by updating both (a rough sketch of what I mean is below, after 4).

That fixes the issue, but isn't it a bit strange that we now have three places tracking the state of data checksums? We have data_checksum_version in the control file, and then data_checksums and LocalDataChecksumVersion in the backends. Would it be possible to "unify" the latter two? That would also mean we don't have the duplicate constants like PG_DATA_CHECKSUM_VERSION and DATA_CHECKSUM_VERSION. Or why do we need that?

4) v20250309-0004-make-progress-reporting-work.patch
----------------------------------------------------

The progress reporting got "mostly disabled" in an earlier version, due to issues with the bgworkers. AFAICS the issue is that the "status" row can be updated only by a single process, which does not quite work with the launcher + per-db workers architecture. I've considered a couple of different approaches:

a) updating the status only from the launcher

This is mostly what CREATE INDEX does with parallel builds, and there it's mostly sufficient. But for checksums it'd mean we only have the number of databases to process/done, and that seems unsatisfactory, considering large clusters often have only a single large database. So not good enough, IMHO.

b) allowing workers to update the status row created by the launcher

I guess this would be better, as we'd know the relations/blocks counts. I haven't tried coding this, but there would need to be some locking so that the workers don't overwrite increments from other workers, etc. And I don't think it'd work nicely with parallel per-db workers (which we don't do now, but we might).

c) having one status entry per worker

I ended up doing this; it didn't require any changes to the progress infrastructure, and it will work naturally even with multiple workers. There will always be one row for the launcher status (which only has the number of databases total/done), and then one row per worker, with database-level info (datid, datname, #relations, #blocks). A sketch of what I mean is below.

I removed the "DONE" phase, because that's right before the launcher exits, and I don't think we have that for similar cases. And I added a "waiting on checkpoint" state, because that's often a long wait during which the launcher appears to do nothing, so it seems useful to communicate the reason for the wait.
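Going back to 3) for a moment, here's a minimal sketch of the kind of sync I mean. The helper name is mine, not the patch's, and I'm assuming LocalDataChecksumVersion is a plain uint32; the point is just that the GUC backing variable gets refreshed via SetConfigOption() whenever the local state changes, so "-C data_checksums" and SHOW agree:

#include "postgres.h"
#include "utils/guc.h"

/* from the patch -- the per-backend view of the checksum state */
extern uint32 LocalDataChecksumVersion;

/*
 * Sketch only, not the actual patch code.  Update the local state and keep
 * the (read-only) data_checksums GUC in sync with it, so anything that reads
 * the GUC's backing variable directly sees the current value.
 */
static void
SetLocalDataChecksumVersion(uint32 new_version)
{
	LocalDataChecksumVersion = new_version;

	SetConfigOption("data_checksums",
					(new_version > 0) ? "on" : "off",
					PGC_INTERNAL, PGC_S_OVERRIDE);
}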
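And to make c) a bit more concrete, this is roughly what each per-db worker would do with the existing pgstat_progress_* calls. The command type and parameter indexes below are illustrative only (PROGRESS_COMMAND_DATACHECKSUMS stands in for the new ProgressCommandType the patch would add), not what the patch actually uses:

#include "postgres.h"
#include "utils/backend_progress.h"

/* illustrative parameter slots for the per-worker progress row */
#define PROGRESS_CHECKSUMS_RELATIONS_TOTAL	0
#define PROGRESS_CHECKSUMS_RELATIONS_DONE	1
#define PROGRESS_CHECKSUMS_BLOCKS_TOTAL		2
#define PROGRESS_CHECKSUMS_BLOCKS_DONE		3

/*
 * Each per-database worker owns its own progress row, keyed by the database
 * it is processing, so there is no contention with the launcher or with
 * other workers.
 */
static void
checksum_worker_start_progress(Oid dboid, int64 nrelations, int64 nblocks)
{
	pgstat_progress_start_command(PROGRESS_COMMAND_DATACHECKSUMS, dboid);
	pgstat_progress_update_param(PROGRESS_CHECKSUMS_RELATIONS_TOTAL, nrelations);
	pgstat_progress_update_param(PROGRESS_CHECKSUMS_BLOCKS_TOTAL, nblocks);
}

static void
checksum_worker_block_done(void)
{
	/* bump this worker's own counter -- no locking against other workers */
	pgstat_progress_incr_param(PROGRESS_CHECKSUMS_BLOCKS_DONE, 1);
}

The launcher would do the same thing in its own row, just with the databases total/done counters.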
5) v20250309-0005-update-docs.patch
-----------------------------------

Minor tweaks to the docs, to reflect the changes to progress reporting, and also some corrections (no resume after restart, ...).

So far this passed all my tests - both check-world and stress testing (no failures / assert crashes, ...). One thing that puzzles me is that I wasn't able to reproduce the failures reported in [1] - not even with just the rebase + minimal fixes (0001 + 0002). My best theory is that this is somehow machine-specific, and my laptop is too slow or something. I'll try with the machine I used before once it becomes available.

regards

[1] https://www.postgresql.org/message-id/e4dbcb2c-e04a-4ba2-bff0-8d979f55960e%40vondra.me

-- 
Tomas Vondra