Re: Online enabling of checksums

Поиск
Список
Период
Сортировка
От Magnus Hagander
Тема Re: Online enabling of checksums
Дата
Msg-id CABUevEx_Zd+-h=-zKQuBLhmZd+TT1xbdwo9sVwCbrZ2bTHfbbw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Online enabling of checksums  (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы Re: Online enabling of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Online enabling of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Online enabling of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers


On Mon, Mar 19, 2018 at 7:27 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Hi,

The patch looks good to me at a high level. Some notes below. I didn't read through the whole thread, so sorry if some of these have been discussed already.

+void
+ShutdownChecksumHelperIfRunning(void)
+{
+       if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+       {
+               /* Launcher not started, so nothing to shut down */
+               return;
+       }
+
+       ereport(ERROR,
+                       (errmsg("checksum helper is currently running, cannot disable checksums"),
+                        errhint("Restart the cluster or wait for the worker to finish.")));
+}

Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to set the cost limit, and that it's hurting queries too much. At that point, you want to abort.

Agreed. You could do it with pg_terminate_backend() but you had to do it in the right order, etc.

Attached patch fixes this by making it possible to abort the process by executing pg_disable_data_checksums() during the process. In this case the live workers will abort, and the checksums will be switched off again.
 


+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit long. And doing this in PageIsVerified looks a bit weird:

> if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()

Agreed. We also need DataChecksumsInProgress() to make it work properly, but that makes the names a lot more clear. Adjusted as such. 

 
+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that it's bad.

Do you have any opinion on the thread about how to handle that one? As in which way to go about solving it? (The second thread linked from the CF entry - it wasn't linked before as intended, but it is now)



+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear to me when they would return what. I'd suggest marking them as 'void' instead.

Agreed, changed.


--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION         4
 #define PG_DATA_CHECKSUM_VERSION       1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION            2
 

This seems like a weird place for these PG_DATA_CHECKSUM_* constants. They're not actually stored in the page header, as you might think. I think the idea was to keep PG_DATA_CHECKSUM_VERSION close to PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk format. I think it was a bit weird even before this patch, but now it's worse. At least a better comment would be in order, or maybe move these elsewhere.

True, but they apply to the page level? I'm not sure about the original reasoning to put them there,figured it wasn't the responsibility of this patch to move them. But we can certainly do that.

But what would be an appropriate place elsewhere? First thought would be pg_control.h, but that would then not be consistent with e.g. wal_level (which is declared in xlog.h and not pg_control.h..


Feature request: it'd be nice to update the 'ps status' with set_ps_display, to show a rudimentary progress indicator. Like the name and block number of the relation being processed. It won't tell you how much is left to go, but at least it will give a warm fuzzy feeling to the DBA that something is happening.

In the attached patch it sets this information in pg_stat_activity. I think that makes more sense than the ps display, and I think is more consistent with other ways we use them (e.g. autovacuum doesn't update it's ps title for every table, but it does update pg_stat_activity).


I didn't review the offline tool, pg_verify_checksums.

PFA a patch that takes into account these comments and we believe all other pending ones as well.


--
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: configure's checks for --enable-tap-tests are insufficient
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: XID-assigned idle transactions affect vacuum's job.