Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node

From Thunder
Subject Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
Date
Msg-id 252942e7.6f38.16d5d183404.Coremail.thunder1@126.com
In response to PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node  (Thunder <thunder1@126.com>)
Responses Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Is this an issue?
Can we fix it like this?
Thanks!





At 2019-09-22 00:38:03, "Thunder" <thunder1@126.com> wrote:
Steps to reproduce this issue:
1. Create a table
    create table gist_point_tbl(id int4, p point);
    create index gist_pointidx on gist_point_tbl using gist(p);
2. Insert data
    insert into gist_point_tbl (id, p) select g, point(g*10, g*10) from generate_series(1, 1000000) g;
3. Delete data
    delete from gist_point_tbl;
4. Vacuum table
    vacuum gist_point_tbl;
    -- Send SIGINT to the vacuum process after the WAL record for the truncation is flushed but before the truncation itself has finished
    -- We will receive the error message "ERROR:  canceling statement due to user request"
5. Vacuum table again
    vacuum gist_point_tbl;
    -- The standby node crashes and the PANIC log is "PANIC:  WAL contains references to invalid pages"

The standby node replays the truncate record successfully, but the master node does not truncate the file. The standby then crashes when the master node later writes to blocks that have already been truncated on the standby.
I try to fix this by preventing query cancel interrupts during truncation.

diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 5df4382b7e..04b696ae01 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -26,6 +26,7 @@
 #include "access/xlogutils.h"
 #include "catalog/storage.h"
 #include "catalog/storage_xlog.h"
+#include "miscadmin.h"
 #include "storage/freespace.h"
 #include "storage/smgr.h"
 #include "utils/memutils.h"
@@ -248,6 +249,14 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
        if (vm)
                visibilitymap_truncate(rel, nblocks);

+       /*
+        * If the master node flushes the WAL record for the truncation and then receives SIGINT, the
+        * transaction is cancelled before the truncation. A standby that replays the record truncates,
+        * and it crashes when the master node later writes to the blocks truncated on the standby.
+        * So we hold off query cancel interrupts until the truncation is done.
+        */
+       HOLD_CANCEL_INTERRUPTS();
+
        /*
         * We WAL-log the truncation before actually truncating, which means
         * trouble if the truncation fails. If we then crash, the WAL replay
@@ -288,6 +297,8 @@ RelationTruncate(Relation rel, BlockNumber nblocks)

        /* Do the real work */
        smgrtruncate(rel->rd_smgr, MAIN_FORKNUM, nblocks);
+
+       RESUME_CANCEL_INTERRUPTS();
 }
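
The hold/resume pattern used in the patch can be illustrated with a small standalone model. This is only a sketch and not PostgreSQL code: the names hold_cancel, resume_cancel, check_for_cancel, Cluster and truncate_model are hypothetical, and the model assumes that an interrupt check happens somewhere between flushing the WAL record and the physical truncation. It shows that without the holdoff the two nodes end up with different relation sizes, while with the holdoff the cancel is only delivered after both have truncated.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for backend state; hypothetical names, not PostgreSQL's. */
static bool cancel_pending = false;    /* set when a "SIGINT" arrives */
static int  cancel_holdoff_count = 0;  /* > 0 means cancels are deferred */

static void hold_cancel(void)
{
    cancel_holdoff_count++;
}

static void resume_cancel(void)
{
    assert(cancel_holdoff_count > 0);
    cancel_holdoff_count--;
}

/* Returns true if the caller must abort now; a held cancel stays pending. */
static bool check_for_cancel(void)
{
    if (cancel_pending && cancel_holdoff_count == 0)
    {
        cancel_pending = false;
        return true;
    }
    return false;
}

/* Relation size in blocks on each node. */
typedef struct
{
    int primary_blocks;
    int standby_blocks;
} Cluster;

/*
 * Model of a truncation: the WAL record is flushed (and replayed by the
 * standby), a cancel request arrives, and only then is the file truncated
 * on the primary. Returns false if the operation was cancelled midway.
 */
static bool truncate_model(Cluster *c, int nblocks, bool protect)
{
    if (protect)
        hold_cancel();

    c->standby_blocks = nblocks;    /* standby replays the truncate record */
    cancel_pending = true;          /* simulated SIGINT arrives here */

    if (check_for_cancel())
        return false;               /* unprotected path: primary never truncates */

    c->primary_blocks = nblocks;    /* the real work on the primary */

    if (protect)
        resume_cancel();
    return true;
}
```

Calling truncate_model with protect set to false leaves the primary at its old size while the standby is already truncated, which is the divergence described above. With protect set to true both sizes match, and the deferred cancel is reported by the next check_for_cancel call.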


 



 
