Re: persist logical slots to disk during shutdown checkpoint

From: vignesh C
Subject: Re: persist logical slots to disk during shutdown checkpoint
Date:
Msg-id: CALDaNm3jY-xiDhEoJwTYHiBWiY+4DpNjZbddNozE7ZyOXKLvFg@mail.gmail.com
In reply to: Re: persist logical slots to disk during shutdown checkpoint  (vignesh C <vignesh21@gmail.com>)
List: pgsql-hackers
On Fri, 25 Aug 2023 at 17:40, vignesh C <vignesh21@gmail.com> wrote:
>
> On Sat, 19 Aug 2023 at 11:53, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > It's entirely possible for a logical slot to have a confirmed_flush
> > LSN higher than the last value saved on disk while not being marked as
> > dirty.  It's currently not a problem to lose that value during a clean
> > shutdown / restart cycle but to support the upgrade of logical slots
> > [1] (see latest patch at [2]), we seem to rely on that value being
> > properly persisted to disk. During the upgrade, we need to verify that
> > all the data prior to shutdown_checkpoint for the logical slots has
> > been consumed, otherwise, the downstream may miss some data. Now, to
> > ensure the same, we are planning to compare the confirm_flush LSN
> > location with the latest shutdown_checkpoint location, which means that
> > the confirm_flush LSN should be updated after restart.
> >
> > I think this is inefficient even without an upgrade because, after the
> > restart, this may lead to decoding some data again. Say, we process
> > some transactions for which we didn't send anything downstream (the
> > changes got filtered) but the confirm_flush LSN is updated due to
> > keepalives. As we don't flush the latest value of confirm_flush LSN,
> > it may lead to processing the same changes again.
>
> I was able to test and verify that, with the patch, the same changes
> are not processed again.
> Note: The 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
> patch adds a log message when a decoded transaction is skipped and
> another when an operation is filtered.
> The test.sh script has the following steps: a) set up logical
> replication for a table, b) insert into a table that is published
> (this will be replicated to the subscriber), c) insert into a table
> that is not published (this insert will be filtered and not
> replicated), d) sleep for 5 seconds, e) stop the server, f) start
> the server.
> I used the following steps on HEAD: a) apply
> 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch on
> HEAD and build the binaries, b) execute test.sh, c) check the N1.log
> file to confirm that the insert operations were filtered again, as
> shown by the following logs:
> LOG:  Filter insert for table tbl2
> ...
> ===restart===
> ...
> LOG:  Skipping transaction 0/156AD10 as start decode at is greater 0/156AE40
> ...
> LOG:  Filter insert for table tbl2
>
> We can see that on HEAD the insert operations on tbl2, which were
> filtered before the server was stopped, are filtered again after the
> restart.
>
> Let's verify that the same changes are not processed again with the patch:
> a) Apply v4-0001-Persist-to-disk-logical-slots-during-a-shutdown-c.patch
> from [1] along with
> 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch and
> build the binaries, b) execute test.sh, c) check the N1.log file to
> confirm that the insert operations were skipped after the server
> restart, as shown by the following logs:
> LOG:  Filter insert for table tbl2
> ...
> ===restart===
> ...
> Skipping transaction 0/156AD10 as start decode at is greater 0/156AFB0
> ...
> Skipping transaction 0/156AE80 as start decode at is greater 0/156AFB0
>
> We can see that the insert operations on tbl2 are not processed again
> after restart with the patch.
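
To make the scenario exercised by test.sh concrete, here is a rough
SQL sketch of the publisher-side setup (tbl2 is the name from the logs
above; tbl1, pub and sub are illustrative assumptions):

-- Publisher-side sketch of the test.sh scenario.
CREATE TABLE tbl1 (a int);
CREATE TABLE tbl2 (a int);
CREATE PUBLICATION pub FOR TABLE tbl1;  -- tbl2 is deliberately not published

-- On the subscriber:
--   CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION pub;

INSERT INTO tbl1 SELECT generate_series(1, 5);  -- replicated downstream
INSERT INTO tbl2 SELECT generate_series(1, 5);  -- decoded but filtered out

-- Then sleep a few seconds, stop the publisher cleanly, start it again,
-- and check the publisher log to see whether the tbl2 changes are decoded
-- (and filtered) a second time.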

Here is another way to test this, using the pg_replslotdata approach
that was proposed earlier at [1].
I have rebased it on top of HEAD; the v5 version is attached.

We can use the same test.sh shared at [2].
When executed on HEAD, confirmed_flush points to a WAL location before
both transactions:
slot_name | slot_type | datoid | persistency | xmin | catalog_xmin | restart_lsn | confirmed_flush | two_phase_at | two_phase | plugin
----------+-----------+--------+-------------+------+--------------+-------------+-----------------+--------------+-----------+---------
sub       | logical   |      5 | persistent  |    0 |          735 | 0/1531E28   | 0/1531E60       | 0/0          |         0 | pgoutput
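
For comparison, the in-memory values of the same fields can be
inspected while the server is still running (i.e. before the shutdown)
with a query along these lines; pg_replslotdata, by contrast, reads
the copy persisted under pg_replslot:

-- In-memory view of the slot, for comparison with the on-disk values above
-- (two_phase_at is not exposed in the pg_replication_slots view).
SELECT slot_name, slot_type, datoid, temporary, xmin, catalog_xmin,
       restart_lsn, confirmed_flush_lsn, two_phase, plugin
FROM pg_replication_slots
WHERE slot_name = 'sub';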

WAL record information generated using pg_walinspect for the records
at and after the confirmed_flush LSN 0/1531E60:
 row_number | start_lsn | end_lsn   | prev_lsn  | xid | resource_manager | record_type         | record_length | main_data_length | fpi_length | description | block_ref
------------+-----------+-----------+-----------+-----+------------------+---------------------+---------------+------------------+------------+-------------+----------
          1 | 0/1531E60 | 0/1531EA0 | 0/1531E28 |   0 | Heap2            | PRUNE               |            57 |                9 |          0 | snapshotConflictHorizon: 0, nredirected: 0, ndead: 1, nunused: 0, redirected: [], dead: [1], unused: [] | blkref #0: rel 1663/5/1255 fork main blk 58
          2 | 0/1531EA0 | 0/1531EE0 | 0/1531E60 | 735 | Heap             | INSERT+INIT         |            59 |                3 |          0 | off: 1, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
          3 | 0/1531EE0 | 0/1531F20 | 0/1531EA0 | 735 | Heap             | INSERT              |            59 |                3 |          0 | off: 2, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
          4 | 0/1531F20 | 0/1531F60 | 0/1531EE0 | 735 | Heap             | INSERT              |            59 |                3 |          0 | off: 3, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
          5 | 0/1531F60 | 0/1531FA0 | 0/1531F20 | 735 | Heap             | INSERT              |            59 |                3 |          0 | off: 4, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
          6 | 0/1531FA0 | 0/1531FE0 | 0/1531F60 | 735 | Heap             | INSERT              |            59 |                3 |          0 | off: 5, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
          7 | 0/1531FE0 | 0/1532028 | 0/1531FA0 | 735 | Transaction      | COMMIT              |            46 |               20 |          0 | 2023-08-27 23:22:17.161215+05:30 |
          8 | 0/1532028 | 0/1532068 | 0/1531FE0 | 736 | Heap             | INSERT+INIT         |            59 |                3 |          0 | off: 1, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
          9 | 0/1532068 | 0/15320A8 | 0/1532028 | 736 | Heap             | INSERT              |            59 |                3 |          0 | off: 2, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
         10 | 0/15320A8 | 0/15320E8 | 0/1532068 | 736 | Heap             | INSERT              |            59 |                3 |          0 | off: 3, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
         11 | 0/15320E8 | 0/1532128 | 0/15320A8 | 736 | Heap             | INSERT              |            59 |                3 |          0 | off: 4, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
         12 | 0/1532128 | 0/1532168 | 0/15320E8 | 736 | Heap             | INSERT              |            59 |                3 |          0 | off: 5, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
         13 | 0/1532168 | 0/1532198 | 0/1532128 | 736 | Transaction      | COMMIT              |            46 |               20 |          0 | 2023-08-27 23:22:17.174756+05:30 |
         14 | 0/1532198 | 0/1532210 | 0/1532168 |   0 | XLOG             | CHECKPOINT_SHUTDOWN |           114 |               88 |          0 | redo 0/1532198; tli 1; prev tli 1; fpw true; xid 0:737; oid 16399; multi 1; offset 0; oldest xid 723 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown |
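
For reference, a listing like the above can be produced with a query
along these lines (the end LSN and the row_number() decoration are
assumptions; the exact query used may differ):

-- pg_walinspect sketch: list WAL records starting at the slot's on-disk
-- confirmed_flush LSN (0/1531E60 in this run).
CREATE EXTENSION IF NOT EXISTS pg_walinspect;

SELECT row_number() OVER (ORDER BY start_lsn) AS row_number, *
FROM pg_get_wal_records_info('0/1531E60', pg_current_wal_lsn());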

Whereas the same test executed with the patch applied shows that
confirmed_flush points to the CHECKPOINT_SHUTDOWN record:
slot_name | slot_type | datoid | persistency | xmin | catalog_xmin | restart_lsn | confirmed_flush | two_phase_at | two_phase | plugin
----------+-----------+--------+-------------+------+--------------+-------------+-----------------+--------------+-----------+---------
sub       | logical   |      5 | persistent  |    0 |          735 | 0/1531E28   | 0/1532198       | 0/0          |         0 | pgoutput

WAL record information generated using pg_walinspect for the records
at and after the confirmed_flush LSN 0/1532198:
 row_number | start_lsn | end_lsn   | prev_lsn  | xid | resource_manager | record_type         | record_length | main_data_length | fpi_length | description | block_ref
------------+-----------+-----------+-----------+-----+------------------+---------------------+---------------+------------------+------------+-------------+----------
          1 | 0/1532198 | 0/1532210 | 0/1532168 |   0 | XLOG             | CHECKPOINT_SHUTDOWN |           114 |               88 |          0 | redo 0/1532198; tli 1; prev tli 1; fpw true; xid 0:737; oid 16399; multi 1; offset 0; oldest xid 723 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown |
(1 row)
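
With the patch, one way to sanity-check this after a clean shutdown
and restart is to compare the slot's confirmed_flush LSN with the
checkpoint location from the control file (a sketch; it assumes no new
checkpoint has run since the restart):

-- Does the slot's confirmed_flush LSN point at the shutdown checkpoint
-- record?  Only meaningful right after a clean restart, before a new
-- checkpoint moves pg_control_checkpoint() forward.
SELECT s.slot_name,
       s.confirmed_flush_lsn,
       c.checkpoint_lsn,
       s.confirmed_flush_lsn = c.checkpoint_lsn AS matches_shutdown_checkpoint
FROM pg_replication_slots AS s,
     pg_control_checkpoint() AS c
WHERE s.slot_type = 'logical';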

[1] - https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm2BboFuFVYxyzP4wkv7%3D8%2B_TwsD%2BugyGhtibTSF4m4XRg%40mail.gmail.com

Regards,
Vignesh
