Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
От | Vitaly Davydov |
---|---|
Тема | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Дата | |
Msg-id | 1d97ec-6841b600-3-b7b4780@40747246 обсуждение исходный текст |
Ответ на | Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly ("Vitaly Davydov" <v.davydov@postgrespro.ru>) |
Ответы |
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
|
Список | pgsql-hackers |
Dear Alexander, Amit Alexander Korotkov wrote: > Also, I've changed ReplicationSlotsComputeRequiredLSN() call to > CheckPointReplicationSlots() to update required LSN after > SaveSlotToPath() updated last_saved_restart_lsn. This helps to pass > checks in 001_stream_rep.pl without additional hacks. Thank you for the improvement and patch preparation. I confirm the test is passed without additional hacks now. I sill do not understand why this solution is favored. It is, in my opinion, a non backward-compatible solution. In any case, I'm ok to go with this patch. If needed, I may prepare a backward-compatible solution where last_saved_restart_lsn values will be in an another place of the shmem, rather than in ReplicationSlot struct. I still would like to add my 5 cents to the discussion. The purpose of the xmin value is to prevent tuples from vacuuming. Slots' restart_lsn values are used to calculate the oldest lsn to keep WAL segments from removal in checkpoint. These processes are pretty independent. The logical slots are advanced in 2 steps. At the first step, the logical decoding stuff periodically sets consistent candidate values for catalog_xmin and restart_lsn. At the second step, when LogicalConfirmReceivedLocation is called, the candidate values are assigned on catalog_xmin and restart_lsn values based on the confirmed lsn value. The slot is saved with these consistent values. It is important, that the candidate values are consistent, decoding guarantees it. In case of a crash, we should guarantee that the loaded from the disk catalog_xmin and restart_lsn values are consistent and valid for logical slots. LogicalConfirmReceivedLocation function keeps this consistency by updating them from consistent candidate values in a single operation. We have to guarantee that we use saved to disk values to calculate xmin horizon and slots' oldest lsn. For this purpose, effective_catalog_xmin is used. We update effective_catalog_xmin in LogicalConfirmReceivedLocation just after saving slot to disk. Another place where we update effective_catalog_xmin is when walsender receives hot standby feedback message. Once, we have two independent processes (vacuuming, checkpoint), we can calculate xmin horizon and oldest WAL lsn values independently (at different times) from the saved to disk values. Note, these values are updated in a non atomic way. The xmin value is set when the node receives hot standby feedback and it is used to keep tuples from vacuuming as well as catalog_xmin for decoding stuff. Not sure, xmin is applicable for logical replication. The confirmed flush lsn is used as a startpoint when a peer node doesn't provide the start lsn and to check that the start lsn is not older than the latest confirmed flush lsn. The saving of the slot on disk at each call of LogicalConfirmReceivedLocation doesn't help to avoid conflicts completely, but it helps to decrease the probability of conflicts. So, i'm still not sure, we need to save logical slots on each advance to avoid conflicts, because it doesn't help in general. The conflicts should be resolved by other means. Once, we truncate old wal segments in checkpoint only. I believe, it is ok if we calculate the oldest lsn only at the beginning of the checkpoint, as it was in the alternative solution. I think, we can update xmin horizon in checkpoint only but the horizon advancing will be more lazy in this case. Taking into account these thoughts, I can't see any problems with the alternative patch where oldest wal lsn is calculated only in checkpoint. With best regards, Vitaly
В списке pgsql-hackers по дате отправления: