Re: Forbid the use of invalidated physical slots in streaming replication.
От | Ashutosh Bapat |
---|---|
Тема | Re: Forbid the use of invalidated physical slots in streaming replication. |
Дата | |
Msg-id | CAExHW5t9w_iSU+VP-GqGLeRErRJnGzkqavxzYfCk8Z+e0pu-3w@mail.gmail.com обсуждение исходный текст |
Ответ на | RE: Forbid the use of invalidated physical slots in streaming replication. ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Ответы |
Re: Forbid the use of invalidated physical slots in streaming replication.
|
Список | pgsql-hackers |
> > > pg_replication_slot could be set back to null. > > > > In this case, since the basebackup was taken after the slot was invalidated, it > > does not require the WAL that was removed. But it seems that once the > > streaming starts, the slot sprints to life again and gets validated again. Here's > > pg_replication_slot output after the standby starts. > > Actually, It doesn't bring the invalidated slot back to life completely. > The slot's view data looks valid while the 'invalidated' flag of this slot is still > RS_INVAL_WAL_REMOVED (user are not aware of it.) > I was mislead by the code in pg_get_replication_slots(). I did not read it till the following --- code ---- case WALAVAIL_REMOVED: /* * If we read the restart_lsn long enough ago, maybe that file * has been removed by now. However, the walsender could have * moved forward enough that it jumped to another file after * we looked. If checkpointer signalled the process to * termination, then it's definitely lost; but if a process is * still alive, then "unreserved" seems more appropriate. * * If we do change it, save the state for safe_wal_size below. */ --- code --- I see now how an invalid slot's wal status can be reported as unreserved. So I think it's a bug. > > > > > > > > > So, I think it's a bug and one idea to fix is to check the validity of > > > the physical slot in > > > StartReplication() after acquiring the slot like what the attachment > > > does, what do you think ? > > > > I am not sure whether that's necessarily a bug. Of course, we don't expect > > invalid slots to be used but in this case I don't see any harm. > > The invalid slot has been revived and has all the properties set just like a valid > > slot. So it must be considered in > > ReplicationSlotsComputeRequiredXmin() as well. I haven't verified it myself > > though. In case the WAL is really lost and is requested by the standby it will > > throw an error "requested WAL segment [0-9A-F]+ has already been > > removed". So no harm there as well. > > Since the 'invalidated' field of the slot is still (RS_INVAL_WAL_REMOVED), even > if the walsender advances the restart_lsn, the slot will not be considered in the > ReplicationSlotsComputeRequiredXmin(), so the WALs and Rows are not safe > and that's why I think it's a bug. > > After looking closer, it seems this behavior started from 15f8203 which introduced the > ReplicationSlotInvalidationCause 'invalidated', after that we check the invalidated enum > in Xmin/Lsn computation function. > > If we want to go back to previous behavior, we need to revert/adjust the check > for invalidated in ReplicationSlotsComputeRequiredXmin(), but since the logical > decoding(walsender) disallow using invalidated slot, so I feel it's consistent > to do similar check for physical one. Besides, pg_replication_slot_advance() > also disallow passing invalidated slot to it as well. What do you think ? What happens if you run your script on a build prior to 15f8203? Purely from reading the code, it looks like the physical slot would sprint back to life since its restart_lsn would be updated. But I am not able to see what happens to invalidated_at. It probably remains a valid LSN and the slot would still not be considred in xmin calculation. It will be good to be compatible to pre-15f8203 behaviour. I think making logical and physical slot behaviour consistent would be better but if the inconsistent behaviour is out there for some releases, changing that now will break backward compatibility. -- Best Wishes, Ashutosh Bapat
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Ashutosh BapatДата:
Сообщение: Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'
Следующее
От: Shlok KyalДата:
Сообщение: Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION