Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
От | Tomas Vondra |
---|---|
Тема | Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 |
Дата | |
Msg-id | 36047d12-b943-4b7f-ad49-ebc7add2b3bc@vondra.me обсуждение исходный текст |
Ответ на | Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4 (Masahiko Sawada <sawada.mshk@gmail.com>) |
Ответы |
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
|
Список | pgsql-hackers |
On 11/11/24 23:41, Masahiko Sawada wrote: > On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas@vondra.me> wrote: >> >> If this analysis is correct, I think it's rather suspicious we don't >> reset the candidate fields on restart. Can those "old" values ever be >> valid? But I haven't tried resetting them. > > I had the same question. IIRC resetting them also fixes the > problem[1]. However, I got a comment from Alvaro[2]: > > Hmm, interesting -- I was studying some other bug recently involving the > xmin of a slot that had been invalidated and I remember wondering if > these "candidate" fields were being properly ignored when the slot is > marked not in use; but I didn't check. Are you sure that resetting them > when the slot is released is the appropriate thing to do? I mean, > shouldn't they be kept set while the slot is in use, and only reset if > we destroy it? > > Which made me re-investigate the issue and thought that it doesn't > necessarily need to clear these candidate values in memory on > releasing a slot as long as we're carefully updating restart_lsn. Not sure, but maybe it'd be useful to ask the opposite question. Why shouldn't it be correct to reset the fields, which essentially puts the slot into the same state as if it was just read from disk? That also discards all these values, and we can't rely on accidentally keeping something important info in memory (because if the instance restarts we'd lose that). But this reminds me that the patch I shared earlier today resets the slot in the ReplicationSlotAcquire() function, but I guess that's not quite correct. It probably should be in the "release" path. > Which seems a bit efficient for example when restarting from a very > old point. Of course, even if we reset them on releasing a slot, it > would perfectly work since it's the same as restarting logical > decoding with a server restart. I find the "efficiency" argument a bit odd. It'd be fine if we had a correct behavior to start with, but we don't have that ... Also, I'm not quite sure why exactly would it be more efficient? And how likely is this in practice? It seems to me that performance-sensitive cases can't do reconnects very often anyway, that's inherently inefficient. No? > I think > LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems > not to be working expectedly, but I could not have proof that we > should either keep or reset them on releasing a slot. > Not sure. Chances are we need both fixes, if only to make LogicalIncreaseRestartDecodingForSlot more like the other function. regards -- Tomas Vondra
В списке pgsql-hackers по дате отправления: