Re: [PoC] pg_upgrade: allow to upgrade publisher node
From | Julien Rouhaud |
---|---|
Subject | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | 20230414061248.vdsxz2febjo3re6h@jrouhaud |
In reply to | RE: [PoC] pg_upgrade: allow to upgrade publisher node ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses | RE: [PoC] pg_upgrade: allow to upgrade publisher node |
List | pgsql-hackers |
Hi,

Sorry for the delay, I didn't have time to come back to it until this afternoon.

On Mon, Apr 10, 2023 at 09:18:46AM +0000, Hayato Kuroda (Fujitsu) wrote:
>
> I have analyzed the point, but it seems to be difficult. This is because
> some additional records like the following may be inserted. PSA the script
> which is used for testing. Note that the "double CHECKPOINT_SHUTDOWN" issue
> might be wrong, so I wanted to withdraw it once. Sorry for the noise.
>
> * HEAP/HEAP2 records. These records may be inserted by the checkpointer.
>
> IIUC, if there are tuples which have not been flushed yet when shutdown is
> requested, the checkpointer writes all of them back into the heap files. At
> that time many WAL records are generated. I think we cannot predict the
> number of records beforehand.
>
> * INVALIDATION(S) records. These records may be inserted by VACUUM.
>
> There is a possibility that autovacuum runs and generates WAL records. I
> think we cannot predict the number of records beforehand because it depends
> on the number of objects.
>
> * RUNNING_XACTS record
>
> It might be a timing issue, but I found that sometimes the background writer
> generates an XLOG_RUNNING_XACTS record. According to the function
> BackgroundWriterMain(), it is generated when the process has spent 15
> seconds since the last logging and there are important records. I think it
> is difficult to predict whether this will appear or not.

I don't think that your analysis is correct.  Slots are guaranteed to be
stopped after all the normal backends have been stopped, exactly to avoid such
extraneous records.

What is happening here is that the slot's confirmed_flush_lsn is properly
updated in memory and ends up being the same as the current LSN before the
shutdown.  But as it's a logical slot and those records aren't decoded, the
slot isn't marked as dirty and therefore isn't saved to disk.  You don't see
that behavior when doing a manual checkpoint before (per your script comment),
as in that case the checkpoint also tries to save the slot to disk, but then
finds a slot that was marked as dirty and therefore saves it.

In your script's scenario, when you restart the server the previous slot data
is restored and the confirmed_flush_lsn goes backward, which explains those
extraneous records.  It's probably totally harmless to throw away that value
for now (and probably also doesn't lead to a crazy amount of work after
restart, I really don't know much about the logical slot code), but it clearly
becomes problematic with your use case.

One easy way to fix this is to teach the checkpoint code to force saving the
logical slots to disk even if they're not marked as dirty during a shutdown
checkpoint, as done in the attached v1 patch (renamed as .txt to not interfere
with the cfbot).  With this patch applied I reliably see only a final shutdown
checkpoint record with your scenario.

Now such a change will make shutdown a bit more expensive when using logical
replication, even if in 99% of cases you will not need to save the
confirmed_flush_lsn value, so I don't know if that's acceptable or not.
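For illustration only, here is a minimal sketch of that idea against
src/backend/replication/slot.c (so the file's existing includes and the static
SaveSlotToPath() are assumed to be in scope).  The is_shutdown argument, which
the caller would derive from CHECKPOINT_IS_SHUTDOWN in CheckPointGuts(), and
the exact way the slot is force-marked dirty are my assumptions for the sketch
and need not match the attached v1 patch:

/*
 * Sketch: flush all replication slots to disk at checkpoint time, and at
 * shutdown additionally force logical slots to be written out even if
 * nothing marked them dirty, so an in-memory-only confirmed_flush_lsn
 * advance survives the restart.
 */
void
CheckPointReplicationSlots(bool is_shutdown)	/* hypothetical argument */
{
	elog(DEBUG1, "performing replication slot checkpoint");

	for (int i = 0; i < max_replication_slots; i++)
	{
		ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
		char		path[MAXPGPATH];

		if (!s->in_use)
			continue;

		/*
		 * During a shutdown checkpoint, pretend logical slots are dirty so
		 * that SaveSlotToPath() doesn't skip them.
		 */
		if (is_shutdown && SlotIsLogical(s))
		{
			SpinLockAcquire(&s->mutex);
			s->just_dirtied = true;
			s->dirty = true;
			SpinLockRelease(&s->mutex);
		}

		/* Save the slot to disk; locking is handled in SaveSlotToPath(). */
		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
		SaveSlotToPath(s, path, LOG);
	}
}

Marking the slot dirty under the spinlock and reusing the normal save path
keeps the change confined to the shutdown checkpoint, which is what limits the
extra cost to that single code path.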
Attachments