Re: New WAL record to detect the checkpoint redo location
От | Amit Kapila |
---|---|
Тема | Re: New WAL record to detect the checkpoint redo location |
Дата | |
Msg-id | CAA4eK1JVKZGRHLOEotWi+e+09jucNedqpkkc-Do4dh5FTAU+5w@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: New WAL record to detect the checkpoint redo location (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On Thu, Sep 21, 2023 at 9:06 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Sep 21, 2023 at 4:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > After the 0003 patch, do we need acquire exclusive lock via > > WALInsertLockAcquireExclusive() for non-shutdown checkpoints. Even the > > comment "We must block concurrent insertions while examining insert > > state to determine the checkpoint REDO pointer." seems to indicate > > that it is not required. If it is required then we may want to change > > the comments and also acquiring the locks twice will have more cost > > than acquiring it once and write the new WAL record under that lock. > > I think the comment needs updating. I don't think we can do curInsert > = XLogBytePosToRecPtr(Insert->CurrBytePos) without taking the locks. > Same for Insert->fullPageWrites. > If we can't do those without taking all the locks then it is fine but just wanted to give it a try to see if there is a way to avoid in case of online (non-shutdown) checkpoints. For example, curInsert is used only for the shutdown path, so we don't need to acquire all locks for it in the cases except for the shutdown case. Here, we are reading Insert->fullPageWrites which requires an insertion lock but not all the locks (as per comments in structure XLogCtlInsert). Now, I haven't done detailed analysis for XLogCtl->InsertTimeLineID/XLogCtl->PrevTimeLineID but some places reading InsertTimeLineID have a comment like "Given that we're not in recovery, InsertTimeLineID is set and can't change, so we can read it without a lock." which suggests that some analysis is required whether reading those requires all locks in this code path. OTOH, it won't matter to acquire all locks in this code path for the reasons mentioned by you and it may help in keeping the code simple. So, it is up to you to take the final call on this matter. I am fine with your decision. > > > BTW, I would like to mention that there is a slight interaction of > > this work with the patch to upgrade/migrate slots [1]. Basically in > > [1], to allow slots migration from lower to higher version, we need to > > ensure that all the WAL has been consumed by the slots before clean > > shutdown. However, during upgrade we can generate few records like > > checkpoint which we will ignore for the slot consistency checking as > > such records doesn't matter for data consistency after upgrade. We > > probably need to add this record to that list. I'll keep an eye on > > both the patches so that we don't miss that interaction but mentioned > > it here to make others also aware of the same. > > If your approach requires a code change every time someone adds a new > WAL record that doesn't modify table data, you might want to rethink > the approach a bit. > I understand your hesitation and we have discussed several approaches that do not rely on the WAL record type to determine if the slots have caught up but the other approaches seem to have different other downsides. I know it may not be a good idea to discuss those here but as there was a slight interaction with this work, so I thought to bring it up. To be precise, we need to ensure that we ignore WAL records that got generated during pg_upgrade operation (say during pg_upgrade --check). The approach we initially followed was to check if the slot's confirmed_flush_lsn is equal to the latest checkpoint in pg_controldata (which is the shutdown checkpoint after stopping the server). This approach doesn't work for the use case where the user runs pg_upgrade --check before actually performing the upgrade [1]. This is because during the upgrade check, the server will be stopped/started and update the position of the latest checkpoint, causing the check to fail in the actual upgrade and leading pg_upgrade to believe that the slot has not been caught up. To address the issues in the above approach, we also discussed several alternative approaches[2][3]: a) Adding a new field in pg_controldata to record the last checkpoint that happens in non-upgrade mode, so that we can compare the slot's confirmed_flush_lsn with this value. However, we were not sure if this was a good enough reason to add a new field in controldata field and sprinkle IsBinaryUpgrade check in checkpointer code path. b) Advancing each slot's confirmed_flush_lsn to the latest position if the first upgrade check passes. This way, when performing the actual upgrade, the confirmed_flush_lsn will also pass. However, internally advancing the LSN seems unconventional. c) Introducing a new pg_upgrade option to skip the check for slot catch-up so that if it is already done at the time of pg_upgrade --check, we can avoid rechecking during actual upgrade. Although this might work, the user would need to specify this manually, which is not ideal. d) Document this and suggest users consume the WALs, but this doesn't look acceptable to users. All the above approaches have their downsides, prompting us to consider the WAL scan approach which is to scan the end of the WAL for records that should have been streamed out. This approach was first proposed by Andres[4] and was chosen[5] after considering all other approaches. If we don't like relying on WAL record types then I think the alternative (a) to add a new field in ControlDataFile is worth considering. [1] https://www.postgresql.org/message-id/CAA4eK1LzeZLoTLaAuadmuiggc5mq39oLY6fK95oFKiPBPBf%2BeQ%40mail.gmail.com [2] https://www.postgresql.org/message-id/OS0PR01MB571640E1B58741979A5E586594F7A%40OS0PR01MB5716.jpnprd01.prod.outlook.com [3] https://www.postgresql.org/message-id/TYAPR01MB5866EF7398CB13FFDBF230E7F5F0A%40TYAPR01MB5866.jpnprd01.prod.outlook.com [4] https://www.postgresql.org/message-id/20230725170319.h423jbthfohwgnf7%40awork3.anarazel.de [5] https://www.postgresql.org/message-id/CAA4eK1KqqWayKtRhvyRgkhEHvAUemW_dEqgFn7UOG3D4B6f0ew%40mail.gmail.com -- With Regards, Amit Kapila.
В списке pgsql-hackers по дате отправления:
Следующее
От: Bharath RupireddyДата:
Сообщение: Re: [PoC] pg_upgrade: allow to upgrade publisher node