Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC
| From | Michael Paquier |
|---|---|
| Subject | Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC |
| Date | |
| Msg-id | acM_fY09P1QpVkbL@paquier.xyz |
| In response to | Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC (Andy Fan <zhihuifan1213@163.com>) |
| Responses | Re: raise ERROR between EndPrepare and PostPrepare_Locks causes ROLLBACK 2pc PAINC |
| List | pgsql-hackers |
On Wed, Mar 25, 2026 at 08:39:07AM +0800, Andy Fan wrote:
> I found a similar but not exactly same case at 2014 [1] which
> might be helpful to recall a boarder understanding on this area.
>
> [1] https://www.postgresql.org/message-id/534AF601.1030007%40vmware.com

Incorrect shared state when an ERROR happens at an arbitrary location is usually bad, yes. For this one, your suggestion of delaying the end of the critical section started at StartPrepare() and ending in EndPrepare() is not an acceptable solution as far as I can see, unfortunately: it would mean doing a SyncRepWaitForLSN() while in a critical section, and I doubt we'd want to do that.

Anyway, I doubt that this one is worth caring for. The current 2PC locking scheme means, as far as I remember, that it is not really possible to interact with an external command in a specific session between the EndPrepare() and the PostPrepare_Locks() calls.

To put it in other words, let's imagine that we use a breakpoint between these two calls (or a wait injection point if you automate that). Is it possible for a second backend to mess with the state of the first backend waiting until its locks are transferred to the dummy PGPROC entry? That's what the 2014 thread is about: there was a race condition reachable between two sessions. If the answer to this question is yes, I'd agree that this is something that deserves a closer look.

And before you ask: attempting to interact with a 2PC state from a second session with a first session waiting between these two points would not work: the 2PC entry is locked, and that lock is only cleaned up after EndPrepare() and PostPrepare_Locks(), at PostPrepare_Twophase(). Trying to request access to this entry fails, as the first backend is marked as locking it. A second backend attempting to lock it would fail, complaining that the 2PC entry with a GXID is "busy".
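As a rough illustration of the "busy" behavior described above, here is a toy Python model. It is not PostgreSQL code: the class and method names (`TwoPhaseTable`, `prepare`, `post_prepare`, `lock`) and the error text are hypothetical, and the model only assumes what the message states, namely that the preparing backend stays marked as locking the entry until the cleanup done at PostPrepare_Twophase().

```python
from dataclasses import dataclass
from typing import Dict, Optional


class EntryBusy(Exception):
    """Raised when a 2PC entry is still marked as locked by a backend."""


@dataclass
class TwoPhaseEntry:
    gid: str
    locking_backend: Optional[int] = None  # None: no backend holds the entry


class TwoPhaseTable:
    """Hypothetical stand-in for the shared 2PC state (illustration only)."""

    def __init__(self) -> None:
        self.entries: Dict[str, TwoPhaseEntry] = {}

    def prepare(self, gid: str, backend_id: int) -> None:
        # Models the state right after EndPrepare(): the entry exists and
        # the preparing backend is still marked as locking it.
        self.entries[gid] = TwoPhaseEntry(gid, locking_backend=backend_id)

    def post_prepare(self, gid: str, backend_id: int) -> None:
        # Models the cleanup at PostPrepare_Twophase(): the lock is released.
        entry = self.entries[gid]
        if entry.locking_backend != backend_id:
            raise ValueError("entry is not locked by this backend")
        entry.locking_backend = None

    def lock(self, gid: str, backend_id: int) -> TwoPhaseEntry:
        # A second session (e.g. one trying to finish the prepared
        # transaction) must lock the entry first; this fails while the
        # preparing backend still holds it.
        entry = self.entries[gid]
        if entry.locking_backend is not None:
            raise EntryBusy(f"2PC entry {gid!r} is busy")
        entry.locking_backend = backend_id
        return entry
```

In this sketch, a second backend calling `lock()` between `prepare()` and `post_prepare()` always gets `EntryBusy`, which is the reason given above for why a second session cannot interfere in that window.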
SyncRepWaitForLSN() would be a problematic pattern between the EndPrepare() and the PostPrepare_Locks(), but we never ERROR there on purpose: even if we cancel while waiting for a transaction commit we'd just get a WARNING, meaning that we'd be able to transfer our locks anyway.

Or perhaps you have a realistic scenario where it is possible to mess with the shared state, other than an elog(ERROR) forced between these two points?
--
Michael