confusing results from pg_get_replication_slots()
| From | Robert Haas |
|---|---|
| Subject | confusing results from pg_get_replication_slots() |
| Date | |
| Msg-id | CA+TgmobCwyv-vDYMrhQJdW6TZ=TX+g9VFbFT0jpqonmFZz97RA@mail.gmail.com |
| Responses | Re: confusing results from pg_get_replication_slots() (2 replies) |
| List | pgsql-hackers |
Hi,

Since v13, pg_get_replication_slots() returns a wal_status field that supposedly tells you whether the slot is reserving WAL. It returns one of "reserved", "extended", "unreserved", or "lost". However, the logic is more complicated than you might expect from a reporting function. We normally call GetWALAvailability() and report whatever it tells us, but there are two exceptions. First, if the slot is invalidated, we skip calling GetWALAvailability() and assume that the answer is "lost". Second, if something is still connected to the slot, we assume that any apparent "lost" answer is due to a race condition and instead return "unreserved". Both of these exceptions can occur at the same time, and the checks are done in the order I've listed here. Therefore, a still-connected slot which is invalidated is shown as "unreserved" rather than, as I would have expected, as "lost".

I don't believe we should apply both of these exceptions at the same time. If we actually called GetWALAvailability() and it said the WAL was lost, then perhaps the fact that somebody's still connected to the slot is contrary evidence, and maybe due to some race condition they can catch up again. But if we didn't call GetWALAvailability() and thought that the WAL was lost because the slot is invalidated, the fact that some process is still connected to that slot doesn't invalidate the conclusion. Once the slot is invalidated, it's ignored for purposes of deciding how much WAL to retain in the future, and it's ignored for hot_standby_feedback purposes. It is no longer protecting against any of the things against which slots are supposed to protect. For all practical intents and purposes, such a slot is no more - has ceased to be - has expired and gone to meet its maker - it's an ex-slot. It makes no sense to me to display that slot with a status that shows that there is some hope of recovery when in fact there is none.
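To make the ordering problem concrete, here is a rough sketch of the decision logic as described above, next to the behavior being proposed. It is a simplified Python model, not the PostgreSQL source: the function names `current_wal_status` and `proposed_wal_status`, the `WalStatus` enum, and the three parameters are all made up for illustration, and `availability` simply stands in for whatever GetWALAvailability() would have reported.

```python
from enum import Enum


class WalStatus(Enum):
    RESERVED = "reserved"
    EXTENDED = "extended"
    UNRESERVED = "unreserved"
    LOST = "lost"


def current_wal_status(invalidated: bool, active: bool,
                       availability: WalStatus) -> WalStatus:
    """Model of the behavior described above (hypothetical helper)."""
    # Exception 1: an invalidated slot skips the availability check
    # and is assumed to be lost.
    status = WalStatus.LOST if invalidated else availability
    # Exception 2: if something is still connected, any "lost" answer
    # is treated as a race condition, even one produced by exception 1.
    if status is WalStatus.LOST and active:
        status = WalStatus.UNRESERVED
    return status


def proposed_wal_status(invalidated: bool, active: bool,
                        availability: WalStatus) -> WalStatus:
    """Model of the proposed behavior (hypothetical helper)."""
    # An invalidated slot is always lost, connected or not.
    if invalidated:
        return WalStatus.LOST
    # The race-condition exception only applies to an answer that
    # actually came from the availability check.
    if availability is WalStatus.LOST and active:
        return WalStatus.UNRESERVED
    return availability
```

In this model, an invalidated slot with a connection still attached comes out as "unreserved" from the first function and as "lost" from the second; for slots that are not invalidated, the two functions agree.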
Note, by the way, that in existing releases, connections to already-invalidated physical slots are not blocked. This has been changed, but only in master.

Here is a patch to make invalidated slots always report as "lost", which I propose to back-patch to all supported versions.

Many people were involved in the diagnosis of this issue, but particular shout-outs are appropriate to my colleague Nitin Chobisa, who produced the first reproducible test case demonstrating the issue, and my colleague Pavan Deolasee, who further refined the test case and clearly established that it was possible for slots to emerge from the "lost" state, going back to "unreserved".

--
Robert Haas
EDB: http://www.enterprisedb.com