Re: VM corruption on standby
From | Aleksander Alekseev |
---|---|
Subject | Re: VM corruption on standby |
Date | |
Msg-id | CAJ7c6TMpt9Cr+M2_G97iKp_-TfLNm7ZOtHWyTVpdQKmocxchHw@mail.gmail.com |
In reply to | Re: VM corruption on standby (Andrey Borodin <x4mmm@yandex-team.ru>) |
Responses |
Re: VM corruption on standby
Re: VM corruption on standby |
List | pgsql-hackers |
Hi Andrey,

> 0. checkpointer is going to flush a heap buffer but waits on content lock
> 1. client is resetting PD_ALL_VISIBLE from page
> 2. postmaster is killed and command client to go down
> 3. client calls LWLockReleaseAll() at ProcKill() (?)
> 4. checkpointer flushes buffer with reset PG_ALL_VISIBLE that is not WAL-logged to standby
> 5. subsequent deletes do not log resetting this bit
> 6. deleted data is observable on standby with IndexOnlyScan

Thanks for investigating this in more detail. If this is indeed what happens, it is a violation of the "log before changing" approach. This is why we have PageHeaderData.pd_lsn, for instance: it makes sure that a page is evicted only *after* the record that changed it is written to disk (because WAL records can't be applied to pages from the future).

I guess the intent here could have been an optimization of some sort, but two facts were not considered: 1. the instance can be killed at any time, and 2. there might be replicas.

> Any idea how to fix this?

IMHO: log the changes first, then allow the page to be evicted.
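
To make the ordering concrete, here is a toy sketch of the rule in Python. It is not PostgreSQL code: `Page`, `Wal`, `clear_all_visible` and `flush_page` are hypothetical names invented for illustration, LSNs are just list positions, and the real ordering inside a critical section is more subtle than this. It only shows the invariant: a change is paired with a WAL record, the page is stamped with that record's LSN, and the page is not written out until the WAL is durable up to that LSN.

```python
# Toy model of the WAL-before-data rule. NOT PostgreSQL code; all names
# here are hypothetical and the ordering is deliberately simplified.
from dataclasses import dataclass, field


@dataclass
class Page:
    all_visible: bool = True  # stands in for the PD_ALL_VISIBLE flag
    lsn: int = 0              # stands in for PageHeaderData.pd_lsn


@dataclass
class Wal:
    records: list = field(default_factory=list)
    flushed_upto: int = 0     # highest LSN that is durable on disk

    def insert(self, payload: str) -> int:
        """Append a record and return its LSN (here: its 1-based position)."""
        self.records.append(payload)
        return len(self.records)

    def flush(self, upto: int) -> None:
        """Make the log durable at least up to the given LSN."""
        self.flushed_upto = max(self.flushed_upto, upto)


def clear_all_visible(page: Page, wal: Wal) -> None:
    """Pair the page change with a WAL record and stamp the page."""
    lsn = wal.insert("clear all_visible")
    page.all_visible = False
    page.lsn = lsn


def flush_page(page: Page, wal: Wal, disk: dict) -> None:
    """Write the page out only after the WAL covering it is durable."""
    if wal.flushed_upto < page.lsn:
        wal.flush(page.lsn)
    disk["page"] = Page(page.all_visible, page.lsn)


wal, page, disk = Wal(), Page(), {}
clear_all_visible(page, wal)
flush_page(page, wal, disk)
```

In this model, a change that skips `wal.insert()` leaves `page.lsn` untouched, so `flush_page()` has nothing to wait for and the change reaches disk without ever appearing in the log, which is roughly the situation described in steps 4 to 6.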