Re: Reviewing freeze map code
От | Masahiko Sawada |
---|---|
Тема | Re: Reviewing freeze map code |
Дата | |
Msg-id | CAD21AoCHs9hKnAbtiEuU1GcdiNRkLDmcpPrfXyDxmi71foBE6A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Reviewing freeze map code (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Reviewing freeze map code
(Robert Haas <robertmhaas@gmail.com>)
|
Список | pgsql-hackers |
On Wed, Jun 15, 2016 at 9:56 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Jun 15, 2016 at 2:41 AM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> I spent some time chasing down the exact circumstances. I suspect >> that there may be an interlocking problem in heap_update. Using the >> line numbers from cae1c788 [1], I see the following interaction >> between the VACUUM, UPDATE and SELECT (pg_check_visible) backends, all >> in reference to the same block number: >> >> [VACUUM] sets all visible bit >> >> [UPDATE] heapam.c:3931 HeapTupleHeaderSetXmax(oldtup.t_data, xmax_old_tuple); >> [UPDATE] heapam.c:3938 LockBuffer(buffer, BUFFER_LOCK_UNLOCK); >> >> [SELECT] LockBuffer(buffer, BUFFER_LOCK_SHARE); >> [SELECT] observes VM_ALL_VISIBLE as true >> [SELECT] observes tuple in HEAPTUPLE_DELETE_IN_PROGRESS state >> [SELECT] barfs >> >> [UPDATE] heapam.c:4116 visibilitymap_clear(...) > > Yikes: heap_update() sets the tuple's XMAX, CMAX, infomask, infomask2, > and CTID without logging anything or clearing the all-visible flag and > then releases the lock on the heap page to go do some more work that > might even ERROR out. Only if that other work all goes OK do we > relock the page and perform the WAL-logged actions. > > That doesn't seem like a good idea even in existing releases, because > you've taken a tuple on an all-visible page and made it not > all-visible, and you've made a page modification that is not > necessarily atomic without logging it. This is is particularly bad in > 9.6, because if that page is also all-frozen then XMAX will eventually > be pointing into space and VACUUM will never visit the page to > re-freeze it the way it would have done in earlier releases. However, > even in older releases, I think there's a remote possibility of data > corruption. Backend #1 makes these changes to the page, releases the > lock, and errors out. Backend #2 writes the page to the OS. DBA > takes a hot backup, tearing the page in the middle of XMAX. Oops. > > I'm not sure what to do about this: this part of the heap_update() > logic has been like this forever, and I assume that if it were easy to > refactor this away, somebody would have done it by now. > How about changing collect_corrupt_items to acquire AccessExclusiveLock for safely checking? Regards, -- Masahiko Sawada
В списке pgsql-hackers по дате отправления: