Re: Crash safe visibility map vs hint bits

From: Bruce Momjian
Subject: Re: Crash safe visibility map vs hint bits
Message-ID: 201012151454.oBFEsgw21144@momjian.us
In reply to: Re: Crash safe visibility map vs hint bits  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List: pgsql-hackers
Heikki Linnakangas wrote:
> On 04.12.2010 09:14, Jesper@Krogh.cc wrote:
> > There has been a lot discussion about index-only scans and how to make
> > the visibility map crash safe. Then followed by a good discussion about
> > hint bits.
> >
> > What seems to be the main concern is the added wal volume and it makes me
> > wonder if there is a way in-between that looks more like hint bits.
> >
> > How about lazily wal-log the complete visibility map say every X minutes
> > or N amount of tuple updates and make the wal recovery jobs of rechecking
> > visibility of pages touched by the wal stream on recovery.
> 
> If you WAL-log the visibility map changes after-the-fact, it doesn't 
> solve the race condition we're struggling with: the visibility map 
> change might hit the disk before the PD_ALL_VISIBLE to the heap page. If 
> you crash, you can end up with a situation where the PD_ALL_VISIBLE flag 
> on the heap page is not set, but the bit in the visibility map is. Which 
> causes serious issues later on.

Based on hacker emails and a discussion I had with Heikki while we were
in Germany, I have updated the index-only scans wiki to document a known
solution for making the visibility map crash-safe for use by index-only
scans:
http://wiki.postgresql.org/wiki/Index-only_scans#Making_the_Visibility_Map_Crash-Safe
Making the Visibility Map Crash-Safe

Currently, a heap page that has all-visible tuples is marked by vacuum
as PD_ALL_VISIBLE and the visibility map (VM) bit is set. This is
currently unlogged, and a crash could require these to be set again.

The complexity is that for index-only scans, the VM bit has meaning, and
cannot be incorrectly set (though it can be incorrectly cleared because
that would just result in additional heap access). If both
PD_ALL_VISIBLE and the VM bit were to be set, and a crash resulted in
the VM bit being written to disk, but not the PD_ALL_VISIBLE bit, a
later heap access that wrote a conditionally-visible row would not know
to clear the VM bit, causing incorrect results for index-only scans.

The solution is to WAL log the VM set bit activity. This will cause
full-page writes for the VM page, but this is much less than WAL-logging
each heap page because a VM page represents many heap pages. This
requires that the VM page not be written to disk until its VM-set WAL
record is fsynced to disk. Also, during crash recovery, reading the
VM-set WAL record would cause both the VM bit and the heap
PD_ALL_VISIBLE flag to be set.
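
To make the failure and the fix concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions, not PostgreSQL code: it tracks a single heap page and a single VM bit, and every name in it (ToyStore, set_all_visible, write_vm_page, and so on) is hypothetical. It replays the scenario above twice, once with the VM-set unlogged and once with it WAL-logged and the WAL flushed before the VM page is written.

```python
# Toy model only: all names are hypothetical, none come from PostgreSQL.

class ToyStore:
    """One heap page flag and one VM bit, each with a memory and a disk copy."""

    def __init__(self):
        self.mem = {"pd_all_visible": False, "vm_bit": False}
        self.disk = {"pd_all_visible": False, "vm_bit": False}
        self.wal_mem = []    # WAL records not yet fsynced
        self.wal_disk = []   # WAL records that survive a crash

    def set_all_visible(self, wal_log):
        """Vacuum marks the page all-visible; optionally WAL-log the VM-set."""
        self.mem["pd_all_visible"] = True
        self.mem["vm_bit"] = True
        if wal_log:
            self.wal_mem.append("VM_SET")

    def flush_wal(self):
        """Make all pending WAL records durable."""
        self.wal_disk.extend(self.wal_mem)
        self.wal_mem.clear()

    def write_vm_page(self, enforce_wal_first):
        """Write the VM page; with WAL-before-data, flush the WAL first."""
        if enforce_wal_first:
            self.flush_wal()
        self.disk["vm_bit"] = self.mem["vm_bit"]

    def crash_and_recover(self):
        """Lose memory, reload from disk, then replay the durable WAL."""
        self.mem = dict(self.disk)
        self.wal_mem = []
        for record in self.wal_disk:
            if record == "VM_SET":
                # Replay sets both the VM bit and the heap page flag.
                self.mem["pd_all_visible"] = True
                self.mem["vm_bit"] = True

    def is_consistent(self):
        """The VM bit may be clear while the flag is set, never the reverse."""
        return self.mem["pd_all_visible"] or not self.mem["vm_bit"]


def run(wal_log):
    store = ToyStore()
    store.set_all_visible(wal_log=wal_log)
    # The VM page reaches disk, the heap page does not, then we crash.
    store.write_vm_page(enforce_wal_first=wal_log)
    store.crash_and_recover()
    return store.is_consistent()


unlogged_ok = run(wal_log=False)
logged_ok = run(wal_log=True)
print("unlogged consistent after crash:", unlogged_ok)
print("WAL-logged consistent after crash:", logged_ok)
```

In the unlogged run the VM bit survives the crash without PD_ALL_VISIBLE, which is exactly the state that breaks index-only scans. In the logged run the replayed VM-set record restores both.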
 

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

