On Mon, Jun 14, 2010 at 1:19 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
>> I *think* that the answer to this parenthesized question is "no".
>> When we vacuum a page, we set the LSN on both the heap page and the
>> visibility map page. Therefore, neither of them can get written to
>> disk until the WAL record is flushed, but they could get flushed in
>> either order. So the visibility map page could get flushed before the
>> heap page, as the non-parenthesized portion of the comment indicates.
>
> Right.
>
>> However, at least in theory, it seems like we could fix this up during
>> redo.
>
> Setting a bit in the visibility map is currently not WAL-logged, but yes
> once we add WAL-logging, that's straightforward to fix.

Eh, so. Suppose, for the sake of argument, we do the following:

1. Allocate an additional infomask(2) bit that means "xmin is frozen,
no need to call XidInMVCCSnapshot()". When we freeze a tuple, we set
this bit in lieu of overwriting xmin. Note that freezing pages is
already WAL-logged, so redo is possible.
2. Modify VACUUM so that, when the page is observed to be all-visible,
it will freeze all tuples on the page, set PD_ALL_VISIBLE, and set the
visibility map bit, writing a single XLOG record for the whole
operation (possibly piggybacking on XLOG_HEAP2_CLEAN if the same
vacuum already removed tuples; otherwise, when no tuples were
removed, writing XLOG_HEAP2_FREEZE or some new record type). This
loses no forensic information because of (1). (If the page is NOT
observed to be all-visible, we freeze individual tuples only when they
hit the current age thresholds.)
Setting the visibility map bit is now crash-safe.
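
To make (1) and (2) concrete, here is a rough toy model in C. It is only a
sketch under stated assumptions, not PostgreSQL code: every Sketch* name,
the SKETCH_XMIN_FROZEN bit value, and the record layout are hypothetical,
and XID comparison ignores wraparound. The point it illustrates is that
one record describes the freeze, PD_ALL_VISIBLE, and the visibility map
bit together, so the same apply routine can serve both the original
operation and redo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical bit meaning "xmin is frozen"; NOT a real infomask value. */
#define SKETCH_XMIN_FROZEN 0x0001
#define SKETCH_MAX_TUPLES  8

typedef struct {
    TransactionId xmin;      /* kept intact for forensics, per (1) */
    uint16_t      infomask;
} SketchTuple;

typedef struct {
    SketchTuple tuples[SKETCH_MAX_TUPLES];
    size_t      ntuples;
    bool        pd_all_visible;  /* stands in for PD_ALL_VISIBLE */
    uint64_t    lsn;
} SketchPage;

/* One record covering the whole operation, per (2). */
typedef struct {
    bool   set_all_visible;            /* set PD_ALL_VISIBLE and the VM bit */
    size_t nfrozen;
    size_t frozen[SKETCH_MAX_TUPLES];  /* indexes of tuples to freeze */
} SketchWalRecord;

/* Apply a record to the heap page and VM bit; shared by do and redo. */
static void
sketch_apply(SketchPage *page, bool *vm_bit,
             const SketchWalRecord *rec, uint64_t lsn)
{
    for (size_t i = 0; i < rec->nfrozen; i++)
        page->tuples[rec->frozen[i]].infomask |= SKETCH_XMIN_FROZEN;
    if (rec->set_all_visible) {
        page->pd_all_visible = true;
        *vm_bit = true;
    }
    page->lsn = lsn;
}

/*
 * Vacuum one page. Simplification: a tuple counts as visible to everyone
 * if it is already frozen or its xmin precedes oldest_xmin (plain integer
 * comparison, no wraparound handling).
 */
static SketchWalRecord
sketch_vacuum_page(SketchPage *page, bool *vm_bit,
                   TransactionId oldest_xmin, TransactionId freeze_limit,
                   uint64_t lsn)
{
    SketchWalRecord rec = {0};
    bool all_visible = true;

    assert(page->ntuples <= SKETCH_MAX_TUPLES);

    for (size_t i = 0; i < page->ntuples; i++) {
        const SketchTuple *tup = &page->tuples[i];
        if (!(tup->infomask & SKETCH_XMIN_FROZEN) && tup->xmin >= oldest_xmin)
            all_visible = false;
    }

    /* All-visible page: freeze everything. Otherwise: only old tuples. */
    for (size_t i = 0; i < page->ntuples; i++) {
        const SketchTuple *tup = &page->tuples[i];
        if (tup->infomask & SKETCH_XMIN_FROZEN)
            continue;
        if (all_visible || tup->xmin < freeze_limit)
            rec.frozen[rec.nfrozen++] = i;
    }
    rec.set_all_visible = all_visible;

    sketch_apply(page, vm_bit, &rec, lsn);
    return rec;
}
```

Replaying the returned record with sketch_apply() against a stale copy of
the page and a cleared VM bit reaches the same state as the original
vacuum, which is the property the proposal relies on.
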
Please poke holes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company