Re: crash-safe visibility map, take three

From Heikki Linnakangas
Subject Re: crash-safe visibility map, take three
Date
Msg-id 4CF4A8EC.2070408@enterprisedb.com
In reply to crash-safe visibility map, take three  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: crash-safe visibility map, take three  (Robert Haas <robertmhaas@gmail.com>)
Re: crash-safe visibility map, take three  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 30.11.2010 06:57, Robert Haas wrote:
> I can't say I'm totally in love with any of these designs.  Anyone
> else have any ideas, or any opinions about which one is best?

Well, the design I've been pondering goes like this:

At vacuum:

1. Write an "intent" XLOG record listing a chunk of visibility map bits
that are not currently set and that we are going to try to set. A chunk
of, say, 100 bits would be about right.

2. Scan the 100 heap pages as we currently do, setting the visibility 
map bits as we go.

3. After the scan, lock the visibility map page, check which of the bits 
that we set in step 2 are still set (concurrent updates might've cleared 
some), and write a final XLOG record listing the set bits. This step 
isn't necessary for correctness, BTW, but without it you lose all the
set bits if you crash before the next checkpoint.
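
The three vacuum steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not PostgreSQL code: ToyRelation, vacuum_chunk and the concurrent_activity hook are hypothetical names, the visibility map is a plain set of page numbers, and the WAL is a Python list.

```python
from dataclasses import dataclass, field

CHUNK_BITS = 100  # chunk size suggested in the text


@dataclass
class ToyRelation:
    """Hypothetical in-memory stand-in for a heap, its visibility map and WAL."""
    all_visible: list                        # all_visible[p]: is heap page p all-visible?
    vm: set = field(default_factory=set)     # heap page numbers whose VM bit is set
    wal: list = field(default_factory=list)  # (record_type, [page numbers])


def vacuum_chunk(rel, start, concurrent_activity=None):
    end = min(start + CHUNK_BITS, len(rel.all_visible))

    # Step 1: log the bits that are not set yet and that we may try to set.
    candidates = [p for p in range(start, end) if p not in rel.vm]
    rel.wal.append(("intent", candidates))

    # Step 2: scan the heap pages, setting visibility map bits as we go.
    set_by_us = []
    for p in candidates:
        if rel.all_visible[p]:
            rel.vm.add(p)
            set_by_us.append(p)

    # Hypothetical hook: lets a caller simulate updates that clear bits meanwhile.
    if concurrent_activity is not None:
        concurrent_activity(rel)

    # Step 3: (the real thing would lock the VM page here) log the survivors.
    still_set = [p for p in set_by_us if p in rel.vm]
    rel.wal.append(("final", still_set))
    return still_set


# Four heap pages; page 1 is not all-visible, and a concurrent update
# clears the bit for page 2 between steps 2 and 3.
rel = ToyRelation(all_visible=[True, False, True, True])
survivors = vacuum_chunk(rel, 0, lambda r: r.vm.discard(2))
print(rel.wal)
```

The final record lists only the bits that survived the scan, so a bit cleared by a concurrent update is never re-set at replay by this chunk's final record.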

At replay, when we see the intent XLOG record, clear all the bits listed 
in it. This ensures that if we crashed and some of the visibility map 
bits were flushed to disk but the corresponding changes to the heap 
pages were not, the bits are cleared. When we see the final XLOG record, 
we set the bits.
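
The replay rule can be sketched the same way. This is a minimal model, assuming WAL records are (type, pages) tuples and the on-disk visibility map is a set of page numbers; replay is a hypothetical helper, not a PostgreSQL function.

```python
def replay(wal, vm_on_disk):
    """Apply intent/final records to whatever VM state survived the crash."""
    vm = set(vm_on_disk)
    for rec_type, pages in wal:
        if rec_type == "intent":
            # Bits may have reached disk without the matching heap changes:
            # clear everything the vacuum said it might set.
            vm.difference_update(pages)
        elif rec_type == "final":
            # These bits were verified as still set after the scan.
            vm.update(pages)
    return vm


# Crash before the final record: bits 5 and 7 had been flushed, both are cleared.
crashed_early = replay([("intent", [5, 6, 7])], {1, 5, 7})

# Crash after the final record: bit 5 is restored, bit 7 stays cleared.
crashed_late = replay([("intent", [5, 6, 7]), ("final", [5])], {1, 5, 7})

print(sorted(crashed_early), sorted(crashed_late))
```

Bit 1 was set before the intent record and is not listed in it, so replay leaves it alone in both cases.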

Some care is needed with checkpoints. Setting visibility map bits in 
step 2 is safe because crash recovery will replay the intent XLOG record 
and clear any incorrectly set bits. But if a checkpoint has happened
after the intent XLOG record was written, that no longer holds: recovery
starts from the checkpoint's redo pointer and never sees the earlier
intent record. This can be avoided by checking RedoRecPtr in step 2, and
writing a new intent XLOG record if it has changed since the last intent
XLOG record was written.
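
A rough sketch of that RedoRecPtr check, under the assumption that the redo pointer can be modelled as an index into a list of WAL records. ToyWal, scan_chunk and the checkpoint_before parameter are hypothetical; only the name RedoRecPtr comes from the text.

```python
class ToyWal:
    """Hypothetical WAL: a list of records plus a redo pointer."""

    def __init__(self):
        self.records = []
        self.redo_rec_ptr = 0  # stand-in for RedoRecPtr: where recovery would start

    def insert(self, rec):
        self.records.append(rec)

    def checkpoint(self):
        # After a checkpoint, recovery starts at the current end of WAL.
        self.redo_rec_ptr = len(self.records)


def scan_chunk(wal, vm, pages, is_all_visible, checkpoint_before=None):
    # Read the redo pointer before inserting, so a checkpoint that slips in
    # between is detected later (at worst we write an unneeded intent record).
    redo_at_intent = wal.redo_rec_ptr
    wal.insert(("intent", list(pages)))

    for i, p in enumerate(pages):
        if p == checkpoint_before:
            wal.checkpoint()  # simulate a checkpoint in the middle of the scan

        if wal.redo_rec_ptr != redo_at_intent:
            # Recovery would skip our old intent record: log a new one
            # covering the pages we have not processed yet.
            redo_at_intent = wal.redo_rec_ptr
            wal.insert(("intent", list(pages[i:])))

        if is_all_visible(p):
            vm.add(p)


wal = ToyWal()
vm = set()
scan_chunk(wal, vm, [0, 1, 2, 3], lambda p: True, checkpoint_before=2)
replayed_after_crash = wal.records[wal.redo_rec_ptr:]
print(replayed_after_crash)
```

Only the records at or after the redo pointer are replayed after a crash, and in this run they include a fresh intent record for pages 2 and 3, the ones whose bits were set after the checkpoint.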

There's a small race condition in the way a visibility map bit is 
currently cleared. When a heap page is updated, it is locked, the update 
is WAL-logged, and the lock is released. The visibility map page is 
updated only after that. If the final vacuum XLOG record is written just 
after updating the heap page, but before the visibility map bit is 
cleared, replaying the final XLOG record will set a bit that should not 
have been set.
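
The interleaving behind that race can be replayed in a toy timeline. This is a sketch under assumptions: the record names and the replay helper are hypothetical, and it assumes that replaying a heap update also clears the page's bit, which does not change the outcome here because the final record comes later in the WAL.

```python
def replay(wal):
    """Rebuild the visibility map from WAL alone (toy model)."""
    vm = set()
    for rec_type, pages in wal:
        if rec_type in ("intent", "heap_update"):
            vm.difference_update(pages)  # assumption: update replay clears the bit
        elif rec_type == "final":
            vm.update(pages)
    return vm


PAGE = 42
wal = []
vm = set()

# Vacuum, steps 1 and 2: log the intent, then set the bit.
wal.append(("intent", [PAGE]))
vm.add(PAGE)

# Updater: lock heap page, update, WAL-log, unlock. The bit is NOT cleared yet.
wal.append(("heap_update", [PAGE]))

# Vacuum, step 3: the bit still looks set, so it goes into the final record.
still_set = [p for p in [PAGE] if p in vm]
wal.append(("final", still_set))

# Updater: only now clears the visibility map bit.
vm.discard(PAGE)

recovered = replay(wal)
print(sorted(vm), sorted(recovered))
```

The live visibility map ends up correct (bit cleared), but a crash followed by replay leaves the bit set for a page that has been modified, which is exactly the problem described above.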

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

