Re: Block level concurrency during recovery

From: Simon Riggs
Subject: Re: Block level concurrency during recovery
Msg-id 1224752254.27145.608.camel@ebony.2ndQuadrant
In reply to: Re: Block level concurrency during recovery  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses: Re: Block level concurrency during recovery  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Re: Block level concurrency during recovery  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
On Thu, 2008-10-23 at 09:09 +0300, Heikki Linnakangas wrote:

> However, we require that in b-tree vacuum, you take a cleanup lock on 
> *every* leaf page of the index, not only those that you modify. That's a 
> problem, because there's no trace of such pages in the WAL.

OK, good. Thanks for the second opinion. I'm glad you said that, cos I
felt sure anybody reading the patch would say "what the hell does this
bit do?". Now I can add it.

My solution is fairly simple:

As we pass through the table we keep track of which blocks need
visiting, then append that list to the next WAL record. If the
last block doesn't contain removed rows, then we send a no-op record
saying which blocks to visit.

I'd already invented the XLOG_BTREE_VACUUM record, so now we just need
to augment it further with two fields: ordered array of blocks to visit,
and a doit flag.

Say we have a 10 block table, with rows to be removed on blocks 3,4,8. 
As we visit all 10 in sequence we would issue WAL records:

XLOG_BTREE_VACUUM block 3 visitFirst {1, 2} doit = true
XLOG_BTREE_VACUUM block 4 visitFirst {} doit = true
XLOG_BTREE_VACUUM block 8 visitFirst {5,6,7} doit = true
XLOG_BTREE_VACUUM block 10 visitFirst {9} doit = false

So that allows us to issue the same number of WAL messages yet include
all the required information to repeat the process correctly.
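
A minimal sketch of the bookkeeping described above, in Python rather than the backend's C. The tuple layout and the name plan_vacuum_records are hypothetical and are not taken from the patch; the sketch only shows how the visitFirst list accumulates between records and when the trailing no-op is emitted.

```python
from typing import Iterable, List, Tuple

# (target block, blocks to visit first, doit flag): a hypothetical in-memory
# stand-in for the augmented XLOG_BTREE_VACUUM record, not the real layout.
VacuumRecord = Tuple[int, List[int], bool]


def plan_vacuum_records(visit_order: Iterable[int],
                        blocks_with_removals: Iterable[int]) -> List[VacuumRecord]:
    """Return one record per block that has removals, plus a trailing no-op."""
    has_removals = set(blocks_with_removals)
    records: List[VacuumRecord] = []
    pending: List[int] = []  # visited but unmodified blocks since the last record

    for block in visit_order:
        if block in has_removals:
            # Piggyback the unmodified blocks onto this record.
            records.append((block, pending, True))
            pending = []
        else:
            pending.append(block)

    if pending:
        # The scan ended on unmodified blocks: emit a no-op record for the
        # last one, carrying the others in its visit-first list.
        last = pending.pop()
        records.append((last, pending, False))
    return records


if __name__ == "__main__":
    for block, visit_first, doit in plan_vacuum_records(range(1, 11), [3, 4, 8]):
        print(block, visit_first, doit)
```

Run on the 10 block example with removals on blocks 3, 4 and 8, this produces the same four records as listed above.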

(The blocks can be visited out of sequence in some cases, hence the
ordered array of blocks to visit rather than just a first block value).
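
On the replay side, the idea might look roughly like the following sketch. It assumes the target block of a no-op record still gets a cleanup lock and only the row removal is skipped; the callback names lock_for_cleanup and apply_removals are hypothetical placeholders, not functions from the patch.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical record shape: (target block, blocks to visit first, doit flag).
VacuumRecord = Tuple[int, List[int], bool]


def replay_vacuum_records(records: Iterable[VacuumRecord],
                          lock_for_cleanup: Callable[[int], None],
                          apply_removals: Callable[[int], None]) -> None:
    """Repeat the primary's block visits in the order the records give them."""
    for block, visit_first, doit in records:
        # Visit the unmodified blocks in the recorded order, which need not
        # be ascending block-number order.
        for earlier in visit_first:
            lock_for_cleanup(earlier)
        # Assumption: the target block is locked even for a no-op record.
        lock_for_cleanup(block)
        if doit:
            apply_removals(block)


if __name__ == "__main__":
    locked: List[int] = []
    removed: List[int] = []
    example = [(3, [1, 2], True), (4, [], True),
               (8, [5, 6, 7], True), (10, [9], False)]
    replay_vacuum_records(example, locked.append, removed.append)
    print(locked)
    print(removed)
```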

It would also be possible to introduce a special tweak there, which is
that if the block is not in cache, we don't read it in at all. If it's not
in cache we know that nobody has a pin on it, so we don't need to read it
in just to say "got the lock". That's icing for later.
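
The tweak amounts to filtering the visit list by cache residency before taking any locks. A rough sketch, where is_cached is a hypothetical stand-in for whatever buffer-cache lookup the backend would actually use:

```python
from typing import Callable, Iterable, List


def blocks_needing_cleanup_lock(visit_first: Iterable[int],
                                is_cached: Callable[[int], bool]) -> List[int]:
    """Keep only blocks currently in cache; nobody can hold a pin on the rest."""
    return [block for block in visit_first if is_cached(block)]


if __name__ == "__main__":
    cached = {6, 9}  # made-up cache contents, for illustration only
    print(blocks_needing_cleanup_lock([5, 6, 7], cached.__contains__))
```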

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


