Re: corrupt pages detected by enabling checksums

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: corrupt pages detected by enabling checksums
Дата
Msg-id CA+U5nMKwguCMugQEDzoyx__Ug3TUcRKA_MejkkoJLkc0hS78gg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: corrupt pages detected by enabling checksums  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: corrupt pages detected by enabling checksums
Список pgsql-hackers
On 1 May 2013 16:33, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, May 1, 2013 at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I was worried because SyncOneBuffer checks whether it needs writing
>>> without taking a content lock, so the exclusive lock doesn't help. That
>>> makes sense, because you don't want a checkpoint to have to get a
>>> content lock on every buffer in the buffer pool. But it also means we
>>> need to follow the rules laid out in transam/README and dirty the pages
>>> before writing WAL.
>>
>> On further review, I think you're right about this.  We'd have a
>> problem if the following sequence of events were to occur:
>>
>> 1. vacuum writes the WAL record, but does not yet mark the buffer
>> dirty, and then goes into the tank
>> 2. checkpointer decides where the checkpoint will begin
>> 3. buffer pool is scanned as part of the checkpoint process, observing
>> target buffer not to be dirty
>> 4. vacuum finally wakes up and marks the buffer dirty
>> 5. crash, after WAL is flushed but before buffer is written
>>
>> However, on looking at this a little more, shouldn't we be doing
>> log_heap_clean() *before* we do visibilitymap_set()?
>
> Hit send too soon.
>
> To finish that thought: right now the order of operations is...
>
> 1. Perform the cleanup operations on the buffer.
> 2. Set the visibility map bit.
> 3. Log the visibility map change.
> 4. Log the cleanup operations.
>
> It seems to me that it would be better to do (4) as close to (1) as
> possible.  Isn't it bad if the operations are replayed in an order
> that differs from the order in which they were performed - or if, say,
> the first WAL record were replayed but the second were not, for
> whatever reason?

I agree, but that was in the original coding wasn't it?

Why aren't we writing just one WAL record for this action? We use a
single WAL record in other places where we make changes to multiple
blocks with multiple full page writes, e.g. index block split. That
would make the action atomic and we'd just have this...

1. Perform the cleanup operations on the buffer.
2. Set the visibility map bit.
3. Log the cleanup operations and visibility map change.

which can then be replayed with correct sequence, locking etc.
and AFAICS would likely be faster also.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: [PATCH] add --throttle to pgbench (submission 3)
Следующее
От: Stefan Kaltenbrunner
Дата:
Сообщение: Re: Re: [BUGS] BUG #8128: pg_dump (>= 9.1) failed while dumping a scheme named "old" from PostgreSQL 8.4