Re: WAL replay bugs

From Heikki Linnakangas
Subject Re: WAL replay bugs
Date
Msg-id 53500857.5080304@vmware.com
In reply to Re: WAL replay bugs  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: WAL replay bugs  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: WAL replay bugs  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 04/08/2014 06:41 AM, Michael Paquier wrote:
> On Tue, Apr 8, 2014 at 3:16 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>>
>> I've been playing with a little hack that records a before and after image
>> of every page modification that is WAL-logged, and writes the images to a
>> file along with the LSN of the corresponding WAL record. I set up a
>> master-standby replication with that hack in place in both servers, and ran
>> the regression suite. Then I compared the after images after every WAL
>> record, as written on master, and as replayed by the standby.
> Assuming that adding some dedicated hooks in the core able to do
> actions before and after a page modification occur is not *that*
> costly (well I imagine that it is not acceptable in terms of
> performance), could it be possible to get that in the shape of a
> extension that could be used to test WAL record consistency? This may
> be an idea to think about...

Yeah, working on it. It can live as a patch set if nothing else.

This has been very fruitful. I just committed another fix for a bug I 
found with this earlier today.

Quite a few things cause differences between master and standby: hint 
bits in many places, unused space that isn't zeroed, etc.
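
To illustrate the kind of masking such a comparison needs, here is a minimal sketch in Python, not the actual hack. It assumes the standard 24-byte page header with pd_lower and pd_upper stored as little-endian 16-bit values at byte offsets 12 and 14, and it assumes the after-images have already been loaded into dictionaries keyed by LSN. The function names and the dictionary layout are hypothetical. It only zeroes the unused space between pd_lower and pd_upper; hint bits would need separate, access-method-specific masking that is not shown.

```python
import struct

PAGE_HEADER_SIZE = 24    # assumed size of the standard page header
LOWER_UPPER_OFFSET = 12  # assumed byte offset of pd_lower, followed by pd_upper


def mask_unused_space(page: bytes) -> bytes:
    """Return a copy of the page with the hole between pd_lower and pd_upper zeroed."""
    pd_lower, pd_upper = struct.unpack_from("<HH", page, LOWER_UPPER_OFFSET)
    if not (PAGE_HEADER_SIZE <= pd_lower <= pd_upper <= len(page)):
        return page  # header does not look sane; fall back to the raw bytes
    masked = bytearray(page)
    masked[pd_lower:pd_upper] = bytes(pd_upper - pd_lower)
    return bytes(masked)


def differing_lsns(master_images: dict, standby_images: dict) -> list:
    """List the LSNs whose masked after-images differ, or that only one side has."""
    mismatches = []
    for lsn in sorted(master_images.keys() | standby_images.keys()):
        m = master_images.get(lsn)
        s = standby_images.get(lsn)
        if m is None or s is None or mask_unused_space(m) != mask_unused_space(s):
            mismatches.append(lsn)
    return mismatches
```

Every LSN this reports is either a real replay difference or another source of expected noise that still needs a mask.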

There are two things that are not bugs, but that I'd like to change 
anyway, to make this tool easier to maintain and to generally clean 
things up:

1. When creating a sequence, we first use simple_heap_insert() to insert 
the sequence tuple, which creates a WAL record. Then we write a new 
sequence RM WAL record about the same thing. The reason is that the WAL 
record written by regular heap_insert is bogus for a sequence tuple. 
After replaying just the heap insertion, but not the other record, the 
page doesn't have the magic value indicating that it's a sequence, i.e. 
it's broken as a sequence page. That's OK because we only do this when 
creating a new sequence, so if we crash between those two records, the 
whole relation is not visible to anyone. Nevertheless, I'd like to fix 
that by using PageAddItem directly to insert the tuple, instead of 
simple_heap_insert. We have to override the xmin field of the tuple 
anyway, and we don't need any of the other services like finding the 
insert location, toasting, visibility map or freespace map updates, that 
simple_heap_insert() provides.

2. When restoring a B-tree page split record, _bt_restore_page adds 
tuples to the page in reverse order compared to how it's done on the master. 
There is a comment noting that, and it asks "Is it worth changing just 
on general principles?". Yes, I think it is.
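
Why insertion order matters for a byte-for-byte comparison can be shown with a toy slotted page. This is a simplified model in Python, not the nbtree code: the ToyPage class and both build functions are hypothetical, and the only property borrowed from real pages is that tuple data is allocated from the end of the page downwards while line pointers keep the logical item order.

```python
class ToyPage:
    """Toy slotted page: tuple data grows down from the end of the page."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.upper = size         # start of the used tuple space
        self.line_pointers = []   # (offset, length) per item, in item order

    def add_item(self, item: bytes, position=None):
        """Store the item below the current tuple space; append or insert its pointer."""
        if len(item) > self.upper:
            raise ValueError("page full")
        self.upper -= len(item)
        self.data[self.upper:self.upper + len(item)] = item
        pointer = (self.upper, len(item))
        if position is None:
            self.line_pointers.append(pointer)
        else:
            self.line_pointers.insert(position, pointer)

    def items(self):
        """Return the items in logical (line pointer) order."""
        return [bytes(self.data[off:off + ln]) for off, ln in self.line_pointers]


def build_forward(items):
    """Add items first to last, appending each line pointer."""
    page = ToyPage()
    for item in items:
        page.add_item(item)
    return page


def build_reverse(items):
    """Add items last to first, inserting each line pointer at the front."""
    page = ToyPage()
    for item in reversed(items):
        page.add_item(item, position=0)
    return page
```

Both pages return the same items in the same logical order, but the tuple bytes and the offsets in the line pointers differ, so a raw page comparison flags them even though nothing is wrong.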

Any objections to changing those two?

- Heikki


