Re: HEAD seems to generate larger WAL regarding GIN index

From Heikki Linnakangas
Subject Re: HEAD seems to generate larger WAL regarding GIN index
Msg-id 53270C91.3020103@vmware.com
In response to Re: HEAD seems to generate larger WAL regarding GIN index  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: HEAD seems to generate larger WAL regarding GIN index  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On 03/17/2014 04:33 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> 2. Instead of storing the new compressed posting list in the WAL record,
>> store only the new item pointers added to the page. WAL replay would
>> then have to duplicate the work done in the main insertion code path:
>> find the right posting lists to insert to, decode them, add the new
>> items, and re-encode.
>
> That sounds fairly dangerous ... is any user-defined code involved in
> those decisions?

No.

>> This record format would be higher-level, in the sense that we would not
>> store the physical copy of the compressed posting list as it was formed
>> originally. The same work would be done at WAL replay. As the code
>> stands, it will produce exactly the same result, but that's not
>> guaranteed if we make bugfixes to the code later, and a master and
>> standby are running different minor version. There's not necessarily
>> anything wrong with that, but it's something to keep in mind.
>
> Version skew would be a hazard too, all right.  I think it's important
> that WAL replay be a pretty mechanical, predictable process.

Yeah. One particular point to note is that if in one place we do the 
more "high level" thing and have WAL replay re-encode the page as it 
sees fit, then we can *not* rely on the page being byte-by-byte 
identical in other places, for example in vacuum, where items are deleted.
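
To illustrate the byte-by-byte point, here is a minimal sketch in Python. It is a toy model, not GIN's actual on-disk format: items are plain integers rather than item pointers, a segment is a (first item, varbyte-encoded deltas) pair, and the names encode_segments, decode_segments and max_payload are made up for this example. Two encoders that differ only in how they split items into segments produce different bytes for the same logical content.

```python
def encode_varbyte(n):
    """Encode a non-negative integer 7 bits at a time, low bits first."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_segments(items, max_payload):
    """Split sorted items into segments of (first_item, delta_bytes).

    max_payload is a hypothetical cap on the delta bytes per segment.
    """
    segments = []
    first, payload, prev = None, bytearray(), None
    for item in items:
        if first is None:
            first, payload = item, bytearray()
        else:
            chunk = encode_varbyte(item - prev)
            if len(payload) + len(chunk) > max_payload:
                # Segment is full: close it and start a new one at this item.
                segments.append((first, bytes(payload)))
                first, payload = item, bytearray()
            else:
                payload += chunk
        prev = item
    if first is not None:
        segments.append((first, bytes(payload)))
    return segments


def decode_segments(segments):
    """Recover the item list from the segments."""
    items = []
    for first, payload in segments:
        items.append(first)
        cur, shift, delta = first, 0, 0
        for b in payload:
            delta |= (b & 0x7F) << shift
            if b & 0x80:
                shift += 7
            else:
                cur += delta
                items.append(cur)
                shift, delta = 0, 0
    return items


items = [1, 2, 3, 200, 5000, 5001]
on_master = encode_segments(items, max_payload=2)   # one splitting policy
on_standby = encode_segments(items, max_payload=8)  # a different policy

# Same logical content, different physical representation.
assert decode_segments(on_master) == decode_segments(on_standby) == items
assert on_master != on_standby
```

Any later WAL record that assumed a particular physical layout (say, "delete the bytes at offset X") would be unsafe against the standby's copy in this model, even though both copies hold the same items.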

Heap and B-tree WAL records also rely on PageAddItem etc. to reconstruct 
the page, instead of making a physical copy of the modified parts. And 
_bt_restore_page even inserts the items physically in different order 
than the normal codepath does. So for good or bad, there is some 
precedent for this.

The imminent danger I see is if we change the logic on how the items are 
divided into posting lists, and end up in a situation where a master 
server adds an item to a page, and it just fits, but with the 
compression logic the standby version has, it cannot make it fit. As an 
escape hatch for that, we could have the WAL replay code try the 
compression again, with a larger max. posting list size, if it doesn't 
fit at first. And/or always leave something like 10 bytes of free space 
on every data page to make up for small differences in the logic.
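
A rough sketch of both escape hatches, again in Python and again a toy model rather than the real GIN code. The per-segment header size, the candidate segment sizes, and the names encoded_size, replay_insert and primary_can_insert are all assumptions made for illustration. The idea is that larger segments mean fewer segment headers, so retrying with a larger maximum can make the same items fit.

```python
SEGMENT_HEADER = 8  # assumed per-segment overhead in bytes (hypothetical)


def varbyte_len(delta):
    """Number of bytes a 7-bits-per-byte varbyte encoding of delta takes."""
    n = 1
    while delta >= 0x80:
        delta >>= 7
        n += 1
    return n


def encoded_size(items, max_payload):
    """Bytes needed to store sorted items, splitting into segments whose
    delta payload does not exceed max_payload bytes."""
    total = 0
    payload = None  # None means no segment is open yet
    prev = None
    for item in items:
        need = varbyte_len(item - prev) if prev is not None else 0
        if payload is None or payload + need > max_payload:
            total += SEGMENT_HEADER  # new segment; item lives in its header
            payload = 0
        else:
            payload += need
            total += need
        prev = item
    return total


def replay_insert(existing, new_items, page_capacity, max_payloads):
    """Replay side: try each segment size in turn until the page fits."""
    merged = sorted(set(existing) | set(new_items))
    for max_payload in max_payloads:
        size = encoded_size(merged, max_payload)
        if size <= page_capacity:
            return max_payload, size
    raise RuntimeError("items do not fit on the page with any segment size")


def primary_can_insert(items, page_capacity, max_payload, slack=10):
    """Primary side: keep `slack` bytes free to absorb small differences."""
    return encoded_size(items, max_payload) <= page_capacity - slack


# 1000 consecutive items: the small segment size needs 1413 bytes, which
# does not fit in 1200, but the retry with larger segments needs only 1112.
result = replay_insert(range(1, 1000), [1000], 1200, max_payloads=(16, 64))
assert result == (64, 1112)
```

The slack check in primary_can_insert is the second option: the primary refuses an insertion that would only "just fit", so a standby whose encoder is a few bytes less efficient still has room.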

- Heikki


