Re: 64-bit XIDs again

From Simon Riggs
Subject Re: 64-bit XIDs again
Date
Msg-id CANP8+jKDQUrt1J6ZJY9s3v_pRd4B34AtohRWxB5QBSnEgFSVnA@mail.gmail.com
In reply to Re: 64-bit XIDs again  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
On 31 July 2015 at 11:00, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
 
If a user upgrades a database cluster with pg_upgrade, he would stop the old postmaster, run pg_upgrade, and start the new postmaster. That means we start from a point where there are no running transactions. Thus, tuples in the old format are of two kinds: visible to everybody and invisible to everybody. When we update or delete an old tuple of the first kind, we don't actually need to store its xmin anymore. We can store a 64-bit xmax in the place of xmin/xmax.

So, in order to switch to 64-bit xmin/xmax, we have to take both free bits from t_infomask2. They should indicate one of 3 possible tuple formats:
1) Old format: both xmin/xmax are 32bit
2) Intermediate format: xmax is 64bit, xmin is frozen.
3) New format: both xmin/xmax are 64bit.
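
As an illustration of the three formats listed above, here is a minimal Python model of tagging a tuple with two flag bits. The bit values and the names FORMAT_BIT_XMAX64, FORMAT_BIT_XMIN64, tuple_format and set_tuple_format are assumptions made for this sketch; the thread does not say which bits would be used or how the patterns would be assigned.

```python
# Hypothetical stand-ins for the two free bits of t_infomask2.
FORMAT_BIT_XMAX64 = 0x0800  # assumed meaning: xmax is stored as 64 bits
FORMAT_BIT_XMIN64 = 0x1000  # assumed meaning: xmin is stored as 64 bits
FORMAT_MASK = FORMAT_BIT_XMAX64 | FORMAT_BIT_XMIN64

# The three formats from the list above, mapped to bit patterns (assumed mapping).
FORMAT_BITS = {
    "old": 0,                                      # both xmin/xmax 32-bit
    "intermediate": FORMAT_BIT_XMAX64,             # xmax 64-bit, xmin frozen
    "new": FORMAT_BIT_XMAX64 | FORMAT_BIT_XMIN64,  # both 64-bit
}


def tuple_format(infomask2):
    """Return the format name encoded in the two flag bits."""
    bits = infomask2 & FORMAT_MASK
    for name, pattern in FORMAT_BITS.items():
        if bits == pattern:
            return name
    # The fourth bit pattern is not one of the three formats.
    raise ValueError("unused format bit pattern: 0x%04x" % bits)


def set_tuple_format(infomask2, name):
    """Return infomask2 with its format bits replaced, other bits untouched."""
    return (infomask2 & ~FORMAT_MASK) | FORMAT_BITS[name]
```

Two bits give four patterns, so one pattern stays unused in this mapping.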

But we can use the same idea to implement an epoch in the heap page header as well. If the new page header doesn't fit on the page, then we don't have to insert anything into this page; we just need to set xmax and flags on existing tuples. Then we can use two of the formats listed above, #1 and #2, and take one free bit from t_infomask2 for format indication.

I think we can do it by treating the page level epoch as a means of compression, rather than as a barrier which is how I first saw it.
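
To make the "compression" view concrete, here is a small Python sketch, assuming the usual split of a 64-bit XID into a 32-bit epoch in the high half and a 32-bit XID in the low half. The function names full_xid and split_xid are made up for the example.

```python
XID_BITS = 32
XID_MASK = (1 << XID_BITS) - 1


def full_xid(page_epoch, xid32):
    """Expand a 32-bit tuple XID to 64 bits using the epoch stored once per page."""
    return (page_epoch << XID_BITS) | (xid32 & XID_MASK)


def split_xid(xid64):
    """Split a 64-bit XID into (epoch, 32-bit xid)."""
    return xid64 >> XID_BITS, xid64 & XID_MASK
```

Every tuple on the page shares the high 32 bits, so they are stored once in the page header instead of once per xmin and xmax.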

New Page Format
New Page format has a page-level epoch.

The first tuple inserted onto a block sets the page epoch. For later inserts, we check whether the current epoch matches the page epoch. If it doesn't, we try to freeze the page. If all tuples on the page can be frozen, we can then reset the page-level epoch as part of our insert. If we can't freeze all tuples on the page, we extend the relation to allow us to add a new page with the current epoch on it. (We can't easily track which blocks have which epoch.)

If an Update or Delete sees a tuple from a prior epoch, we will try to freeze the tuple. If we can, then we reuse xmin as the xmax's epoch. If we can't, we have a problem and need a complex mechanism to handle it. I don't think it will be necessary to invent that in the first release; we will just assume freezing is possible.
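
The insert rule above can be sketched as a small decision function. This is a toy Python model, not PostgreSQL code: the Page and HeapTuple classes, the freezable flag, and the returned action names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HeapTuple:
    freezable: bool        # assumed input: can this tuple be frozen right now?
    frozen: bool = False


@dataclass
class Page:
    epoch: Optional[int] = None            # None models a page with no tuple yet
    tuples: List[HeapTuple] = field(default_factory=list)


def choose_insert_action(page, current_epoch):
    """Decide how an insert treats a page, following the rule described above."""
    if page.epoch is None:
        page.epoch = current_epoch         # first tuple sets the page epoch
        return "insert"
    if page.epoch == current_epoch:
        return "insert"                    # common case: one comparison
    if all(t.freezable for t in page.tuples):
        for t in page.tuples:
            t.frozen = True
        page.epoch = current_epoch         # reset the epoch as part of the insert
        return "freeze_then_insert"
    return "extend_relation"               # caller adds a new page at current epoch
```

In the common case the only extra work is the epoch comparison, which is the basis for the claim of very little additional path length.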
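
A sketch of the update/delete path, again as a toy Python model. The HeapTuple fields, the xmax_epoch_in_xmin flag, and the function names are hypothetical; the mail only says that xmin can be reused to hold the xmax's epoch once the tuple is frozen, and it leaves the "cannot freeze" case unsolved, which the sketch reports as failure.

```python
from dataclasses import dataclass

XID_MASK = 0xFFFFFFFF


@dataclass
class HeapTuple:
    xmin: int                         # 32-bit field
    xmax: int = 0                     # 32-bit field, 0 means not deleted
    xmin_frozen: bool = False
    xmax_epoch_in_xmin: bool = False  # hypothetical flag: xmin holds xmax's epoch


def set_xmax(tup, page_epoch, deleter_xid64, can_freeze):
    """Record a deleting/updating XID; return False if the hard case is hit."""
    epoch, xid32 = deleter_xid64 >> 32, deleter_xid64 & XID_MASK
    if epoch == page_epoch:
        tup.xmax = xid32              # same epoch: nothing special to do
        return True
    if not can_freeze:
        return False                  # the case the mail defers to a later release
    tup.xmin_frozen = True            # xmin no longer needed for visibility
    tup.xmin = epoch                  # reuse the xmin field for xmax's epoch
    tup.xmax = xid32
    tup.xmax_epoch_in_xmin = True
    return True


def xmax64(tup, page_epoch):
    """Reconstruct the full 64-bit xmax of a tuple."""
    epoch = tup.xmin if tup.xmax_epoch_in_xmin else page_epoch
    return (epoch << 32) | tup.xmax
```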

Current Pages
Current pages don't have an epoch, so we store a base epoch in the controlfile so we remember how to interpret them.

We don't create any new pages with this page format. For later inserts, we check whether the current epoch matches the page epoch (for these pages, the base epoch). If it doesn't, we check whether it's possible to rewrite the whole page to the new format, freezing as we go. If that is not possible, we extend the relation to allow us to add a new page with the current epoch on it. (We can't easily track which blocks have which epoch.)

If an Update or Delete sees a tuple from a prior epoch, we will try to freeze the tuple. If we can, then we reuse xmin as the xmax's epoch.
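
For current-format pages the same decision can be sketched with the base epoch standing in for the missing page epoch. This is a toy Python model: the function names and the fits_new_header input are assumptions, the latter borrowed from the quoted remark that the new page header may not fit on an existing page.

```python
from typing import Optional


def effective_epoch(page_epoch: Optional[int], base_epoch: int) -> int:
    """Current-format pages carry no epoch (None); fall back to the base epoch."""
    return base_epoch if page_epoch is None else page_epoch


def action_for_current_page(current_epoch, base_epoch,
                            all_freezable, fits_new_header):
    """Decide how an insert treats a current-format page."""
    if current_epoch == effective_epoch(None, base_epoch):
        return "insert"                   # still in the epoch the page implies
    if all_freezable and fits_new_header:
        return "rewrite_to_new_format"    # freeze every tuple during the rewrite
    return "extend_relation"              # new page in the new format
```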

I don't think we need any new tuple formats to do this.

This means we have 
* changes to allow new bufpage format
* changes in hio.c for page selection
* changes to allow xmin to be reused when freeze bit set

Very little additional path length in the common case.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
