Discussion: 64-bit XIDs again


64-bit XIDs again

From
Alexander Korotkov
Date:
Hackers,

I know there have already been a couple of threads about 64-bit XIDs.
I read them carefully, but I didn't find all the arguments for 64-bit XIDs mentioned. That's why I'd like to raise this subject again.

Hardware capabilities are now much higher than when Postgres was designed. In modern PostgreSQL scalability tests it's typical to achieve 400,000-500,000 tps with pgbench. At such rates it takes only a few minutes to reach the default autovacuum_freeze_max_age = 200 million.
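To put rough numbers on this (a back-of-envelope sketch; the tps figure and the default autovacuum_freeze_max_age are the ones cited above):

```python
# How quickly a sustained high-tps workload consumes XIDs.
TPS = 500_000                    # transactions per second (pgbench-scale figure)
FREEZE_MAX_AGE = 200_000_000     # default autovacuum_freeze_max_age
WRAPAROUND = 2**31               # XID distance at which wraparound becomes a danger

minutes_to_freeze_age = FREEZE_MAX_AGE / TPS / 60
hours_to_wraparound = WRAPAROUND / TPS / 3600

print(f"{minutes_to_freeze_age:.1f} minutes to freeze_max_age")  # ~6.7 minutes
print(f"{hours_to_wraparound:.1f} hours to 2^31 XIDs")           # ~1.2 hours
```

At that pace, "anti-wraparound" work stops being a rare event and becomes a routine part of the load.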

The notion of wraparound has been evolving over time. Initially it was something that almost never happened. Then it became something that could happen rarely, and that we should be prepared for (by freezing tuples in advance). Now it has become a fairly frequent periodic event for high-load databases, and DB admins must take its performance impact into account.

A typical scenario I've faced in real life goes like this. A database is divided into an operative part and an archive part. The operative part is small (dozens of gigabytes) and serves most of the transactions. The archive part is relatively large (several terabytes) and serves rare selects and bulk inserts. Autovacuum works very actively on the operative part and very lazily on the archive part (as expected). The system works well until one day the age of the archive tables exceeds autovacuum_freeze_max_age. Then all the autovacuum workers start doing "autovacuum to prevent wraparound" on the archive tables. Even if the system I/O survives this, the operative tables get bloated because all the autovacuum workers are busy with the archive tables. In such situations I typically advise increasing autovacuum_freeze_max_age and running vacuum freeze manually when the system has enough free resources.

As I mentioned in the CSN thread, it would be nice to replace the XID with a CSN when setting hint bits for a tuple. In that case, once the hint bits are set, we don't need any additional lookups to check visibility.
http://www.postgresql.org/message-id/CAPpHfdv7BMwGv=OfUg3S-jGVFKqHi79pR_ZK1Wsk-13oZ+cy5g@mail.gmail.com
Introducing a 32-bit CSN doesn't seem reasonable to me, because it would double our troubles with wraparound.

Also, I think it's possible to migrate to 64-bit XIDs without breaking pg_upgrade. Old tuples can be left with 32-bit XIDs while new tuples are created with 64-bit XIDs. We can use free bits in t_infomask2 to distinguish the old and new formats.
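As a rough sketch of the idea (the bit position below is purely hypothetical, not PostgreSQL's actual t_infomask2 layout):

```python
# Hypothetical sketch: tag each tuple with its XID width using one spare
# t_infomask2 bit, so pre-upgrade 32-bit tuples and new 64-bit tuples can
# coexist on disk. 0x0800 is a placeholder, not the real bit assignment.
HEAP_XIDS_64BIT = 0x0800

def tuple_xid_width(t_infomask2: int) -> int:
    """Return the on-disk width in bytes of this tuple's xmin/xmax."""
    return 8 if t_infomask2 & HEAP_XIDS_64BIT else 4

old_tuple_infomask2 = 0x0003                    # bit clear: old 32-bit format
new_tuple_infomask2 = 0x0003 | HEAP_XIDS_64BIT  # bit set: new 64-bit format
assert tuple_xid_width(old_tuple_infomask2) == 4
assert tuple_xid_width(new_tuple_infomask2) == 8
```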

Any thoughts? Do you think 64-bit XIDs are worth it?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: 64-bit XIDs again

From
Simon Riggs
Date:
On 30 July 2015 at 14:26, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
 
Any thoughts? Do you think 64-bit XIDs are worth it?

The problem of freezing is painful, but not impossible, which is why we have held out so long.

The problem of very long-lived snapshots is approaching at the same speed as freezing; there is no solution to that without 64-bit xids throughout the whole infrastructure, or CSNs.

The opportunity for us to have SQL Standard historical databases becomes possible with 64-bit xids, or CSNs. That is a high value goal.
 
I personally now think we should thoroughly investigate 64-bit xids. I don't see this as mere debate; I see it as something that we can make a patch for and scientifically analyze the pros and cons of through measurement.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: 64-bit XIDs again

From
Heikki Linnakangas
Date:
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
> Also, I think it's possible to migrate to 64-bit XIDs without breaking
> pg_upgrade. Old tuples can be left with 32-bit XIDs while new tuples
> would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
> distinguish old and new formats.

I think we should move to 64-bit XIDs in in-memory structs snapshots, 
proc array etc. And expand clog to handle 64-bit XIDs. But keep the 
xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field 
to the page header so that logically the xmin/xmax fields on the page 
are 64 bits wide, but physically stored in 32 bits. That's possible as 
long as no two XIDs on the same page are more than 2^31 XIDs apart. So 
you still need to freeze old tuples on the page when that's about to 
happen, but it would make it possible to have more than 2^32 XID 
transactions in the clog. You'd never be forced to do anti-wraparound 
vacuums, you could just let the clog grow arbitrarily large.
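Heikki's scheme can be modeled roughly like this (a simplified sketch, not the real page layout; the `Page` class and its fields are illustrative). Each page carries a 64-bit base, a tuple's logical 64-bit XID is the base plus the 32-bit value stored on the page, and the invariant is that all XIDs on one page fit inside a 2^31 window:

```python
# Simplified model of a per-page epoch: 32-bit on-page values encode
# logical 64-bit XIDs relative to a page-level base.
class Page:
    def __init__(self, base_xid: int):
        self.base = base_xid   # epoch-like field in the page header
        self.stored = []       # 32-bit offsets, one per tuple's xmin/xmax

    def insert(self, xid64: int) -> None:
        offset = xid64 - self.base
        if not (0 <= offset < 2**31):
            # XIDs on this page would be >2^31 apart: the old tuples
            # must be frozen (and the page rebased) before inserting.
            raise ValueError("page needs freezing before this insert")
        self.stored.append(offset)

    def logical_xid(self, i: int) -> int:
        return self.base + self.stored[i]

page = Page(base_xid=5_000_000_000)   # well past the old 2^32 limit
page.insert(5_000_000_123)
assert page.logical_xid(0) == 5_000_000_123
```

The clog, indexed by the logical 64-bit XID, can then grow without any forced anti-wraparound vacuums.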

There is a big downside to expanding xmin/xmax to 64 bits: it takes 
space. More space means more memory needed for caching, more memory 
bandwidth, more I/O, etc.

- Heikki




Re: 64-bit XIDs again

From
Joe Conway
Date:

On 07/30/2015 07:14 AM, Simon Riggs wrote:
> On 30 July 2015 at 14:26, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>
> Any thoughts? Do you think 64-bit XIDs are worth it?
> 
> The problem of freezing is painful, but not impossible, which is
> why we have held out so long.
> 
> The problem of very long lived snapshots is coming closer at the
> same speed as freezing; there is no solution to that without 64-bit
> xids throughout whole infrastructure, or CSNs.
> 
> The opportunity for us to have SQL Standard historical databases
> becomes possible with 64-bit xids, or CSNs. That is a high value
> goal.
> 
> I personally now think we should thoroughly investigate 64-bit
> xids. I don't see this as mere debate, I see this as something that
> we can make a patch for and scientifically analyze the pros and
> cons through measurement.

+1

I've been thinking along similar lines to both of you for quite some
time now. I think at the least we should explore an initdb-time option
-- we can and should measure the pros and cons.


--
Joe Conway



Re: 64-bit XIDs again

From
Alexander Korotkov
Date:
On Thu, Jul 30, 2015 at 5:24 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be left with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
distinguish old and new formats.

I think we should move to 64-bit XIDs in in-memory structs snapshots, proc array etc. And expand clog to handle 64-bit XIDs. But keep the xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field to the page header so that logically the xmin/xmax fields on the page are 64 bits wide, but physically stored in 32 bits. That's possible as long as no two XIDs on the same page are more than 2^31 XIDs apart. So you still need to freeze old tuples on the page when that's about to happen, but it would make it possible to have more than 2^32 XID transactions in the clog. You'd never be forced to do anti-wraparound vacuums, you could just let the clog grow arbitrarily large.
 
Nice idea. Storing an extra epoch would mean an extra 4 bytes per heap tuple instead of the extra 8 bytes per tuple of storing 64-bit xmin/xmax.
But if the first column is aligned to 8 bytes (e.g. bigserial), wouldn't we lose this 4-byte win to alignment?
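The alignment question can be checked with arithmetic (assuming the current 23-byte HeapTupleHeader, padded to MAXALIGN = 8 when the first column needs 8-byte alignment; under these assumptions the 4-byte win would indeed disappear):

```python
# Alignment arithmetic for a hypothetical per-tuple epoch field.
MAXALIGN = 8

def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN

header_now   = maxalign(23)      # today's header: 24 bytes
header_epoch = maxalign(23 + 4)  # +4-byte per-tuple epoch -> 32 bytes
header_64bit = maxalign(23 + 8)  # full 64-bit xmin+xmax    -> 32 bytes

# With an 8-byte-aligned first column, both variants pad to the same size:
assert header_now == 24
assert header_epoch == header_64bit == 32
```

(As Heikki clarifies below, the epoch in his proposal is per page, not per tuple, so this padding concern does not arise there.)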


There is a big downside to expanding xmin/xmax to 64 bits: it takes space. More space means more memory needed for caching, more memory bandwidth, more I/O, etc.


------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

Re: 64-bit XIDs again

From
Heikki Linnakangas
Date:
On 07/30/2015 05:57 PM, Alexander Korotkov wrote:
> On Thu, Jul 30, 2015 at 5:24 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> I think we should move to 64-bit XIDs in in-memory structs snapshots, proc
>> array etc. And expand clog to handle 64-bit XIDs. But keep the xmin/xmax
>> fields on heap pages at 32-bits, and add an epoch-like field to the page
>> header so that logically the xmin/xmax fields on the page are 64 bits wide,
>> but physically stored in 32 bits. That's possible as long as no two XIDs on
>> the same page are more than 2^31 XIDs apart. So you still need to freeze
>> old tuples on the page when that's about to happen, but it would make it
>> possible to have more than 2^32 XID transactions in the clog. You'd never
>> be forced to do anti-wraparound vacuums, you could just let the clog grow
>> arbitrarily large.
>
> Nice idea. Storing extra epoch would be extra 4 bytes per heap tuple
> instead of extra 8 bytes per tuple if storing 64 bits xmin/xmax.

No, I was thinking that the epoch would be stored *per page*, in the 
page header.

- Heikki




Re: 64-bit XIDs again

From
Simon Riggs
Date:
On 30 July 2015 at 15:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be left with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
distinguish old and new formats.

I think we should move to 64-bit XIDs in in-memory structs snapshots, proc array etc. And expand clog to handle 64-bit XIDs. But keep the xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field to the page header so that logically the xmin/xmax fields on the page are 64 bits wide, but physically stored in 32 bits. That's possible as long as no two XIDs on the same page are more than 2^31 XIDs apart. So you still need to freeze old tuples on the page when that's about to happen, but it would make it possible to have more than 2^32 XID transactions in the clog. You'd never be forced to do anti-wraparound vacuums, you could just let the clog grow arbitrarily large.

This is a good scheme, but it assumes, as you say, that you can freeze tuples before any two on a page are more than 2^31 xids apart. That is no longer a safe assumption on high-transaction-rate systems with longer-lived snapshots.
 
There is a big downside to expanding xmin/xmax to 64 bits: it takes space. More space means more memory needed for caching, more memory bandwidth, more I/O, etc.

My feeling is that the overhead will recede in time. Having a nice, simple change that removes both old and new bugs would help us be more robust.

But let's measure the overhead before we try to optimize it away.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: 64-bit XIDs again

From
"Joshua D. Drake"
Date:
On 07/30/2015 08:04 AM, Simon Riggs wrote:

>     There is a big downside to expanding xmin/xmax to 64 bits: it takes
>     space. More space means more memory needed for caching, more memory
>     bandwidth, more I/O, etc.
>
>
> My feeling is that the overhead will recede in time. Having a nice,
> simple change that removes both old and new bugs would help us be more robust.
>
> But let's measure the overhead before we try to optimize it away.

My field experience agrees with you. The amount of memory people 
are arbitrarily throwing at databases now is pretty significant. It is 
common to have >64GB of memory. Heck, I run into >128GB all the time, and 
seeing >192GB is no longer a "Wow".

JD



-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.



Re: 64-bit XIDs again

From
Gavin Flower
Date:
On 31/07/15 02:24, Heikki Linnakangas wrote:
> On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
>> Also, I think it's possible to migrate to 64-bit XIDs without breaking
>> pg_upgrade. Old tuples can be left with 32-bit XIDs while new tuples
>> would be created with 64-bit XIDs. We can use free bits in 
>> t_infomask2 to
>> distinguish old and new formats.
>
> I think we should move to 64-bit XIDs in in-memory structs snapshots, 
> proc array etc. And expand clog to handle 64-bit XIDs. But keep the 
> xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field 
> to the page header so that logically the xmin/xmax fields on the page 
> are 64 bits wide, but physically stored in 32 bits. That's possible as 
> long as no two XIDs on the same page are more than 2^31 XIDs apart. So 
> you still need to freeze old tuples on the page when that's about to 
> happen, but it would make it possible to have more than 2^32 XID 
> transactions in the clog. You'd never be forced to do anti-wraparound 
> vacuums, you could just let the clog grow arbitrarily large.
>
> There is a big downside to expanding xmin/xmax to 64 bits: it takes 
> space. More space means more memory needed for caching, more memory 
> bandwidth, more I/O, etc.
>
> - Heikki
>
>
>
I think having a special case to save 32 bits per tuple would cause 
unnecessary complications, and the savings are minimal compared to the 
size of modern storage devices and the typical memory used in 
serious database servers.

I think it is too much pain for very little gain, especially when 
looking at the future growth in storage capacity and bandwidth.

The early mainframes used a base-displacement technique to keep the size 
of addresses in instructions down: 16-bit addresses, comprising 4 bits 
for a base register and 12 bits for the displacement (hence the 4KB 
page sizes in use now!).  Necessary at the time, when mainframes often had 
less than 128 KB!  Now it would be ludicrous to do that for modern servers!


Cheers,
Gavin

(Who is ancient enough to have programmed such mainframes!)



Re: 64-bit XIDs again

From
Tom Lane
Date:
Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
> On 31/07/15 02:24, Heikki Linnakangas wrote:
>> There is a big downside to expanding xmin/xmax to 64 bits: it takes 
>> space. More space means more memory needed for caching, more memory 
>> bandwidth, more I/O, etc.

> I think having a special case to save 32 bits per tuple would cause 
> unnecessary complications, and the savings are minimal compared to the 
> size of current modern storage devices and the typical memory used in 
> serious database servers.

I think the argument that the savings are minimal is pretty thin.
It all depends on how wide your tables are --- but on a narrow table, say
half a dozen ints, the current tuple size is 24 bytes header plus the same
number of bytes of data.  We'd be going up to 32 bytes header which makes
for a 16% increase in physical table size.  If your table is large,
claiming that 16% doesn't hurt is just silly.
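Tom's arithmetic, spelled out:

```python
# Tuple-size growth for a narrow table of half a dozen int4 columns.
header_old = 24          # current MAXALIGN'd HeapTupleHeader
header_new = 32          # header after widening xmin/xmax to 64 bits
data = 6 * 4             # six int4 columns = 24 bytes of data

old_tuple = header_old + data   # 48 bytes
new_tuple = header_new + data   # 56 bytes
growth = (new_tuple - old_tuple) / old_tuple

print(f"{growth:.0%}")   # ~17%, the ~16% increase cited above
```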

But the elephant in the room is on-disk compatibility.  There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break.  However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ... but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).

Only if you are willing to kiss off on-disk compatibility is it even
worth having a discussion about whether we can afford more bloat in
HeapTupleHeader.  And that would be a pretty big pain point for a lot
of users.
        regards, tom lane



Re: 64-bit XIDs again

From
Arthur Silva
Date:

On Thu, Jul 30, 2015 at 5:31 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
On 31/07/15 02:24, Heikki Linnakangas wrote:
On 07/30/2015 04:26 PM, Alexander Korotkov wrote:
Also, I think it's possible to migrate to 64-bit XIDs without breaking
pg_upgrade. Old tuples can be left with 32-bit XIDs while new tuples
would be created with 64-bit XIDs. We can use free bits in t_infomask2 to
distinguish old and new formats.

I think we should move to 64-bit XIDs in in-memory structs snapshots, proc array etc. And expand clog to handle 64-bit XIDs. But keep the xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field to the page header so that logically the xmin/xmax fields on the page are 64 bits wide, but physically stored in 32 bits. That's possible as long as no two XIDs on the same page are more than 2^31 XIDs apart. So you still need to freeze old tuples on the page when that's about to happen, but it would make it possible to have more than 2^32 XID transactions in the clog. You'd never be forced to do anti-wraparound vacuums, you could just let the clog grow arbitrarily large.

There is a big downside to expanding xmin/xmax to 64 bits: it takes space. More space means more memory needed for caching, more memory bandwidth, more I/O, etc.

- Heikki



I think having a special case to save 32 bits per tuple would cause unnecessary complications, and the savings are minimal compared to the size of current modern storage devices and the typical memory used in serious database servers.

I think it is too much pain for very little gain, especially when looking at the future growth in storage capacity and bandwidth.

The early mainframes used a base-displacement technique to keep the size of addresses in instructions down: 16-bit addresses, comprising 4 bits for a base register and 12 bits for the displacement (hence the 4KB page sizes in use now!).  Necessary at the time, when mainframes often had less than 128 KB!  Now it would be ludicrous to do that for modern servers!


Cheers,
Gavin

(Who is ancient enough, to have programmed such MainFrames!)




On the other hand, PG tuple overhead is already the largest among the alternatives.
Even if storage keeps getting faster and cheaper, you can't ignore the overhead of adding yet another 8 bytes to each tuple.

Re: 64-bit XIDs again

From
Josh Berkus
Date:
On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:
> 
> I think we should move to 64-bit XIDs in in-memory structs snapshots,
> proc array etc. And expand clog to handle 64-bit XIDs. But keep the
> xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field
> to the page header so that logically the xmin/xmax fields on the page
> are 64 bits wide, but physically stored in 32 bits. That's possible as
> long as no two XIDs on the same page are more than 2^31 XIDs apart. So
> you still need to freeze old tuples on the page when that's about to
> happen, but it would make it possible to have more than 2^32 XID
> transactions in the clog. You'd never be forced to do anti-wraparound
> vacuums, you could just let the clog grow arbitrarily large

When I introduced the same idea a few years back, having the clog get
arbitrarily large was cited as a major issue.  I was under the
impression that clog size had some major performance impacts.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: 64-bit XIDs again

From
Petr Jelinek
Date:
On 2015-07-30 23:23, Tom Lane wrote:
> Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
>> On 31/07/15 02:24, Heikki Linnakangas wrote:
>>> There is a big downside to expanding xmin/xmax to 64 bits: it takes
>>> space. More space means more memory needed for caching, more memory
>>> bandwidth, more I/O, etc.
>
>> I think having a special case to save 32 bits per tuple would cause
>> unnecessary complications, and the savings are minimal compared to the
>> size of current modern storage devices and the typical memory used in
>> serious database servers.
>
> I think the argument that the savings are minimal is pretty thin.
> It all depends on how wide your tables are --- but on a narrow table, say
> half a dozen ints, the current tuple size is 24 bytes header plus the same
> number of bytes of data.  We'd be going up to 32 bytes header which makes
> for a 16% increase in physical table size.  If your table is large,
> claiming that 16% doesn't hurt is just silly.
>
> But the elephant in the room is on-disk compatibility.  There is
> absolutely no way that we can just change xmin/xmax to 64 bits without a
> disk format break.  However, if we do something like what Heikki is
> suggesting, it's at least conceivable that we could convert incrementally
> (ie, if you find a page with the old header format, assume all tuples in
> it are part of epoch 0; and do not insert new tuples into it unless there
> is room to convert the header to new format ...

We could theoretically do a similar thing with 64-bit xmin/xmax though - 
detect that a page is in the old format and convert all tuples there to 
64-bit xmin/xmax.

But I agree that we don't want to increase the per-tuple bloat, as it's 
already too big.

> but I'm not sure what we
> do about tuple deletion if the old page is totally full and we need to
> write an xmax that's past 4G).
>

If the page is too full, we could move some data to a different (or new) page.

For me the bigger issue is that we'll still have to "refreeze" pages, because 
if tuples are updated or deleted in a different epoch than the one they 
were inserted in, the new version of the tuple has to go to a different page, 
and the old page will have free space that can't be used by new tuples 
since the system is now in a different epoch.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: 64-bit XIDs again

From
Gurjeet Singh
Date:
On Jul 30, 2015 2:23 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> Gavin Flower <GavinFlower@archidevsys.co.nz> writes:
> > On 31/07/15 02:24, Heikki Linnakangas wrote:
> >> There is a big downside to expanding xmin/xmax to 64 bits: it takes
> >> space. More space means more memory needed for caching, more memory
> >> bandwidth, more I/O, etc.
>
> > I think having a special case to save 32 bits per tuple would cause
> > unnecessary complications, and the savings are minimal compared to the
> > size of current modern storage devices and the typical memory used in
> > serious database servers.
>
> I think the argument that the savings are minimal is pretty thin.
> It all depends on how wide your tables are --- but on a narrow table, say
> half a dozen ints, the current tuple size is 24 bytes header plus the same
> number of bytes of data.  We'd be going up to 32 bytes header which makes
> for a 16% increase in physical table size.  If your table is large,
> claiming that 16% doesn't hurt is just silly.
>
> But the elephant in the room is on-disk compatibility.  There is
> absolutely no way that we can just change xmin/xmax to 64 bits without a
> disk format break.  However, if we do something like what Heikki is
> suggesting, it's at least conceivable that we could convert incrementally
> (ie, if you find a page with the old header format, assume all tuples in
> it are part of epoch 0; and do not insert new tuples into it unless there
> is room to convert the header to new format ... but I'm not sure what we
> do about tuple deletion if the old page is totally full and we need to
> write an xmax that's past 4G).

Can we safely relegate the responsibility of tracking the per block epoch to a relation fork?

Re: 64-bit XIDs again

From
Heikki Linnakangas
Date:
On 07/31/2015 09:22 AM, Gurjeet Singh wrote:
> On Jul 30, 2015 2:23 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>> But the elephant in the room is on-disk compatibility.  There is
>> absolutely no way that we can just change xmin/xmax to 64 bits without a
>> disk format break.  However, if we do something like what Heikki is
>> suggesting, it's at least conceivable that we could convert incrementally
>> (ie, if you find a page with the old header format, assume all tuples in
>> it are part of epoch 0; and do not insert new tuples into it unless there
>> is room to convert the header to new format ... but I'm not sure what we
>> do about tuple deletion if the old page is totally full and we need to
>> write an xmax that's past 4G).
>
> Can we safely relegate the responsibility of tracking the per block epoch
> to a relation fork?

Sounds complicated and fragile. I would rather attack the page version 
problem head on.

- Heikki




Re: 64-bit XIDs again

From
Alexander Korotkov
Date:
On Fri, Jul 31, 2015 at 1:27 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:
On 2015-07-30 23:23, Tom Lane wrote:
But the elephant in the room is on-disk compatibility.  There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break.  However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ...

We could theoretically do a similar thing with 64-bit xmin/xmax though - detect that a page is in the old format and convert all tuples there to 64-bit xmin/xmax.

But I agree that we don't want to increase bloat per tuple as it's already too big.

but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).


If the page is too full we could move some data to different (or new) page.

For me the bigger issue is that we'll still have to "refreeze" pages, because if tuples are updated or deleted in a different epoch than the one they were inserted in, the new version of the tuple has to go to a different page, and the old page will have free space that can't be used by new tuples since the system is now in a different epoch.

It is not so easy to move a heap tuple to a different page. When a table has indexes, each tuple is referenced by index tuples as (blockNumber; offset). And we can't remove those references without a vacuum. Thus, we would have to invent something like multi-page HOT in order to move tuples between pages. And that seems to be a complicated kludge.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: 64-bit XIDs again

From
Alexander Korotkov
Date:
On Fri, Jul 31, 2015 at 12:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
But the elephant in the room is on-disk compatibility.  There is
absolutely no way that we can just change xmin/xmax to 64 bits without a
disk format break.

That seems problematic. But I'm not yet convinced that there is absolutely no way to do this.
 
However, if we do something like what Heikki is
suggesting, it's at least conceivable that we could convert incrementally
(ie, if you find a page with the old header format, assume all tuples in
it are part of epoch 0; and do not insert new tuples into it unless there
is room to convert the header to new format ... but I'm not sure what we
do about tuple deletion if the old page is totally full and we need to
write an xmax that's past 4G).
 
If a user upgrades a database cluster with pg_upgrade, he would stop the old postmaster, run pg_upgrade, and start the new postmaster. That means we start from a point where there are no running transactions. Thus, among the tuples of the old format there are two kinds: visible to everybody and invisible to everybody. When we update or delete an old tuple of the first kind, we don't actually need to store its xmin anymore. We can store a 64-bit xmax in the place of xmin/xmax.

So, in order to switch to 64-bit xmin/xmax, we have to take both free bits from t_infomask2 in order to implement this. They should indicate one of 3 possible tuple formats:
1) Old format: both xmin and xmax are 32-bit.
2) Intermediate format: xmax is 64-bit, xmin is frozen.
3) New format: both xmin and xmax are 64-bit.
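A sketch of the three formats as flag bits (the bit positions are placeholders; which t_infomask2 bits are actually free is an assumption here):

```python
# Hypothetical encoding of the three tuple formats in two spare
# t_infomask2 bits. 0x1800 is a placeholder mask, not the real layout.
XID_FMT_MASK     = 0x1800
FMT_OLD          = 0x0000  # 1) both xmin/xmax are 32-bit
FMT_INTERMEDIATE = 0x0800  # 2) xmin frozen, 64-bit xmax in its place
FMT_NEW          = 0x1000  # 3) both xmin/xmax are 64-bit

def xid_format(t_infomask2: int) -> str:
    """Decode which of the three tuple formats the flag bits select."""
    fmt = t_infomask2 & XID_FMT_MASK
    return {FMT_OLD: "old-32bit",
            FMT_INTERMEDIATE: "frozen-xmin/64bit-xmax",
            FMT_NEW: "new-64bit"}[fmt]

# Other infomask bits (here 0x0007) don't disturb the format decoding:
assert xid_format(0x0007 | FMT_INTERMEDIATE) == "frozen-xmin/64bit-xmax"
```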

But we can use the same idea to implement an epoch in the heap page header as well. If the new page header doesn't fit in the page, then we don't have to insert anything into this page; we just need to set xmax and flags on the existing tuples. Then we can use two of the formats listed above, #1 and #2, and take one free bit from t_infomask2 for format indication.

Probably I'm missing something, but I think keeping on-disk compatibility should somehow be possible.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: 64-bit XIDs again

From
Simon Riggs
Date:
On 31 July 2015 at 11:00, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
 
If a user upgrades a database cluster with pg_upgrade, he would stop the old postmaster, run pg_upgrade, and start the new postmaster. That means we start from a point where there are no running transactions. Thus, among the tuples of the old format there are two kinds: visible to everybody and invisible to everybody. When we update or delete an old tuple of the first kind, we don't actually need to store its xmin anymore. We can store a 64-bit xmax in the place of xmin/xmax.

So, in order to switch to 64-bit xmin/xmax, we have to take both free bits from t_infomask2 in order to implement this. They should indicate one of 3 possible tuple formats:
1) Old format: both xmin and xmax are 32-bit.
2) Intermediate format: xmax is 64-bit, xmin is frozen.
3) New format: both xmin and xmax are 64-bit.

But we can use the same idea to implement an epoch in the heap page header as well. If the new page header doesn't fit in the page, then we don't have to insert anything into this page; we just need to set xmax and flags on the existing tuples. Then we can use two of the formats listed above, #1 and #2, and take one free bit from t_infomask2 for format indication.

I think we can do this by treating the page-level epoch as a means of compression, rather than as a barrier, which is how I first saw it.

New Page Format
New Page format has a page-level epoch.

The first tuple inserted onto a block sets the page epoch. For later inserts, we check whether the current epoch matches the page epoch. If it doesn't, we try to freeze the page. If all tuples on the page can be frozen, we can then reset the page-level epoch as part of our insert. If we can't freeze all the tuples on the page, we extend the relation so we can add a new page with the current epoch on it. (We can't easily track which blocks have which epoch.)

If an Update or Delete sees a tuple from a prior epoch, we will try to freeze the tuple. If we can, then we reuse xmin as the xmax's epoch. If we can't, we have a problem and need a more complex mechanism to avoid it. I think it won't be necessary to invent that in the first release; we will just assume it is possible.

Current Pages
Current pages don't have an epoch, so we store a base epoch in the control file so we remember how to interpret them.

We don't create any new pages with this page format. For later inserts, we check whether the current epoch matches the page epoch. If it doesn't, we check whether it's possible to rewrite the whole page to the new format, freezing as we go. If that is not possible, we extend the relation so we can add a new page with the current epoch on it. (We can't easily track which blocks have which epoch.)

If an update or delete sees a tuple from a prior epoch, we will try to freeze the tuple. If we can, then we reuse xmin as the xmax's epoch.

I don't think we need any new tuple formats to do this.

This means we have 
* changes to allow new bufpage format
* changes in hio.c for page selection
* changes to allow xmin to be reused when freeze bit set

Very little additional path length in the common case.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: 64-bit XIDs again

From
Robert Haas
Date:
On Thu, Jul 30, 2015 at 5:23 PM, Arthur Silva <arthurprs@gmail.com> wrote:
> On the other hand PG tuple overhead is already the largest among the
> alternatives.
> Even if storage keeps getting faster and cheaper stuff you can't ignore the
> overhead of adding yet another 8bytes to each tuple.

+1, very much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: 64-bit XIDs again

From
Heikki Linnakangas
Date:
On 07/31/2015 12:29 AM, Josh Berkus wrote:
> On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:
>>
>> I think we should move to 64-bit XIDs in in-memory structs snapshots,
>> proc array etc. And expand clog to handle 64-bit XIDs. But keep the
>> xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field
>> to the page header so that logically the xmin/xmax fields on the page
>> are 64 bits wide, but physically stored in 32 bits. That's possible as
>> long as no two XIDs on the same page are more than 2^31 XIDs apart. So
>> you still need to freeze old tuples on the page when that's about to
>> happen, but it would make it possible to have more than 2^32 XID
>> transactions in the clog. You'd never be forced to do anti-wraparound
>> vacuums, you could just let the clog grow arbitrarily large
>
> When I introduced the same idea a few years back, having the clog get
> arbitrarily large was cited as a major issue.  I was under the
> impression that clog size had some major performance impacts.

Well, sure, if you don't want the clog to grow arbitrarily large, then 
you need to freeze. And most people would want to freeze regularly, to 
keep the clog size in check. The point is that you wouldn't *have* to do 
so at any particular time. You would never be up against the wall, in 
the "you must freeze now or your database will shut down" situation.

I'm not sure what performance impact a very large clog might have. It 
takes some disk space (1 GB per 4 billion XIDs), and caching it takes 
some memory. And there is a small fixed number of CLOG buffers in shared 
memory. But I don't think there's any particularly nasty problem there.

- Heikki




Re: 64-bit XIDs again

From
Josh Berkus
Date:
On 07/31/2015 02:46 PM, Heikki Linnakangas wrote:
> On 07/31/2015 12:29 AM, Josh Berkus wrote:
>> On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:
>>>
>>> I think we should move to 64-bit XIDs in in-memory structs snapshots,
>>> proc array etc. And expand clog to handle 64-bit XIDs. But keep the
>>> xmin/xmax fields on heap pages at 32-bits, and add an epoch-like field
>>> to the page header so that logically the xmin/xmax fields on the page
>>> are 64 bits wide, but physically stored in 32 bits. That's possible as
>>> long as no two XIDs on the same page are more than 2^31 XIDs apart. So
>>> you still need to freeze old tuples on the page when that's about to
>>> happen, but it would make it possible to have more than 2^32 XID
>>> transactions in the clog. You'd never be forced to do anti-wraparound
>>> vacuums, you could just let the clog grow arbitrarily large
>>
>> When I introduced the same idea a few years back, having the clog get
>> arbitrarily large was cited as a major issue.  I was under the
>> impression that clog size had some major performance impacts.
> 
> Well, sure, if you don't want the clog to grow arbitrarily large, then
> you need to freeze. And most people would want to freeze regularly, to
> keep the clog size in check. The point is that you wouldn't *have* to do
> so at any particular time. You would never be up against the wall, in
> the "you must freeze now or your database will shut down" situation.

Well, we still have to freeze *eventually*.  Just not for 122,000 years
at current real transaction rates.  In 2025, though, we'll be having
this conversation again because of people doing 100 billion transactions
per second. ;-)

> I'm not sure what performance impact a very large clog might have. It
> takes some disk space (1 GB per 4 billion XIDs), and caching it takes
> some memory. And there is a small fixed number of CLOG buffers in shared
> memory. But I don't think there's any particularly nasty problem there.

Well, one way to find out, clearly.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: 64-bit XIDs again

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 07/31/2015 02:46 PM, Heikki Linnakangas wrote:
>> Well, sure, if you don't want the clog to grow arbitrarily large, then
>> you need to freeze. And most people would want to freeze regularly, to
>> keep the clog size in check. The point is that you wouldn't *have* to do
>> so at any particular time. You would never be up against the wall, in
>> the "you must freeze now or your database will shut down" situation.

> Well, we still have to freeze *eventually*.  Just not for 122,000 years
> at current real transaction rates.  In 2025, though, we'll be having
> this conversation again because of people doing 100 billion transactions
> per second. ;-)

Well, we'd wrap the 64-bit WAL position counters well before we wrap
64-bit XIDs ... and there is no code to support wraparound in WAL LSNs.
        regards, tom lane



Re: 64-bit XIDs again

From
Simon Riggs
Date:
On 31 July 2015 at 22:46, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 07/31/2015 12:29 AM, Josh Berkus wrote:
>> On 07/30/2015 07:24 AM, Heikki Linnakangas wrote:
>>>
>>> You'd never be forced to do anti-wraparound
>>> vacuums, you could just let the clog grow arbitrarily large
>>
>> When I introduced the same idea a few years back, having the clog get
>> arbitrarily large was cited as a major issue.  I was under the
>> impression that clog size had some major performance impacts.
>
> Well, sure, if you don't want the clog to grow arbitrarily large, then you need to freeze.

This statement isn't quite right, things are better than that.

We don't need to freeze in order to shrink the clog; we just need to hint, and thereby ensure we move forward the lowest unhinted xid. That does involve scanning, but it doesn't need to scan indexes, and that scan won't produce anywhere near as much additional WAL traffic or I/O.

In practice, a larger clog would only happen with a higher transaction rate, which means more system resources, so I don't think it's too much of a problem overall.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: 64-bit XIDs again

From
Simon Riggs
Date:
On 30 July 2015 at 14:26, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

> As I mentioned in CSN thread, it would be nice to replace XID with CSN when setting hint bits for tuple. In this case when hint bits are set we don't need any additional lookups to check visibility.
> http://www.postgresql.org/message-id/CAPpHfdv7BMwGv=OfUg3S-jGVFKqHi79pR_ZK1Wsk-13oZ+cy5g@mail.gmail.com
> Introducing 32-bit CSN doesn't seem reasonable for me, because it would double our troubles with wraparound.

Your idea to replace XIDs with CSNs instead of hinting them was a good one. It removes the extra lookup we thought we needed to check visibility with CSN snapshots.

I agree 32-bit CSNs would not be a good idea, though; a 64-bit CSN is needed.

If we break a CSN down into an epoch and a 32-bit value, then it becomes more easily possible. The epoch for XID and CSN can be the same: whichever wraps first, we just increment the epoch.

By doing this we can reuse the page-level epoch for both XID and CSN. Now hinting a tuple is just replacing a 32-bit XID with a 32-bit CSN. 

We would probably need an extra flag bit for the case where the CSN is one epoch later than the XID.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services