Обсуждение: why pg_class.relfrozenxid needs to be updated for frozen tables (where all rows have xmin=2)?
Hi,
I have to deal with badly written system which regularly suffers from transaction wraparound issue. This issue is happenning every 10-14 days and forces me to take system offline and vacuum in single-user mode.Basically the database stores events data in daily partitioned table "daily_events".
What I did, was - I ran vaccum freeze on all partitions (the tables are never touched after they're done for a day). I have also scheduled vacuum-freeze for a partition after it's done writing.On 1/30/15 5:44 PM, Slava Mudry wrote: > Hi, > I have to deal with badly written system which regularly suffers from > transaction wraparound issue. This issue is happenning every 10-14 days > and forces me to take system offline and vacuum in single-user mode. > Main causes for this are (afaik): > 1) heavy transaction traffic + 100+GB of stale tables > 2) slow i/o (rotational drives) > 3) autovacuum can't keep up. > > Basically the database stores events data in daily partitioned table > "daily_events". > What I did, was - I ran vaccum freeze on all partitions (the tables are > never touched after they're done for a day). I have also scheduled > vacuum-freeze for a partition after it's done writing. > > This essentially set xmin in each partition to "frozen" value of "2". > However, to my surprise, this was not enough! > Postgres stores relfrozenxid in pg_class and this value apparently is > getting old pretty fast (due to high volume of transactions). > And it seems that it doesn't really matter that xmin is frozen for a > table, the relfrozenxid is what causing transaction wraparound. relfrozenxid is only part of the picture. A database-wide freeze vacuum will be controlled by pg_database.datfrozenxid. What version is this? You may also be suffering from multixact wrap. > Why is that? and most importantly - why updating pg_class.relfrozenxid > requires huge amount of i/o by vacuum process for tables that are never > updated? Because it has to scan the entire table to see what the oldest XID is. We don't check to see if relfrozenxid is already 2, though I suppose we could add that. > Is it safe to just update pg_class.relfrozenxid for tables where xmin=2 > for all rows? Same for linked toast table? That would be a great way to lose data... You need to look at relations where relfrozenxid is >= 3 and see why relfrozenxid isn't advancing fast enough on them. Check your cost delay settings as well as the *freeze* settings. It's very likely that on a system this busy autovac would never keep up with default settings. Also, keep in mind that transaction and multixact IDs are cluster-wide, so this is going to affect all databases in that instance. You should think about ways to move the heaviest transaction workload to a separate cluster; possibly putting the raw updates there and having a separate process that aggregates that data into fewer transactions for the main cluster. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2/2/15 7:01 PM, Slava Mudry wrote: Please don't top-post. It's much better to answer questions inline in an email. > I am running PostgreSQL 9.3.2 on linux. Freeze values are defaults. > We cannot rely on autovacuum on our current hardware, so it's turned > down to 2 workers with naptime=10min. But we are running weekly vacuum > on whole db and daily vacuum freeze on new partitions. While you may not be able to rely on autovac for all your needs, changing the naptime is unlikely to help much, unless you have an extremely large number of tables (like, 100k or more). > I agree that system is designed pretty bad in a way that it creates high > transaction volume instead of batching updates/inserts. However I still > feel that postgres should be able to do something more optimal in such > cases. > > Currently the fact that it needs to go back to old tables and FTS them > every 2B transactions (or rely on autovacuum for this) and you can't do > anything about it (like permanently freeze the tables) seems like a big > scalability issue. Does it not? Unfortunately it's not terribly easy to fix this. The problem is if we try to play games here, we must have a 100% reliable method for changing relfrozenxid as soon as someone inserts a new tuple in the relation. It might be possible to tie this into the visibility map, but no one has looked at this yet. Perhaps you'd be willing to investigate this, or sponsor the work? > Thank you. > > On Fri, Jan 30, 2015 at 5:28 PM, Jim Nasby <Jim.Nasby@bluetreble.com > <mailto:Jim.Nasby@bluetreble.com>> wrote: > > On 1/30/15 5:44 PM, Slava Mudry wrote: > > Hi, > I have to deal with badly written system which regularly suffers > from > transaction wraparound issue. This issue is happenning every > 10-14 days > and forces me to take system offline and vacuum in single-user mode. > Main causes for this are (afaik): > 1) heavy transaction traffic + 100+GB of stale tables > 2) slow i/o (rotational drives) > 3) autovacuum can't keep up. > > Basically the database stores events data in daily partitioned table > "daily_events". > What I did, was - I ran vaccum freeze on all partitions (the > tables are > never touched after they're done for a day). I have also scheduled > vacuum-freeze for a partition after it's done writing. > > This essentially set xmin in each partition to "frozen" value of > "2". > However, to my surprise, this was not enough! > Postgres stores relfrozenxid in pg_class and this value > apparently is > getting old pretty fast (due to high volume of transactions). > And it seems that it doesn't really matter that xmin is frozen for a > table, the relfrozenxid is what causing transaction wraparound. > > > relfrozenxid is only part of the picture. A database-wide freeze > vacuum will be controlled by pg_database.datfrozenxid. > > What version is this? You may also be suffering from multixact wrap. > > Why is that? and most importantly - why updating > pg_class.relfrozenxid > requires huge amount of i/o by vacuum process for tables that > are never > updated? > > > Because it has to scan the entire table to see what the oldest XID > is. We don't check to see if relfrozenxid is already 2, though I > suppose we could add that. > > Is it safe to just update pg_class.relfrozenxid for tables where > xmin=2 > for all rows? Same for linked toast table? > > > That would be a great way to lose data... > > You need to look at relations where relfrozenxid is >= 3 and see why > relfrozenxid isn't advancing fast enough on them. Check your cost > delay settings as well as the *freeze* settings. It's very likely > that on a system this busy autovac would never keep up with default > settings. > > Also, keep in mind that transaction and multixact IDs are > cluster-wide, so this is going to affect all databases in that > instance. You should think about ways to move the heaviest > transaction workload to a separate cluster; possibly putting the raw > updates there and having a separate process that aggregates that > data into fewer transactions for the main cluster. > -- > Jim Nasby, Data Architect, Blue Treble Consulting > Data in Trouble? Get it in Treble! http://BlueTreble.com > > -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2/2/15 7:36 PM, Jim Nasby wrote: >> >> Currently the fact that it needs to go back to old tables and FTS them >> every 2B transactions (or rely on autovacuum for this) and you can't do >> anything about it (like permanently freeze the tables) seems like a big >> scalability issue. Does it not? > > Unfortunately it's not terribly easy to fix this. The problem is if we > try to play games here, we must have a 100% reliable method for changing > relfrozenxid as soon as someone inserts a new tuple in the relation. It > might be possible to tie this into the visibility map, but no one has > looked at this yet. > > Perhaps you'd be willing to investigate this, or sponsor the work? Oh, there is another possibility that's been discussed: read-only tables. If we had the ability to mark a table read-only, then a VACUUM FREEZE on such a table would be able to set that table's relfrozenxid to FrozenTransactionId and prevent any further attempts at vacuuming. This might be easier than trying to do something automatic. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
On Mon, Feb 2, 2015 at 5:52 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
Having read-only tables would be great.
On 2/2/15 7:36 PM, Jim Nasby wrote:
Currently the fact that it needs to go back to old tables and FTS them
every 2B transactions (or rely on autovacuum for this) and you can't do
anything about it (like permanently freeze the tables) seems like a big
scalability issue. Does it not?
Unfortunately it's not terribly easy to fix this. The problem is if we
try to play games here, we must have a 100% reliable method for changing
relfrozenxid as soon as someone inserts a new tuple in the relation. It
might be possible to tie this into the visibility map, but no one has
looked at this yet.
Perhaps you'd be willing to investigate this, or sponsor the work?
I'll see what I can do. Will talk to folks at pgDay in a month.
 
Oh, there is another possibility that's been discussed: read-only tables. If we had the ability to mark a table read-only, then a VACUUM FREEZE on such a table would be able to set that table's relfrozenxid to FrozenTransactionId and prevent any further attempts at vacuuming. This might be easier than trying to do something automatic.
I think if we could log "last update/delete/insert" timestamp for a table - we could use that to freeze tables that are not changed.
I also wonder how pg_database.datfrozenxid is set? Is it equal to the oldest pg_class.relfrozenxid for that database?
I also wonder how pg_database.datfrozenxid is set? Is it equal to the oldest pg_class.relfrozenxid for that database?
I ask because I am willing to give a try and update relfrozenxid for the tables that are never updated and frozen. Currently we are looking at 8-hour downtime to vacuum the whole db in single-user mode. High availability is more important that data loss in my case. [I still don't want to lose data, but it won't be the end of world if it happens].
Having read-only tables would be great.
I was able to get great performance from unlogged tables, similarly read-only tables would be able to address issue with high-transactions and many large stale tables.
 
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 2/2/15 9:37 PM, Slava Mudry wrote: > > On Mon, Feb 2, 2015 at 5:52 PM, Jim Nasby <Jim.Nasby@bluetreble.com > <mailto:Jim.Nasby@bluetreble.com>> wrote: > > On 2/2/15 7:36 PM, Jim Nasby wrote: > > > Currently the fact that it needs to go back to old tables > and FTS them > every 2B transactions (or rely on autovacuum for this) and > you can't do > anything about it (like permanently freeze the tables) seems > like a big > scalability issue. Does it not? > > > Unfortunately it's not terribly easy to fix this. The problem is > if we > try to play games here, we must have a 100% reliable method for > changing > relfrozenxid as soon as someone inserts a new tuple in the > relation. It > might be possible to tie this into the visibility map, but no > one has > looked at this yet. > > Perhaps you'd be willing to investigate this, or sponsor the work? > > I'll see what I can do. Will talk to folks at pgDay in a month. > > > Oh, there is another possibility that's been discussed: read-only > tables. If we had the ability to mark a table read-only, then a > VACUUM FREEZE on such a table would be able to set that table's > relfrozenxid to FrozenTransactionId and prevent any further attempts > at vacuuming. This might be easier than trying to do something > automatic. > > I think if we could log "last update/delete/insert" timestamp for a > table - we could use that to freeze tables that are not changed. A timestamp wouldn't work; you need to have an exact XID. Even if it did work you still have the same problem: there's a huge, hairy race condition between what vacuum is trying to do and any DML. > I also wonder how pg_database.datfrozenxid is set? Is it equal to the > oldest pg_class.relfrozenxid for that database? Correct. > I ask because I am willing to give a try and update relfrozenxid for the > tables that are never updated and frozen. Currently we are looking at > 8-hour downtime to vacuum the whole db in single-user mode. High > availability is more important that data loss in my case. [I still don't > want to lose data, but it won't be the end of world if it happens]. Why are you trying to go into single user mode? There's no reason to do that. Forcing relfrozenxid to 2 might work, but you're certainly playing with fire. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com