Discussion: Dump/Reload pg_statistic to cut time from pg_upgrade?
Admins;

Planning to pg_upgrade some large (3TB) clusters using the hard link
method.  Run time for the upgrade itself takes around 5 minutes.
Nice!!  Origin version 8.4 and destination version 9.1.

Unfortunately the post-upgrade analyze of the entire cluster is going
to take a minimum of 1.5 hours running several threads to analyze all
tables.  This was measured in an R&D environment.

Got to thinking: what if we make a SAN snapshot a few hours prior to
upgrade time, upgrade that... then analyze it and then dump the stats
table?  The assumption is that the DB will not have churned very much
between then and when the real upgrade is performed, so the stats saved
from a few hours earlier might be good enough to go live with.  Once the
real upgrade is done we'd populate its stats table using the saved stats.

I realize that there may be rows in the reloaded stats table
corresponding to temporary objects that no longer exist by then, and
we'd have to deal with those.  I'd immediately begin analyzing the
tables so as to have the best stats once this is complete, but being
able to turn out a working production instance in something like 10
minutes rather than 2 hours is the goal here.

Anyway, perhaps there are other good reasons I should *not* attempt
this, but it turns out that the stats table can't be reloaded from its
own dump, so this of course is a show-stopper:

    psql:d:456: ERROR:  cannot accept a value of type anyarray
    CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{_assets,_income,_liabilities,assets,income,liabilities}"

Setting allow_system_table_mods to 'on' is also required prior to
attempting the import.

Has anyone else attempted anything similar?  Any feedback is
appreciated.

Thanks

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800
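For illustration, the dump-and-reload attempt described above might look
roughly like the following; the database names (snapdb, proddb) and the
file name are placeholders, not details from the original setup:

    # On the analyzed SAN-snapshot cluster: save the stats rows as COPY text.
    psql snapdb -c "COPY pg_catalog.pg_statistic TO STDOUT" > pg_statistic.copy

    # On the freshly upgraded production cluster, allow_system_table_mods
    # has to be enabled at server start before the import is even attempted.
    pg_ctl restart -D "$PGDATA" -o "-c allow_system_table_mods=on"

    # Reload the saved rows.  This is where the import fails: the sta*
    # value columns are declared as the pseudo-type anyarray, which has no
    # general-purpose input function, hence
    #   ERROR:  cannot accept a value of type anyarray
    psql proddb -c "COPY pg_catalog.pg_statistic FROM STDIN" < pg_statistic.copy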
Jerry Sievers <gsievers19@comcast.net> wrote:

> Planning to pg_upgrade some large (3TB) clusters using hard link
> method.  Run time for the upgrade itself takes around 5 minutes.
> Nice!!  Origin version 8.4 and destination version 9.1.
>
> Unfortunately the post-upgrade analyze of the entire cluster is going
> to take a minimum of 1.5 hours running several threads to analyze all
> tables.  This was measured in an R&D environment.
>
> Got to thinking, what if we make a SAN snapshot a few hours prior to
> upgrade time, upgrade that... then analyze it and then dump the stats
> table?
>
> Has anyone else attempted anything similar?  Any feedback is
> appreciated.

I certainly understand the motivation, and it may be a good option
if you test carefully beforehand.

What I have done in a similar situation, to minimize down time, is
to run a database ANALYZE with a very small target.  I forget the
particular value we used -- it may have been 3.  Then ran an
ANALYZE with the normal default target on a few key tables
(identified in testing to be the ones eating most of the scan time
with no statistics), and let users in.  The database ANALYZE with
the normal default target was done while normal production hit the
database, without too much of a performance hit.  With this
technique we were able to let users in with near-normal performance
with 10 or 15 minutes of down time rather than hours.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
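A sketch of the sequence Kevin describes, using the standard 9.x client
tools; the table names in the middle step are hypothetical stand-ins for
whatever tables testing identified as critical:

    # Coarse but fast first pass over every database: tiny statistics target.
    PGOPTIONS="-c default_statistics_target=3" vacuumdb --all --analyze-only

    # Re-analyze the few key tables at the normal target before letting
    # users back in (run against each affected database).
    psql proddb -c "ANALYZE big_fact_table; ANALYZE hot_lookup_table;"

    # With users back on, rebuild full statistics in the background at the
    # server's normal default target.
    vacuumdb --all --analyze-only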
Kevin Grittner <kgrittn@ymail.com> writes:

> Jerry Sievers <gsievers19@comcast.net> wrote:
>
>> Planning to pg_upgrade some large (3TB) clusters using hard link
>> method.  Run time for the upgrade itself takes around 5 minutes.
>> Nice!!  Origin version 8.4 and destination version 9.1.
>>
>> Unfortunately the post-upgrade analyze of the entire cluster is going
>> to take a minimum of 1.5 hours running several threads to analyze all
>> tables.  This was measured in an R&D environment.
>>
>> Got to thinking, what if we make a SAN snapshot a few hours prior to
>> upgrade time, upgrade that... then analyze it and then dump the stats
>> table?
>>
>> Has anyone else attempted anything similar?  Any feedback is
>> appreciated.
>
> I certainly understand the motivation, and it may be a good option
> if you test carefully beforehand.
>
> What I have done in a similar situation, to minimize down time, is
> to run a database ANALYZE with a very small target.  I forget the
> particular value we used -- it may have been 3.  Then ran an
> ANALYZE with the normal default target on a few key tables
> (identified in testing to be the ones eating most of the scan time
> with no statistics), and let users in.  The database ANALYZE with
> the normal default target was done while normal production hit the
> database, without too much of a performance hit.  With this
> technique we were able to let users in with near-normal performance
> with 10 or 15 minutes of down time rather than hours.

Thanks Kevin!  In fact, I've conceived of this solution route already
and may have to resort to it.  We do run with default_statistics_target
set fairly high at 1000 (a legacy setting here) without my knowing for
certain that it's required across the board (and most likely it is not).

Curious though whether it's known that the pg_statistic table can't be
reloaded from its own pg_dump, due to the error that I highlighted in
the original post.

Agreed that being able to manually load pg_statistic is of questionable
usefulness, though perhaps viable under the right circumstances.

> -- 
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

-- 
Jerry Sievers
e: jerry.sievers@comcast.net
p: 312.241.7800
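If the target of 1000 is really only needed for particular columns, one
option (illustrative only; the table and column names below are made up)
is to pin those columns explicitly and lower the global default, which
would also shrink the post-upgrade ANALYZE work:

    # Keep the high target only on the columns whose data actually needs it.
    psql proddb -c "ALTER TABLE ledger_entries
                      ALTER COLUMN account_code SET STATISTICS 1000;"

    # Columns with an explicit per-column override have attstattarget <> -1,
    # so the overrides are easy to audit before lowering the global default.
    psql proddb -c "SELECT attrelid::regclass, attname, attstattarget
                    FROM pg_attribute WHERE attstattarget > 0;"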
Jerry Sievers <gsievers19@comcast.net> writes:

> Kevin Grittner <kgrittn@ymail.com> writes:
>> Jerry Sievers <gsievers19@comcast.net> wrote:
>>> Planning to pg_upgrade some large (3TB) clusters using hard link
>>> method.  Run time for the upgrade itself takes around 5 minutes.
>>> Unfortunately the post-upgrade analyze of the entire cluster is going
>>> to take a minimum of 1.5 hours running several threads to analyze all
>>> tables.  This was measured in an R&D environment.

At least for some combinations of source and destination server
versions, it seems like it ought to be possible for pg_upgrade to just
move the old cluster's pg_statistic tables over to the new, as though
they were user data.  pg_upgrade takes pains to preserve relation OIDs
and attnums, so the key values should be compatible.  Except in
releases where we've added physical columns to pg_statistic or made a
non-backward-compatible redefinition of statistics meanings, it seems
like this should Just Work.  In cases where it doesn't work, pg_dump
and reload of that table would not work either (even without the
anyarray problem).

			regards, tom lane
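The point about the keys can be checked directly: pg_statistic rows are
keyed by (starelid, staattnum), and since pg_upgrade preserves relation
OIDs and attribute numbers, the only rows that should fail to match after
a hypothetical reload are those for objects dropped since the snapshot.
A query along these lines (illustrative) would list any such orphans:

    psql proddb -c "
        SELECT s.starelid, s.staattnum
        FROM   pg_catalog.pg_statistic s
        LEFT   JOIN pg_catalog.pg_attribute a
               ON  a.attrelid = s.starelid
               AND a.attnum   = s.staattnum
        WHERE  a.attrelid IS NULL;"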
On Wed, Jul 10, 2013 at 10:47:33AM -0500, Jerry Sievers wrote:
> > What I have done in a similar situation, to minimize down time, is
> > to run a database ANALYZE with a very small target.  I forget the
> > particular value we used -- it may have been 3.  Then ran an
> > ANALYZE with the normal default target on a few key tables
> > (identified in testing to be the ones eating most of the scan time
> > with no statistics), and let users in.  The database ANALYZE with
> > the normal default target was done while normal production hit the
> > database, without too much of a performance hit.  With this
> > technique we were able to let users in with near-normal performance
> > with 10 or 15 minutes of down time rather than hours.
>
> Thanks Kevin!  In fact, I've conceived of this solution route already
> and may have to resort to it.  We do run with default_statistics_target
> set fairly high at 1000 (a legacy setting here) without my knowing for
> certain that it's required across the board (and most likely it is not).
>
> Curious though whether it's known that the pg_statistic table can't be
> reloaded from its own pg_dump, due to the error that I highlighted in
> the original post.
>
> Agreed that being able to manually load pg_statistic is of questionable
> usefulness, though perhaps viable under the right circumstances.

9.2 already creates an incremental statistics script called
analyze_new_cluster.sh:

	Make pg_upgrade create a script to incrementally generate more
	accurate optimizer statistics (Bruce Momjian)

	This reduces the time needed to generate minimal cluster
	statistics after an upgrade.

It runs with a 1/10/default statistics target.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
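The generated script amounts to roughly the following (paraphrased, not
the literal contents of analyze_new_cluster.sh): three cluster-wide
ANALYZE passes at increasing statistics targets.

    # First pass: minimal statistics, very fast, gets the planner off the ground.
    PGOPTIONS="-c default_statistics_target=1" vacuumdb --all --analyze-only

    # Second pass: target 10.
    PGOPTIONS="-c default_statistics_target=10" vacuumdb --all --analyze-only

    # Final pass: the server's configured default target.
    vacuumdb --all --analyze-only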
On Wed, Jul 10, 2013 at 12:46:26PM -0400, Tom Lane wrote:
> Jerry Sievers <gsievers19@comcast.net> writes:
> > Kevin Grittner <kgrittn@ymail.com> writes:
> >> Jerry Sievers <gsievers19@comcast.net> wrote:
> >>> Planning to pg_upgrade some large (3TB) clusters using hard link
> >>> method.  Run time for the upgrade itself takes around 5 minutes.
> >>> Unfortunately the post-upgrade analyze of the entire cluster is going
> >>> to take a minimum of 1.5 hours running several threads to analyze all
> >>> tables.  This was measured in an R&D environment.
>
> At least for some combinations of source and destination server
> versions, it seems like it ought to be possible for pg_upgrade to just
> move the old cluster's pg_statistic tables over to the new, as though
> they were user data.  pg_upgrade takes pains to preserve relation OIDs
> and attnums, so the key values should be compatible.  Except in
> releases where we've added physical columns to pg_statistic or made a
> non-backward-compatible redefinition of statistics meanings, it seems
> like this should Just Work.  In cases where it doesn't work, pg_dump
> and reload of that table would not work either (even without the
> anyarray problem).

Yes, we could certainly do that, but since 9.2 creates an incremental
statistics build script, I need someone to say that is too slow before
I code up something more complex.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +