Dump/Reload pg_statistic to cut time from pg_upgrade?

From: Jerry Sievers
Date:
Admins;

Planning to pg_upgrade some large (3TB) clusters using hard link
method.  Run time for the upgrade itself takes around 5 minutes.
Nice!!  Origin version 8.4 and destination version 9.1.
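
For reference, the invocation is along these lines (paths are
illustrative):

    /usr/pgsql-9.1/bin/pg_upgrade \
        --old-bindir=/usr/pgsql-8.4/bin \
        --new-bindir=/usr/pgsql-9.1/bin \
        --old-datadir=/data/pg84 \
        --new-datadir=/data/pg91 \
        --link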

Unfortunately the post-upgrade analyze of the entire cluster is going
to take a minimum of 1.5 hours running several threads to analyze all
tables.  This was measured in an R&D environment.

Got to thinking, what if we make a SAN snapshot a few hours prior to
upgrade time, upgrade that... then analyze it and then dump the stats
table?

The assumption is that the DB will not have churned very much between
then and when the real upgrade is performed, and the stats saved from
a few hours earlier might be good enough to go live with.

Once the real upgrade is done we'd populate its stats table using the
saved stats.  I realize that there may be rows in the reloaded stats
table corresponding to temporary objects that no longer exist by then,
and we'd have to deal with those.

I'd immediately begin analyzing the tables so as to have the best
stats once this is complete, but being able to turn out a working
production instance in ~10 minutes rather than 2 hours is the goal here.

Anyway, perhaps there are other good reasons I should *not* attempt
this, but it turns out that the stats table can't be reloaded from its
own dump, so this of course is a show-stopper.

psql:d:456: ERROR:  cannot accept a value of type anyarray
CONTEXT:  COPY pg_statistic, line 1, column stavalues1: "{_assets,_income,_liabilities,assets,income,liabilities}"

Setting allow_system_table_mods to 'on' is also required prior to
attempting the import.
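
In outline, the attempt looked like this (paths and database name are
illustrative):

    # on the analyzed snapshot copy:
    psql -d proddb -c "\copy pg_catalog.pg_statistic TO '/tmp/pg_statistic.dat'"

    # on the freshly upgraded cluster; in 9.1 this GUC can only be
    # set at server start:
    pg_ctl -D /data/pg91 -o "-c allow_system_table_mods=on" start
    psql -d proddb -c "\copy pg_catalog.pg_statistic FROM '/tmp/pg_statistic.dat'"
    # ...which fails on stavalues1 with the anyarray error shown above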

Has anyone else attempted anything similar?  Any feedback is
appreciated.

Thanks

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800


Re: Dump/Reload pg_statistic to cut time from pg_upgrade?

From: Kevin Grittner
Date:
Jerry Sievers <gsievers19@comcast.net> wrote:

> Planning to pg_upgrade some large (3TB) clusters using hard link
> method.  Run time for the upgrade itself takes around 5 minutes.
> Nice!!  Origin version 8.4 and destination version 9.1.
>
> Unfortunately the post-upgrade analyze of the entire cluster is going
> to take a minimum of 1.5 hours running several threads to analyze all
> tables.  This was measured in an R&D environment.
>
> Got to thinking, what if we make a SAN snapshot a few hours prior to
> upgrade time, upgrade that... then analyze it and then dump the stats
> table?

> Has anyone else attempted anything similar?  Any feedback is
> appreciated.

I certainly understand the motivation, and it may be a good option
if you test carefully beforehand.

What I have done in a similar situation, to minimize down time, is
to run a database ANALYZE with a very small target.  I forget the
particular value we used -- it may have been 3.  Then ran an
ANALYZE with the normal default target on a few key tables
(identified in testing to be the ones eating most of the scan time
with no statistics), and let users in.  The database ANALYZE with
the normal default target was done while normal production hit the
database, without too much of a performance hit.  With this
technique we were able to let users in with near-normal performance
with 10 or 15 minutes of down time rather than hours.
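
In rough SQL terms, the sequence was something like the following
(a sketch; the key table name is hypothetical and the tiny target
value is from memory):

    -- quick first pass over the whole database with a tiny target
    SET default_statistics_target = 3;
    ANALYZE;

    -- key tables get the normal target before users come back
    RESET default_statistics_target;
    ANALYZE big_hot_table;

    -- the full-database ANALYZE at the normal target then runs
    -- under production load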

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Dump/Reload pg_statistic to cut time from pg_upgrade?

From: Jerry Sievers
Date:
Kevin Grittner <kgrittn@ymail.com> writes:

> Jerry Sievers <gsievers19@comcast.net> wrote:
>
>> Planning to pg_upgrade some large (3TB) clusters using hard link
>> method.  Run time for the upgrade itself takes around 5 minutes.
>> Nice!!  Origin version 8.4 and destination version 9.1.
>>
>> Unfortunately the post-upgrade analyze of the entire cluster is going
>> to take a minimum of 1.5 hours running several threads to analyze all
>> tables.  This was measured in an R&D environment.
>>
>> Got to thinking, what if we make a SAN snapshot a few hours prior to
>> upgrade time, upgrade that... then analyze it and then dump the stats
>> table?
>
>> Has anyone else attempted anything similar?  Any feedback is
>> appreciated.
>
> I certainly understand the motivation, and it may be a good option
> if you test carefully beforehand.
>
> What I have done in a similar situation, to minimize down time, is
> to run a database ANALYZE with a very small target.  I forget the
> particular value we used -- it may have been 3.  Then ran an
> ANALYZE with the normal default target on a few key tables
> (identified in testing to be the ones eating most of the scan time
> with no statistics), and let users in.  The database ANALYZE with
> the normal default target was done while normal production hit the
> database, without too much of a performance hit.  With this
> technique we were able to let users in with near-normal performance
> with 10 or 15 minutes of down time rather than hours.

Thanks Kevin!  In fact, I've conceived of this solution route already
and may have to resort to it.  We do run with
default_statistics_target set fairly high at 1000 (legacy setting
here) without my knowing for certain that it's required across the
board (and most likely is not).
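
If the 1000 default turns out not to be needed across the board,
per-column overrides would let us keep it only where it matters
(table and column names hypothetical), e.g.:

    ALTER TABLE accounts ALTER COLUMN acct_path SET STATISTICS 1000;
    -- ...with default_statistics_target lowered in postgresql.conf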

Curious though if it's known that the pg_statistic table can't be
reloaded from its own pg_dump, due to the error that I highlighted in
the original post.

Agree that being able to manually load pg_statistic is of questionable
usefulness, though perhaps viable under the right circumstances.

> --
> Kevin Grittner
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Jerry Sievers
e: jerry.sievers@comcast.net
p: 312.241.7800


Re: Dump/Reload pg_statistic to cut time from pg_upgrade?

From: Tom Lane
Date:
Jerry Sievers <gsievers19@comcast.net> writes:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> Jerry Sievers <gsievers19@comcast.net> wrote:
>>> Planning to pg_upgrade some large (3TB) clusters using hard link
>>> method.  Run time for the upgrade itself takes around 5 minutes.
>>> Unfortunately the post-upgrade analyze of the entire cluster is going
>>> to take a minimum of 1.5 hours running several threads to analyze all
>>> tables.  This was measured in an R&D environment.

At least for some combinations of source and destination server
versions, it seems like it ought to be possible for pg_upgrade to just
move the old cluster's pg_statistic tables over to the new, as though
they were user data.  pg_upgrade takes pains to preserve relation OIDs
and attnums, so the key values should be compatible.  Except in
releases where we've added physical columns to pg_statistic or made a
non-backward-compatible redefinition of statistics meanings, it seems
like this should Just Work.  In cases where it doesn't work, pg_dump
and reload of that table would not work either (even without the
anyarray problem).
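
(For illustration only, a sketch of how stats rows attach to relations
purely by OID and attribute number:

    SELECT c.relname, a.attname, s.stanullfrac, s.stadistinct
      FROM pg_statistic s
      JOIN pg_class     c ON c.oid = s.starelid
      JOIN pg_attribute a ON a.attrelid = s.starelid
                         AND a.attnum   = s.staattnum
     LIMIT 5;

Those are the key values pg_upgrade would need to have preserved.)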

            regards, tom lane


Re: Dump/Reload pg_statistic to cut time from pg_upgrade?

From: Bruce Momjian
Date:
On Wed, Jul 10, 2013 at 10:47:33AM -0500, Jerry Sievers wrote:
> > What I have done in a similar situation, to minimize down time, is
> > to run a database ANALYZE with a very small target.  I forget the
> > particular value we used -- it may have been 3.  Then ran an
> > ANALYZE with the normal default target on a few key tables
> > (identified in testing to be the ones eating most of the scan time
> > with no statistics), and let users in.  The database ANALYZE with
> > the normal default target was done while normal production hit the
> > database, without too much of a performance hit.  With this
> > technique we were able to let users in with near-normal performance
> > with 10 or 15 minutes of down time rather than hours.
>
> Thanks Kevin!  In fact, I've conceived of this solution route already
> and may have to resort to it.  We do run with
> default_statistics_target set fairly high at 1000 (legacy setting
> here) without my knowing for certain that it's required across the
> board (and most likely is not).
>
> Curious though if it's known that the pg_statistic table can't be
> reloded from it's own pg_dump due to that error that I highlighted in
> the original post.
>
> Agree that being able to manually load pg_statistic is of questionable
> usefulness though perhaps viable under the right circumstances.

9.2 already creates an incremental statistics script called
analyze_new_cluster.sh:

        Make pg_upgrade create a script to incrementally generate more
        accurate optimizer statistics (Bruce Momjian)

        This reduces the time needed to generate minimal cluster statistics
        after an upgrade.

It runs ANALYZE in three passes, with statistics targets of 1, 10, and
then the configured default.
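
Roughly speaking (a sketch, not the script verbatim; exact options may
differ), that is equivalent to:

    PGOPTIONS="-c default_statistics_target=1"  vacuumdb --all --analyze-only
    PGOPTIONS="-c default_statistics_target=10" vacuumdb --all --analyze-only
    vacuumdb --all --analyze-only    # final pass at the configured default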

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Dump/Reload pg_statistic to cut time from pg_upgrade?

From: Bruce Momjian
Date:
On Wed, Jul 10, 2013 at 12:46:26PM -0400, Tom Lane wrote:
> Jerry Sievers <gsievers19@comcast.net> writes:
> > Kevin Grittner <kgrittn@ymail.com> writes:
> >> Jerry Sievers <gsievers19@comcast.net> wrote:
> >>> Planning to pg_upgrade some large (3TB) clusters using hard link
> >>> method.  Run time for the upgrade itself takes around 5 minutes.
> >>> Unfortunately the post-upgrade analyze of the entire cluster is going
> >>> to take a minimum of 1.5 hours running several threads to analyze all
> >>> tables.  This was measured in an R&D environment.
>
> At least for some combinations of source and destination server
> versions, it seems like it ought to be possible for pg_upgrade to just
> move the old cluster's pg_statistic tables over to the new, as though
> they were user data.  pg_upgrade takes pains to preserve relation OIDs
> and attnums, so the key values should be compatible.  Except in
> releases where we've added physical columns to pg_statistic or made a
> non-backward-compatible redefinition of statistics meanings, it seems
> like this should Just Work.  In cases where it doesn't work, pg_dump
> and reload of that table would not work either (even without the
> anyarray problem).

Yes, we could certainly do that, but since 9.2 creates an incremental
statistics build script, I need someone to say that is too slow before I
code up something more complex.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +