Обсуждение: pg_dump ANALYZE statements
There is currently a performance tip to run ANALYZE after a pg_dump script has been restored. ISTM that pg_dump should default to have ANALYZE statements for a table, when both table definition and data are dumped. If only data (-a) or only table definition (-s) is dumped the default should be to *not* add an ANALYZE statement. There doesn't seem to be any reason to skip the ANALYZE, but I'll implement it as an option. -z on | off --analyze=on | off This would add a table-specific ANALYZE statement following each table's actions. I'm not aware of a strong argument against such an option. Performance surely can't be one because the time saved on the ANALYZE will quickly bite back on time lost on poorly planned queries. What does the panel think? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
"Simon Riggs" <simon@2ndquadrant.com> writes: > ISTM that pg_dump should default to have ANALYZE statements for a table, > when both table definition and data are dumped. If only data (-a) or > only table definition (-s) is dumped the default should be to *not* add > an ANALYZE statement. Having pg_dump emit ANALYZE was discussed and rejected years ago. Have you read that discussion? Do you have any new arguments to make? regards, tom lane
Simon Riggs wrote: > There is currently a performance tip to run ANALYZE after a pg_dump > script has been restored. > > ISTM that pg_dump should default to have ANALYZE statements for a table, > when both table definition and data are dumped. If only data (-a) or > only table definition (-s) is dumped the default should be to *not* add > an ANALYZE statement. > > There doesn't seem to be any reason to skip the ANALYZE, but I'll > implement it as an option. > -z on | off > --analyze=on | off > > This would add a table-specific ANALYZE statement following each table's > actions. > > I'm not aware of a strong argument against such an option. Performance > surely can't be one because the time saved on the ANALYZE will quickly > bite back on time lost on poorly planned queries. > > What does the panel think? how is this going to interact with the (now by default enabled) autovacuum daemon ? Stefan
"Simon Riggs" <simon@2ndquadrant.com> writes: > There doesn't seem to be any reason to skip the ANALYZE, but I'll > implement it as an option. > -z on | off > --analyze=on | off Only an aesthetic comment: Short options don't usually take on/off arguments, I would suggest making the default be to analyze and make -z and --analyze=off disable the analyze. You might also consider having a --analyze=verbose and perhaps a --analyze=full though currently that would require doing vacuum analyze. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-01-22 at 10:49 -0500, Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > ISTM that pg_dump should default to have ANALYZE statements for a table, > > when both table definition and data are dumped. If only data (-a) or > > only table definition (-s) is dumped the default should be to *not* add > > an ANALYZE statement. > > Having pg_dump emit ANALYZE was discussed and rejected years ago. > Have you read that discussion? Do you have any new arguments to make? Well, its been suggested before, but I can't see any good arguments against. Specifically, there was one person who spoke in favour of it last time it was mentioned. http://archives.postgresql.org/pgsql-hackers/2003-02/msg01270.php If its a performance tip, we should be doing it automatically. If we genuinely believe that autovacuum will handle it, then we can simply alter the docs to say: if you aren't running autovacuum, then don't forget to ANALYZE. IMHO it is not sufficient to rely upon autovacuum to do all of this work for us. Rod mentions the discussion has been raised before but doesn't state what the arguments were: http://archives.postgresql.org/pgsql-hackers/2003-02/msg01264.php These other posts also seem to be in favour of the idea... http://archives.postgresql.org/pgsql-performance/2006-10/msg00142.php http://archives.postgresql.org/pgsql-hackers/2003-02/msg01273.php -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
On Mon, Jan 22, 2007 at 03:40:17PM +0000, Simon Riggs wrote: > This would add a table-specific ANALYZE statement following each table's > actions. It'd probably be best to put it before any index creating activities, since there's a better chance of everything from the table being in shared buffers. Better yet would be if COPY could analyze data as it was loaded in... -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes: > On Mon, Jan 22, 2007 at 03:40:17PM +0000, Simon Riggs wrote: >> This would add a table-specific ANALYZE statement following each table's >> actions. > It'd probably be best to put it before any index creating activities, No, because then you'd fail to accumulate any stats on partial or functional indexes. There's been talk of using the presence of multi-column indexes to guide creation of cross-column statistics, too. regards, tom lane