Обсуждение: pg_dump ANALYZE statements

Поиск
Список
Период
Сортировка

pg_dump ANALYZE statements

От
"Simon Riggs"
Дата:
There is currently a performance tip to run ANALYZE after a pg_dump
script has been restored.

ISTM that pg_dump should default to have ANALYZE statements for a table,
when both table definition and data are dumped. If only data (-a) or
only table definition (-s) is dumped the default should be to *not* add
an ANALYZE statement.

There doesn't seem to be any reason to skip the ANALYZE, but I'll
implement it as an option. 
-z on | off 
--analyze=on | off

This would add a table-specific ANALYZE statement following each table's
actions.

I'm not aware of a strong argument against such an option. Performance
surely can't be one because the time saved on the ANALYZE will quickly
bite back on time lost on poorly planned queries.

What does the panel think?

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: pg_dump ANALYZE statements

От
Tom Lane
Дата:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> ISTM that pg_dump should default to have ANALYZE statements for a table,
> when both table definition and data are dumped. If only data (-a) or
> only table definition (-s) is dumped the default should be to *not* add
> an ANALYZE statement.

Having pg_dump emit ANALYZE was discussed and rejected years ago.
Have you read that discussion?  Do you have any new arguments to make?
        regards, tom lane


Re: pg_dump ANALYZE statements

От
Stefan Kaltenbrunner
Дата:
Simon Riggs wrote:
> There is currently a performance tip to run ANALYZE after a pg_dump
> script has been restored.
> 
> ISTM that pg_dump should default to have ANALYZE statements for a table,
> when both table definition and data are dumped. If only data (-a) or
> only table definition (-s) is dumped the default should be to *not* add
> an ANALYZE statement.
> 
> There doesn't seem to be any reason to skip the ANALYZE, but I'll
> implement it as an option. 
> -z on | off 
> --analyze=on | off
> 
> This would add a table-specific ANALYZE statement following each table's
> actions.
> 
> I'm not aware of a strong argument against such an option. Performance
> surely can't be one because the time saved on the ANALYZE will quickly
> bite back on time lost on poorly planned queries.
> 
> What does the panel think?

how is this going to interact with the (now by default enabled) 
autovacuum daemon ?


Stefan


Re: pg_dump ANALYZE statements

От
Gregory Stark
Дата:
"Simon Riggs" <simon@2ndquadrant.com> writes:

> There doesn't seem to be any reason to skip the ANALYZE, but I'll
> implement it as an option. 
> -z on | off 
> --analyze=on | off

Only an aesthetic comment: 

Short options don't usually take on/off arguments, I would suggest making the
default be to analyze and make -z and --analyze=off disable the analyze.

You might also consider having a --analyze=verbose and perhaps a
--analyze=full though currently that would require doing vacuum analyze.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: pg_dump ANALYZE statements

От
"Simon Riggs"
Дата:
On Mon, 2007-01-22 at 10:49 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > ISTM that pg_dump should default to have ANALYZE statements for a table,
> > when both table definition and data are dumped. If only data (-a) or
> > only table definition (-s) is dumped the default should be to *not* add
> > an ANALYZE statement.
> 
> Having pg_dump emit ANALYZE was discussed and rejected years ago.
> Have you read that discussion?  Do you have any new arguments to make?

Well, its been suggested before, but I can't see any good arguments
against. Specifically, there was one person who spoke in favour of it
last time it was mentioned.

http://archives.postgresql.org/pgsql-hackers/2003-02/msg01270.php

If its a performance tip, we should be doing it automatically. If we
genuinely believe that autovacuum will handle it, then we can simply
alter the docs to say: if you aren't running autovacuum, then don't
forget to ANALYZE. IMHO it is not sufficient to rely upon autovacuum to
do all of this work for us.

Rod mentions the discussion has been raised before but doesn't state
what the arguments were:
http://archives.postgresql.org/pgsql-hackers/2003-02/msg01264.php

These other posts also seem to be in favour of the idea...
http://archives.postgresql.org/pgsql-performance/2006-10/msg00142.php
http://archives.postgresql.org/pgsql-hackers/2003-02/msg01273.php

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: pg_dump ANALYZE statements

От
"Jim C. Nasby"
Дата:
On Mon, Jan 22, 2007 at 03:40:17PM +0000, Simon Riggs wrote:
> This would add a table-specific ANALYZE statement following each table's
> actions.

It'd probably be best to put it before any index creating activities,
since there's a better chance of everything from the table being in
shared buffers.

Better yet would be if COPY could analyze data as it was loaded in...
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: pg_dump ANALYZE statements

От
Tom Lane
Дата:
"Jim C. Nasby" <jim@nasby.net> writes:
> On Mon, Jan 22, 2007 at 03:40:17PM +0000, Simon Riggs wrote:
>> This would add a table-specific ANALYZE statement following each table's
>> actions.

> It'd probably be best to put it before any index creating activities,

No, because then you'd fail to accumulate any stats on partial or
functional indexes.  There's been talk of using the presence of
multi-column indexes to guide creation of cross-column statistics, too.
        regards, tom lane