Обсуждение: document the need to analyze partitioned tables

Поиск
Список
Период
Сортировка

document the need to analyze partitioned tables

От
Justin Pryzby
Дата:
Adding -hackers, sorry for the duplicate.

This seems to be deficient, citing
https://www.postgresql.org/message-id/flat/0d1b394b-bec9-8a71-a336-44df7078b295%40gmail.com

I'm proposing something like the attached.  Ideally, there would be a central
place to put details, and the other places could refer to that.

Since the autoanalyze patch was reverted, this should be easily applied to
backbranches, which is probably most of its value.

commit 4ad2c8f6fd8eb26d76b226e68d3fdb8f0658f113
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date:   Thu Jul 22 16:06:18 2021 -0500

    documentation deficiencies for ANALYZE of partitioned tables
    
    This is partially extracted from 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
    which was reverted.

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5..decfabff5d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
     to meaningful statistical changes.
    </para>
 
+   <para>
+    Tuples changed in partitions and inheritence children do not count towards
+    analyze on the parent table.  If the parent table is empty or rarely
+    changed, it may never be processed by autovacuum.  It is necessary to
+    periodically run an manual <command>ANALYZE</command> to keep the statistics
+    of the table hierarchy up to date.
+   </para>
+
    <para>
     As with vacuuming for space recovery, frequent updates of statistics
     are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
      <command>ANALYZE</command> commands on those tables on a suitable schedule.
     </para>
    </tip>
+
+   <tip>
+    <para>
+     The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+     partitioned tables.  Inheritence parents will only be analyzed if the
+     parent is changed - changes to child tables do not trigger autoanalyze on
+     the parent table.  It is necessary to periodically run an manual
+     <command>ANALYZE</command> to keep the statistics of the table hierarchy up to
+     date.
+    </para>
+   </tip>
+
   </sect2>
 
   <sect2 id="vacuum-for-visibility-map">
@@ -817,6 +837,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 </programlisting>
     is compared to the total number of tuples inserted, updated, or deleted
     since the last <command>ANALYZE</command>.
+
+    Partitioned tables are not processed by autovacuum, and their statistics
+    should be updated by manually running <command>ANALYZE</command> when the
+    table is first populated, and whenever the distribution of data in its
+    partitions changes significantly.
+   </para>
+
+   <para>
+    Partitioned tables are not processed by autovacuum.  Statistics
+    should be collected by running a manual <command>ANALYZE</command> when it is
+    first populated, and updated whenever the distribution of data in its
+    partitions changes significantly.
    </para>
 
    <para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..b84853fd6f 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
    <title>Run <command>ANALYZE</command> Afterwards</title>
 
    <para>
+
     Whenever you have significantly altered the distribution of data
     within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended.
This
     includes bulk loading large amounts of data into the table.  Running
+
     <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
     ensures that the planner has up-to-date statistics about the
     table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c423aeeea5..20ffbc2d7a 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,22 +250,33 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea
   </para>
 
   <para>
-    If the table being analyzed has one or more children,
-    <command>ANALYZE</command> will gather statistics twice: once on the
-    rows of the parent table only, and a second time on the rows of the
-    parent table with all of its children.  This second set of statistics
-    is needed when planning queries that traverse the entire inheritance
-    tree.  The autovacuum daemon, however, will only consider inserts or
-    updates on the parent table itself when deciding whether to trigger an
-    automatic analyze for that table.  If that table is rarely inserted into
-    or updated, the inheritance statistics will not be up to date unless you
-    run <command>ANALYZE</command> manually.
+    If the table being analyzed is partitioned, <command>ANALYZE</command>
+    will gather statistics by sampling blocks randomly from its partitions;
+    in addition, it will recurse into each partition and update its statistics.
+    (However, in multi-level partitioning scenarios, each leaf partition
+    will only be analyzed once.)
+    By constrast, if the table being analyzed has inheritance children,
+    <command>ANALYZE</command> will gather statistics for it twice:
+    once on the rows of the parent table only, and a second time on the
+    rows of the parent table with all of its children.  This second set of
+    statistics is needed when planning queries that traverse the entire
+    inheritance tree.  The child tables themselves are not individually
+    analyzed in this case.
   </para>
 
   <para>
-    If any of the child tables are foreign tables whose foreign data wrappers
-    do not support <command>ANALYZE</command>, those child tables are ignored while
-    gathering inheritance statistics.
+    The autovacuum daemon does not process partitioned tables or inheritence
+    parents.  It is usually necessary to periodically run a manual
+    <command>ANALYZE</command> to keep the statistics of the table hierarchy
+    up to date (except for nonempty inheritence parents which undergo
+    modifications of their own table data).
+    See...
+  </para>
+
+  <para>
+    If any of the child tables or partitions are foreign tables whose foreign
+    data wrappers do not support <command>ANALYZE</command>, those tables are
+    ignored while gathering inheritance statistics.
   </para>
 
   <para>



Re: document the need to analyze partitioned tables

От
Zhihong Yu
Дата:


On Sun, Sep 12, 2021 at 8:54 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Adding -hackers, sorry for the duplicate.

This seems to be deficient, citing
https://www.postgresql.org/message-id/flat/0d1b394b-bec9-8a71-a336-44df7078b295%40gmail.com

I'm proposing something like the attached.  Ideally, there would be a central
place to put details, and the other places could refer to that.

Since the autoanalyze patch was reverted, this should be easily applied to
backbranches, which is probably most of its value.

commit 4ad2c8f6fd8eb26d76b226e68d3fdb8f0658f113
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date:   Thu Jul 22 16:06:18 2021 -0500

    documentation deficiencies for ANALYZE of partitioned tables

    This is partially extracted from 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
    which was reverted.

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5..decfabff5d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
     to meaningful statistical changes.
    </para>

+   <para>
+    Tuples changed in partitions and inheritence children do not count towards
+    analyze on the parent table.  If the parent table is empty or rarely
+    changed, it may never be processed by autovacuum.  It is necessary to
+    periodically run an manual <command>ANALYZE</command> to keep the statistics
+    of the table hierarchy up to date.
+   </para>
+
    <para>
     As with vacuuming for space recovery, frequent updates of statistics
     are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
      <command>ANALYZE</command> commands on those tables on a suitable schedule.
     </para>
    </tip>
+
+   <tip>
+    <para>
+     The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+     partitioned tables.  Inheritence parents will only be analyzed if the
+     parent is changed - changes to child tables do not trigger autoanalyze on
+     the parent table.  It is necessary to periodically run an manual
+     <command>ANALYZE</command> to keep the statistics of the table hierarchy up to
+     date.
+    </para>
+   </tip>
+
   </sect2>

   <sect2 id="vacuum-for-visibility-map">
@@ -817,6 +837,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
 </programlisting>
     is compared to the total number of tuples inserted, updated, or deleted
     since the last <command>ANALYZE</command>.
+
+    Partitioned tables are not processed by autovacuum, and their statistics
+    should be updated by manually running <command>ANALYZE</command> when the
+    table is first populated, and whenever the distribution of data in its
+    partitions changes significantly.
+   </para>
+
+   <para>
+    Partitioned tables are not processed by autovacuum.  Statistics
+    should be collected by running a manual <command>ANALYZE</command> when it is
+    first populated, and updated whenever the distribution of data in its
+    partitions changes significantly.
    </para>

    <para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..b84853fd6f 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
    <title>Run <command>ANALYZE</command> Afterwards</title>

    <para>
+
     Whenever you have significantly altered the distribution of data
     within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended. This
     includes bulk loading large amounts of data into the table.  Running
+
     <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
     ensures that the planner has up-to-date statistics about the
     table.  With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c423aeeea5..20ffbc2d7a 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,22 +250,33 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea
   </para>

   <para>
-    If the table being analyzed has one or more children,
-    <command>ANALYZE</command> will gather statistics twice: once on the
-    rows of the parent table only, and a second time on the rows of the
-    parent table with all of its children.  This second set of statistics
-    is needed when planning queries that traverse the entire inheritance
-    tree.  The autovacuum daemon, however, will only consider inserts or
-    updates on the parent table itself when deciding whether to trigger an
-    automatic analyze for that table.  If that table is rarely inserted into
-    or updated, the inheritance statistics will not be up to date unless you
-    run <command>ANALYZE</command> manually.
+    If the table being analyzed is partitioned, <command>ANALYZE</command>
+    will gather statistics by sampling blocks randomly from its partitions;
+    in addition, it will recurse into each partition and update its statistics.
+    (However, in multi-level partitioning scenarios, each leaf partition
+    will only be analyzed once.)
+    By constrast, if the table being analyzed has inheritance children,
+    <command>ANALYZE</command> will gather statistics for it twice:
+    once on the rows of the parent table only, and a second time on the
+    rows of the parent table with all of its children.  This second set of
+    statistics is needed when planning queries that traverse the entire
+    inheritance tree.  The child tables themselves are not individually
+    analyzed in this case.
   </para>

   <para>
-    If any of the child tables are foreign tables whose foreign data wrappers
-    do not support <command>ANALYZE</command>, those child tables are ignored while
-    gathering inheritance statistics.
+    The autovacuum daemon does not process partitioned tables or inheritence
+    parents.  It is usually necessary to periodically run a manual
+    <command>ANALYZE</command> to keep the statistics of the table hierarchy
+    up to date (except for nonempty inheritence parents which undergo
+    modifications of their own table data).
+    See...
+  </para>
+
+  <para>
+    If any of the child tables or partitions are foreign tables whose foreign
+    data wrappers do not support <command>ANALYZE</command>, those tables are
+    ignored while gathering inheritance statistics.
   </para>

   <para>


Hi,
Minor comment:

periodically run an manual  -> periodically run a manual  

Cheers

Re: document the need to analyze partitioned tables

От
Justin Pryzby
Дата:
Cleaned up and attached as a .patch.

The patch implementing autoanalyze on partitioned tables should revert relevant
portions of this patch.

Вложения

Re: document the need to analyze partitioned tables

От
Tomas Vondra
Дата:
Hi,

On 10/8/21 14:58, Justin Pryzby wrote:
> Cleaned up and attached as a .patch.
> 
> The patch implementing autoanalyze on partitioned tables should
> revert relevant portions of this patch.

I went through this patch and I'd like to propose a couple changes, per 
the 0002 patch:

1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit 
strange before, but I'm not a native speaker so maybe it's worse ...

2) Remove unnecessary whitespace changes in perform.sgml.

3) Simplify the analyze.sgml changes a bit - it was trying to cram too 
much stuff into a single paragraph, so I split that.

Does that seem OK, or did omit something important?

FWIW I think it's really confusing we have inheritance and partitioning, 
and partitions and child tables. And sometimes we use partitioning in 
the generic sense (i.e. including the inheritance approach), and 
sometimes only the declarative variant. Same for partitions vs child 
tables. I can't even imagine how confusing this has to be for people 
just learning this stuff. They must be in permanent WTF?! state ...

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: document the need to analyze partitioned tables

От
Justin Pryzby
Дата:
Thanks for looking at this

On Fri, Jan 21, 2022 at 06:21:57PM +0100, Tomas Vondra wrote:
> Hi,
> 
> On 10/8/21 14:58, Justin Pryzby wrote:
> > Cleaned up and attached as a .patch.
> > 
> > The patch implementing autoanalyze on partitioned tables should
> > revert relevant portions of this patch.
> 
> I went through this patch and I'd like to propose a couple changes, per the
> 0002 patch:
> 
> 1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
> strange before, but I'm not a native speaker so maybe it's worse ...

+     autoanalyze on the parent table.  If your queries require statistics on
                           
 
+     parent relations for proper planning, it's necessary to periodically run
                           
 

You added two references to "relations", but everything else talks about
"tables", which is all that analyze processes.

> 2) Remove unnecessary whitespace changes in perform.sgml.

Those were a note to myself and to any reviewer - should that be updated too ?

> 3) Simplify the analyze.sgml changes a bit - it was trying to cram too much
> stuff into a single paragraph, so I split that.
> 
> Does that seem OK, or did omit something important?

+    If the table being analyzed has one or more children,

I think you're referring to both legacy inheritance and and partitioning.  That
should be more clear.

+    <command>ANALYZE</command> gathers two sets of statistics: once on the rows
+    of the parent table only, and a second one including rows of both the parent
+    table and all child relations.  This second set of statistics is needed when

I think should say ".. and all of its children".

> FWIW I think it's really confusing we have inheritance and partitioning, and
> partitions and child tables. And sometimes we use partitioning in the
> generic sense (i.e. including the inheritance approach), and sometimes only
> the declarative variant. Same for partitions vs child tables. I can't even
> imagine how confusing this has to be for people just learning this stuff.
> They must be in permanent WTF?! state ...

The docs were cleaned up some in 0c06534bd.  At least the word "partitioned"
should never be used for legacy inheritance - but "partitioning" is.

-- 
Justin



Re: document the need to analyze partitioned tables

От
Tomas Vondra
Дата:
On 1/21/22 19:02, Justin Pryzby wrote:
> Thanks for looking at this
> 
> On Fri, Jan 21, 2022 at 06:21:57PM +0100, Tomas Vondra wrote:
>> Hi,
>>
>> On 10/8/21 14:58, Justin Pryzby wrote:
>>> Cleaned up and attached as a .patch.
>>>
>>> The patch implementing autoanalyze on partitioned tables should
>>> revert relevant portions of this patch.
>>
>> I went through this patch and I'd like to propose a couple changes, per the
>> 0002 patch:
>>
>> 1) I've reworded the changes in maintenance.sgml a bit. It sounded a bit
>> strange before, but I'm not a native speaker so maybe it's worse ...
> 
> +     autoanalyze on the parent table.  If your queries require statistics on
> +     parent relations for proper planning, it's necessary to periodically run
> 
> You added two references to "relations", but everything else talks about
> "tables", which is all that analyze processes.
> 

Good point, that should use "tables" too.

>> 2) Remove unnecessary whitespace changes in perform.sgml.
> 
> Those were a note to myself and to any reviewer - should that be updated too ?
> 

Ah, I see. I don't think that part needs updating - it talks about 
having to analyze after a bulk load, and that applies to all tables 
anyway. I don't think it needs to mention partitioned tables need an 
analyze too.

>> 3) Simplify the analyze.sgml changes a bit - it was trying to cram too much
>> stuff into a single paragraph, so I split that.
>>
>> Does that seem OK, or did omit something important?
> 
> +    If the table being analyzed has one or more children,
> 
> I think you're referring to both legacy inheritance and and partitioning.  That
> should be more clear.
> 

I think it applies to both types of partitioning - it's just that in the 
declarative partitioning case the table is always empty so no stats with 
inherit=false are built.

> +    <command>ANALYZE</command> gathers two sets of statistics: once on the rows
> +    of the parent table only, and a second one including rows of both the parent
> +    table and all child relations.  This second set of statistics is needed when
> 
> I think should say ".. and all of its children".
> 

OK

>> FWIW I think it's really confusing we have inheritance and partitioning, and
>> partitions and child tables. And sometimes we use partitioning in the
>> generic sense (i.e. including the inheritance approach), and sometimes only
>> the declarative variant. Same for partitions vs child tables. I can't even
>> imagine how confusing this has to be for people just learning this stuff.
>> They must be in permanent WTF?! state ...
> 
> The docs were cleaned up some in 0c06534bd.  At least the word "partitioned"
> should never be used for legacy inheritance - but "partitioning" is.
> 

OK


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: document the need to analyze partitioned tables

От
Robert Haas
Дата:
On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> [ new patch ]

This patch is originally by Justin. The latest version is by Tomas. I
think the next step is for Justin to say whether he's OK with the
latest version that Tomas posted. If he is, then I suggest that he
also mark it Ready for Committer, and that Tomas commit it. If he's
not, he should say what he wants changed and either post a new version
himself or wait for Tomas to do that.

I think the fact that is classified as a "Bug Fix" in the CommitFest
application is not particularly good. I would prefer to see it
classified under "Documentation". I'm prepared to concede that
documentation can have bugs as a general matter, but nobody's data is
getting eaten because the documentation wasn't updated. In fact, this
is the fourth patch from the "bug fix" section I've studied this
afternoon, and, well, none of them have been back-patchable code
defects.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: document the need to analyze partitioned tables

От
Justin Pryzby
Дата:
On Mon, Mar 14, 2022 at 05:23:54PM -0400, Robert Haas wrote:
> On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> > [ new patch ]
> 
> This patch is originally by Justin. The latest version is by Tomas. I
> think the next step is for Justin to say whether he's OK with the
> latest version that Tomas posted. If he is, then I suggest that he
> also mark it Ready for Committer, and that Tomas commit it. If he's
> not, he should say what he wants changed and either post a new version
> himself or wait for Tomas to do that.

Yes, I think it can be Ready.  Done.

I amended some of Tomas' changes (see 0003, attached as txt).

@cfbot: the *.patch file is for your consumption, and the others are only there
to show my changes.

> I think the fact that is classified as a "Bug Fix" in the CommitFest
> application is not particularly good. I would prefer to see it
> classified under "Documentation". I'm prepared to concede that
> documentation can have bugs as a general matter, but nobody's data is
> getting eaten because the documentation wasn't updated. In fact, this
> is the fourth patch from the "bug fix" section I've studied this
> afternoon, and, well, none of them have been back-patchable code
> defects.

In fact, I consider this to be back-patchable back to v10.  IMO it's an
omission that this isn't documented.  Not all bugs cause data to be eaten.  If
someone reads the existing documentation, they might conclude that their
partitioned tables don't need to be analyzed, and they would've been better
served by not reading the docs.

-- 
Justin

Вложения

Re: document the need to analyze partitioned tables

От
Tomas Vondra
Дата:
On 3/16/22 00:00, Justin Pryzby wrote:
> On Mon, Mar 14, 2022 at 05:23:54PM -0400, Robert Haas wrote:
>> On Fri, Jan 21, 2022 at 1:31 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>> [ new patch ]
>>
>> This patch is originally by Justin. The latest version is by Tomas. I
>> think the next step is for Justin to say whether he's OK with the
>> latest version that Tomas posted. If he is, then I suggest that he
>> also mark it Ready for Committer, and that Tomas commit it. If he's
>> not, he should say what he wants changed and either post a new version
>> himself or wait for Tomas to do that.
> 
> Yes, I think it can be Ready.  Done.
> 
> I amended some of Tomas' changes (see 0003, attached as txt).
> 
> @cfbot: the *.patch file is for your consumption, and the others are only there
> to show my changes.
> 
>> I think the fact that is classified as a "Bug Fix" in the CommitFest
>> application is not particularly good. I would prefer to see it
>> classified under "Documentation". I'm prepared to concede that
>> documentation can have bugs as a general matter, but nobody's data is
>> getting eaten because the documentation wasn't updated. In fact, this
>> is the fourth patch from the "bug fix" section I've studied this
>> afternoon, and, well, none of them have been back-patchable code
>> defects.
> 
> In fact, I consider this to be back-patchable back to v10.  IMO it's an
> omission that this isn't documented.  Not all bugs cause data to be eaten.  If
> someone reads the existing documentation, they might conclude that their
> partitioned tables don't need to be analyzed, and they would've been better
> served by not reading the docs.
> 

I've pushed the last version, and backpatched it to 10 (not sure I'd
call it a bugfix, but I certainly agree with Justin it's worth
mentioning in the docs, even on older branches).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: document the need to analyze partitioned tables

От
Daniel Gustafsson
Дата:
> On 28 Mar 2022, at 15:05, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:

> I've pushed the last version, and backpatched it to 10 (not sure I'd
> call it a bugfix, but I certainly agree with Justin it's worth
> mentioning in the docs, even on older branches).

I happened to spot a small typo in this commit in the ANALYZE docs, and have
just pushed a fix all the way down to 10 as per the original commit.

--
Daniel Gustafsson        https://vmware.com/




Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
> I've pushed the last version, and backpatched it to 10 (not sure I'd
> call it a bugfix, but I certainly agree with Justin it's worth
> mentioning in the docs, even on older branches).

I'd like to suggest an improvement to this.  The current wording could
be read to mean that dead tuples won't get cleaned up in partitioned tables.


By the way, where are the statistics of a partitioned tables used?  The actual
tables scanned are always the partitions, and in the execution plans that
I have seen, the optimizer always used the statistics of the partitions.

Yours,
Laurenz Albe

Вложения

Re: document the need to analyze partitioned tables

От
Andrey Lepikhov
Дата:
On 10/5/22 13:37, Laurenz Albe wrote:
> On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
>> I've pushed the last version, and backpatched it to 10 (not sure I'd
>> call it a bugfix, but I certainly agree with Justin it's worth
>> mentioning in the docs, even on older branches).
> 
> I'd like to suggest an improvement to this.  The current wording could
> be read to mean that dead tuples won't get cleaned up in partitioned tables.
> 
> 
> By the way, where are the statistics of a partitioned tables used?  The actual
> tables scanned are always the partitions, and in the execution plans that
> I have seen, the optimizer always used the statistics of the partitions.
For example, it is used to estimate selectivity of join clause:

CREATE TABLE test (id integer, val integer) PARTITION BY hash (id);
CREATE TABLE test_0 PARTITION OF test
   FOR VALUES WITH (modulus 2, remainder 0);
CREATE TABLE test_1 PARTITION OF test
   FOR VALUES WITH (modulus 2, remainder 1);

INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q);
VACUUM ANALYZE test;
INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q);
VACUUM ANALYZE test_0,test_1;

EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
VACUUM ANALYZE test;
EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;

Here without actual statistics on parent table we make wrong prediction.

-- 
Regards
Andrey Lepikhov
Postgres Professional




Re: document the need to analyze partitioned tables

От
Nathan Bossart
Дата:
On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:
> On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
>> I've pushed the last version, and backpatched it to 10 (not sure I'd
>> call it a bugfix, but I certainly agree with Justin it's worth
>> mentioning in the docs, even on older branches).
> 
> I'd like to suggest an improvement to this.  The current wording could
> be read to mean that dead tuples won't get cleaned up in partitioned tables.

Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage.  But I see what you mean.  Readers
might misinterpret this to mean that autovacuum will not process the
partitions.  There's a good definition of what the docs mean by
"partitioned table" [0], but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions."  So,
while the current wording it technically correct, I think it'd be
reasonable to expand it to help avoid confusion.

Here is my take on the wording:

    Since all the data for a partitioned table is stored in its partitions,
    autovacuum does not process partitioned tables.  Instead, autovacuum
    processes the individual partitions that are regular tables.  This
    means that autovacuum only gathers statistics for the regular tables
    that serve as partitions and not for the partitioned tables.  Since
    queries may rely on a partitioned table's statistics, you should
    collect statistics via the ANALYZE command when it is first populated,
    and again whenever the distribution of data in its partitions changes
    significantly.

[0] https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:
> On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:
> > On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
> >> I've pushed the last version, and backpatched it to 10 (not sure I'd
> >> call it a bugfix, but I certainly agree with Justin it's worth
> >> mentioning in the docs, even on older branches).
> > 
> > I'd like to suggest an improvement to this.  The current wording could
> > be read to mean that dead tuples won't get cleaned up in partitioned tables.
> 
> Well, dead tuples won't get cleaned up in partitioned tables, as
> partitioned tables do not have storage.  But I see what you mean.  Readers
> might misinterpret this to mean that autovacuum will not process the
> partitions.  There's a good definition of what the docs mean by
> "partitioned table" [0], but FWIW it took me some time before I
> consistently read "partitioned table" to mean "only the thing with relkind
> set to 'p'" and not "both the partitioned table and its partitions."  So,
> while the current wording it technically correct, I think it'd be
> reasonable to expand it to help avoid confusion.
> 
> Here is my take on the wording:
> 
>     Since all the data for a partitioned table is stored in its partitions,
>     autovacuum does not process partitioned tables.  Instead, autovacuum
>     processes the individual partitions that are regular tables.  This
>     means that autovacuum only gathers statistics for the regular tables
>     that serve as partitions and not for the partitioned tables.  Since
>     queries may rely on a partitioned table's statistics, you should
>     collect statistics via the ANALYZE command when it is first populated,
>     and again whenever the distribution of data in its partitions changes
>     significantly.

Uh, what about autovacuum's handling of partitioned tables?  This makes
it sound like it ignores them because it talks about manual ANALYZE.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



Re: document the need to analyze partitioned tables

От
Justin Pryzby
Дата:
On Tue, Jan 17, 2023 at 03:53:24PM -0500, Bruce Momjian wrote:
> On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:
> > On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:
> > > On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
> > >> I've pushed the last version, and backpatched it to 10 (not sure I'd
> > >> call it a bugfix, but I certainly agree with Justin it's worth
> > >> mentioning in the docs, even on older branches).
> > > 
> > > I'd like to suggest an improvement to this.  The current wording could
> > > be read to mean that dead tuples won't get cleaned up in partitioned tables.
> > 
> > Well, dead tuples won't get cleaned up in partitioned tables, as
> > partitioned tables do not have storage.  But I see what you mean.  Readers
> > might misinterpret this to mean that autovacuum will not process the
> > partitions.  There's a good definition of what the docs mean by
> > "partitioned table" [0], but FWIW it took me some time before I
> > consistently read "partitioned table" to mean "only the thing with relkind
> > set to 'p'" and not "both the partitioned table and its partitions."  So,
> > while the current wording it technically correct, I think it'd be
> > reasonable to expand it to help avoid confusion.
> > 
> > Here is my take on the wording:
> > 
> >     Since all the data for a partitioned table is stored in its partitions,
> >     autovacuum does not process partitioned tables.  Instead, autovacuum
> >     processes the individual partitions that are regular tables.  This
> >     means that autovacuum only gathers statistics for the regular tables
> >     that serve as partitions and not for the partitioned tables.  Since
> >     queries may rely on a partitioned table's statistics, you should
> >     collect statistics via the ANALYZE command when it is first populated,
> >     and again whenever the distribution of data in its partitions changes
> >     significantly.
> 
> Uh, what about autovacuum's handling of partitioned tables?  This makes
> it sound like it ignores them because it talks about manual ANALYZE.

If we're referring to the *partitioned* table, then it does ignore them.
See:

|commit 6f8127b7390119c21479f5ce495b7d2168930e82
|Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
|Date:   Mon Aug 16 17:27:52 2021 -0400
|
|    Revert analyze support for partitioned tables

Maybe (all?) the clarification the docs need is to say:
"Partitioned tables are not *themselves* processed by autovacuum."

-- 
Justin



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
> On Tue, Jan 17, 2023 at 03:53:24PM -0500, Bruce Momjian wrote:
> > On Thu, Jan 12, 2023 at 03:27:47PM -0800, Nathan Bossart wrote:
> > > Here is my take on the wording:
> > > 
> > >     Since all the data for a partitioned table is stored in its partitions,
> > >     autovacuum does not process partitioned tables.  Instead, autovacuum
> > >     processes the individual partitions that are regular tables.  This
> > >     means that autovacuum only gathers statistics for the regular tables
> > >     that serve as partitions and not for the partitioned tables.  Since
> > >     queries may rely on a partitioned table's statistics, you should
> > >     collect statistics via the ANALYZE command when it is first populated,
> > >     and again whenever the distribution of data in its partitions changes
> > >     significantly.
> > 
> > Uh, what about autovacuum's handling of partitioned tables?  This makes
> > it sound like it ignores them because it talks about manual ANALYZE.
> 
> If we're referring to the *partitioned* table, then it does ignore them.
> See:
> 
> |commit 6f8127b7390119c21479f5ce495b7d2168930e82
> |Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
> |Date:   Mon Aug 16 17:27:52 2021 -0400
> |
> |    Revert analyze support for partitioned tables

Yes, I see that patch was trying to combine the statistics of individual
partitions into a partitioned table summary.

> Maybe (all?) the clarification the docs need is to say:
> "Partitioned tables are not *themselves* processed by autovacuum."

Yes, I think the lack of autovacuum needs to be specifically mentioned
since most people assume autovacuum handles _all_ statistics updating.

Can someone summarize how bad it is we have no statistics on partitioned
tables?  It sounds bad to me. 

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
> On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
> > Maybe (all?) the clarification the docs need is to say:
> > "Partitioned tables are not *themselves* processed by autovacuum."
>
> Yes, I think the lack of autovacuum needs to be specifically mentioned
> since most people assume autovacuum handles _all_ statistics updating.
>
> Can someone summarize how bad it is we have no statistics on partitioned
> tables?  It sounds bad to me.

Andrey Lepikhov had an example earlier in this thread[1].  It doesn't take
an exotic query.

Attached is a new version of my patch that tries to improve the wording.

Yours,
Laurenz Albe

 [1]: https://postgr.es/m/3df5c68b-13aa-53d0-c0ec-ed98e6972e2e%40postgrespro.ru

Вложения

Re: document the need to analyze partitioned tables

От
Justin Pryzby
Дата:
On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:
> On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
> > On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
> > > Maybe (all?) the clarification the docs need is to say:
> > > "Partitioned tables are not *themselves* processed by autovacuum."
> > 
> > Yes, I think the lack of autovacuum needs to be specifically mentioned
> > since most people assume autovacuum handles _all_ statistics updating.

That's what 61fa6ca79 aimed to do.  Laurenz is suggesting further
clarification.

> > Can someone summarize how bad it is we have no statistics on partitioned
> > tables?  It sounds bad to me.
> 
> Andrey Lepikhov had an example earlier in this thread[1].  It doesn't take
> an exotic query. 
> 
> Attached is a new version of my patch that tries to improve the wording.

I tweaked this a bit to end up with:

> -    Partitioned tables are not processed by autovacuum.  Statistics
> -    should be collected by running a manual <command>ANALYZE</command> when it is
> +    The leaf partitions of a partitioned table are normal tables and are processed
> +    by autovacuum; however, autovacuum does not process the partitioned table itself.
> +    This is no problem as far as <command>VACUUM</command> is concerned, since
> +    there's no need to vacuum the empty, partitioned table.  But, as mentioned in
> +    <xref linkend="vacuum-for-statistics"/>, it also means that autovacuum won't
> +    run <command>ANALYZE</command> on the partitioned table.
> +    Although statistics are automatically gathered on its leaf partitions, some queries also need
> +    statistics on the partitioned table to run optimally.  You should collect statistics by
> +    running a manual <command>ANALYZE</command> when the partitioned table is
>      first populated, and again whenever the distribution of data in its
>      partitions changes significantly.
>     </para>

"partitions are normal tables" was techically wrong, as partitions can
also be partitioned.

-- 
Justin



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Wed, Jan 18, 2023 at 11:49:19AM -0600, Justin Pryzby wrote:
> On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:
> > On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
> > > On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
> > > > Maybe (all?) the clarification the docs need is to say:
> > > > "Partitioned tables are not *themselves* processed by autovacuum."
> > > 
> > > Yes, I think the lack of autovacuum needs to be specifically mentioned
> > > since most people assume autovacuum handles _all_ statistics updating.
> 
> That's what 61fa6ca79 aimed to do.  Laurenz is suggesting further
> clarification.

Ah, makes sense, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Wed, 2023-01-18 at 11:49 -0600, Justin Pryzby wrote:
> I tweaked this a bit to end up with:
>
> > -    Partitioned tables are not processed by autovacuum.  Statistics
> > -    should be collected by running a manual <command>ANALYZE</command> when it is
> > +    The leaf partitions of a partitioned table are normal tables and are processed
> > +    by autovacuum; however, autovacuum does not process the partitioned table itself.
> > +    This is no problem as far as <command>VACUUM</command> is concerned, since
> > +    there's no need to vacuum the empty, partitioned table.  But, as mentioned in
> > +    <xref linkend="vacuum-for-statistics"/>, it also means that autovacuum won't
> > +    run <command>ANALYZE</command> on the partitioned table.
> > +    Although statistics are automatically gathered on its leaf partitions, some queries also need
> > +    statistics on the partitioned table to run optimally.  You should collect statistics by
> > +    running a manual <command>ANALYZE</command> when the partitioned table is
> >      first populated, and again whenever the distribution of data in its
> >      partitions changes significantly.
> >     </para>
>
> "partitions are normal tables" was techically wrong, as partitions can
> also be partitioned.

I am fine with your tweaks.  I think this is good to go.

Yours,
Laurenz Albe



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Wed, Jan 18, 2023 at 10:15:18AM +0100, Laurenz Albe wrote:
> On Tue, 2023-01-17 at 16:16 -0500, Bruce Momjian wrote:
> > On Tue, Jan 17, 2023 at 03:00:50PM -0600, Justin Pryzby wrote:
> > > Maybe (all?) the clarification the docs need is to say:
> > > "Partitioned tables are not *themselves* processed by autovacuum."
> > 
> > Yes, I think the lack of autovacuum needs to be specifically mentioned
> > since most people assume autovacuum handles _all_ statistics updating.
> > 
> > Can someone summarize how bad it is we have no statistics on partitioned
> > tables?  It sounds bad to me.
> 
> Andrey Lepikhov had an example earlier in this thread[1].  It doesn't take
> an exotic query. 
> 
> Attached is a new version of my patch that tries to improve the wording.

Ah, yes, that is the example I saw but could not re-find.  Here is the
output:

    CREATE TABLE test (id integer, val integer) PARTITION BY hash (id);
    
    CREATE TABLE test_0 PARTITION OF test
       FOR VALUES WITH (modulus 2, remainder 0);
    CREATE TABLE test_1 PARTITION OF test
       FOR VALUES WITH (modulus 2, remainder 1);
    
    INSERT INTO test (SELECT q, q FROM generate_series(1,10) AS q);
    
    VACUUM ANALYZE test;
    
    INSERT INTO test (SELECT q, q%2 FROM generate_series(11,200) AS q);
    
    VACUUM ANALYZE test_0,test_1;
    
    EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
    SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
                                                   QUERY PLAN                                                
    ---------------------------------------------------------------------------------------------------------
     Hash Join  (cost=7.50..15.25 rows=200 width=16) (actual rows=105 loops=1)
       Hash Cond: (t1.id = t2.val)
       ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
             ->  Seq Scan on test_0 t1_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
             ->  Seq Scan on test_1 t1_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)
       ->  Hash  (cost=5.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 16kB
             ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
                   ->  Seq Scan on test_0 t2_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
                   ->  Seq Scan on test_1 t2_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)
    
    VACUUM ANALYZE test;
    
    EXPLAIN (ANALYZE, TIMING OFF, SUMMARY OFF)
    SELECT * FROM test t1, test t2 WHERE t1.id = t2.val;
                                                   QUERY PLAN                                                
    ---------------------------------------------------------------------------------------------------------
     Hash Join  (cost=7.50..15.25 rows=200 width=16) (actual rows=105 loops=1)
       Hash Cond: (t2.val = t1.id)
       ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
             ->  Seq Scan on test_0 t2_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
             ->  Seq Scan on test_1 t2_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)
       ->  Hash  (cost=5.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
             Buckets: 1024  Batches: 1  Memory Usage: 16kB
             ->  Append  (cost=0.00..5.00 rows=200 width=8) (actual rows=200 loops=1)
                   ->  Seq Scan on test_0 t1_1  (cost=0.00..2.13 rows=113 width=8) (actual rows=113 loops=1)
                   ->  Seq Scan on test_1 t1_2  (cost=0.00..1.87 rows=87 width=8) (actual rows=87 loops=1)

I see the inner side uses 'val' in the first EXPLAIN and 'id' in the
second, and you are right that 'val' has mostly 0/1.

Is it possible to document when partition table statistics helps?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Wed, 2023-01-18 at 16:23 -0500, Bruce Momjian wrote:
> Is it possible to document when partition table statistics helps?

I think it would be difficult to come up with an exhaustive list.

Yours,
Laurenz Albe



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Thu, Jan 19, 2023 at 01:50:05PM +0100, Laurenz Albe wrote:
> On Wed, 2023-01-18 at 16:23 -0500, Bruce Momjian wrote:
> > Is it possible to document when partition table statistics helps?
> 
> I think it would be difficult to come up with an exhaustive list.

I was afraid of that.  I asked only because most people assume
autovacuum handles _all_ statistics needs, but this case is not handled.
Do people even have any statistics maintenance process anymore, and if
not, how would they know they need to run a manual ANALYZE?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Thu, 2023-01-19 at 15:56 -0500, Bruce Momjian wrote:
> On Thu, Jan 19, 2023 at 01:50:05PM +0100, Laurenz Albe wrote:
> > On Wed, 2023-01-18 at 16:23 -0500, Bruce Momjian wrote:
> > > Is it possible to document when partition table statistics helps?
> >
> > I think it would be difficult to come up with an exhaustive list.
>
> I was afraid of that.  I asked only because most people assume
> autovacuum handles _all_ statistics needs, but this case is not handled.
> Do people even have any statistics maintenance process anymore, and if
> not, how would they know they need to run a manual ANALYZE?

Probably not.  I think this would warrant an entry in the TODO list:
"make autovacuum collect startistics for partitioned tables".

Even if we cannot give better advice than "run ALANYZE manually if
the execution plan looks fishy", the patch is still an improvement,
isn't it?

I have already seen several questions by people who read the current
documentation and were worried that autovacuum wouldn't clean up their
partitioned tables.

Yours,
Laurenz Albe



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Fri, Jan 20, 2023 at 10:33:57AM +0100, Laurenz Albe wrote:
> On Thu, 2023-01-19 at 15:56 -0500, Bruce Momjian wrote:
> > On Thu, Jan 19, 2023 at 01:50:05PM +0100, Laurenz Albe wrote:
> > > On Wed, 2023-01-18 at 16:23 -0500, Bruce Momjian wrote:
> > > > Is it possible to document when partition table statistics helps?
> > > 
> > > I think it would be difficult to come up with an exhaustive list.
> > 
> > I was afraid of that.  I asked only because most people assume
> > autovacuum handles _all_ statistics needs, but this case is not handled.
> > Do people even have any statistics maintenance process anymore, and if
> > not, how would they know they need to run a manual ANALYZE?
> 
> Probably not.  I think this would warrant an entry in the TODO list:
> "make autovacuum collect startistics for partitioned tables".

We have it already:

    Have autoanalyze of parent tables occur when child tables are modified

> Even if we cannot give better advice than "run ALANYZE manually if
> the execution plan looks fishy", the patch is still an improvement,
> isn't it?

Yes.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

Embrace your flaws.  They make you human, rather than perfect,
which you will never be.



Re: document the need to analyze partitioned tables

От
David Rowley
Дата:
On Wed, 18 Jan 2023 at 22:15, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Attached is a new version of my patch that tries to improve the wording.

I had a look at this and agree that we should adjust the paragraph in
question if people are finding it confusing.

For your wording, I found I had a small problem with calling
partitions of a partitioned tables "normal tables" in:

+    The partitions of a partitioned table are normal tables and get processed
+    by autovacuum, but autovacuum doesn't process the partitioned table itself.

I started to adjust that but since the text is fairly short it turned
out quite different from what you had.

I ended up with:

+    With partitioned tables, since these do not directly store tuples, these
+    do not require autovacuum to perform any <command>VACUUM</command>
+    operations.  Autovacuum simply performs a <command>VACUUM</command> on the
+    partitioned table's partitions the same as it does with normal tables.
+    However, the same is true for <command>ANALYZE</command> operations, and
+    this can be problematic as there are various places in the query planner
+    that attempt to make use of table statistics for partitioned tables when
+    partitioned tables are queried.  For now, you can work around this problem
+    by running a manual <command>ANALYZE</command> command on the partitioned
+    table when the partitioned table is first populated, and again whenever
+    the distribution of data in its partitions changes significantly.

which I've also attached in patch form.

I know there's been a bit of debate on the wording and a few patches,
so I may not be helping.  If nobody is against the above, then I don't
mind going ahead with it and backpatching to whichever version this
first applies to. I just felt I wasn't 100% happy with what was being
proposed.

David

Вложения

Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Wed, 2023-01-25 at 16:26 +1300, David Rowley wrote:
> On Wed, 18 Jan 2023 at 22:15, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > Attached is a new version of my patch that tries to improve the wording.
>
> I had a look at this and agree that we should adjust the paragraph in
> question if people are finding it confusing.
>
> I started to adjust that but since the text is fairly short it turned
> out quite different from what you had.
>
> I ended up with:
>
> +    With partitioned tables, since these do not directly store tuples, these
> +    do not require autovacuum to perform any <command>VACUUM</command>
> +    operations.  Autovacuum simply performs a <command>VACUUM</command> on the
> +    partitioned table's partitions the same as it does with normal tables.
> +    However, the same is true for <command>ANALYZE</command> operations, and
> +    this can be problematic as there are various places in the query planner
> +    that attempt to make use of table statistics for partitioned tables when
> +    partitioned tables are queried.  For now, you can work around this problem
> +    by running a manual <command>ANALYZE</command> command on the partitioned
> +    table when the partitioned table is first populated, and again whenever
> +    the distribution of data in its partitions changes significantly.
>
> which I've also attached in patch form.
>
> I know there's been a bit of debate on the wording and a few patches,
> so I may not be helping.  If nobody is against the above, then I don't
> mind going ahead with it and backpatching to whichever version this
> first applies to. I just felt I wasn't 100% happy with what was being
> proposed.

Thanks, your help is welcome.

Did you see Justin's wording suggestion in
https://postgr.es/m/20230118174919.GA9837%40telsasoft.com ?
He didn't attach it as a patch, so you may have missed it.
I was pretty happy with that.

I think your first sentence it a bit clumsy and might be streamlined to

  Partitioned tables do not directly store tuples and consequently do not
  require autovacuum to perform any <command>VACUUM</command> operations.

Also, I am a little bit unhappy about

1. Your paragraph states that partitioned table need no autovacuum,
   but doesn't state unmistakably that they will never be treated
   by autovacuum.

2. You make a distinction between table partitions and "normal tables",
   but really there is no distiction.

Perhaps I am being needlessly picky here...

Yours,
Laurenz Albe



Re: document the need to analyze partitioned tables

От
David Rowley
Дата:
On Wed, 25 Jan 2023 at 19:46, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Did you see Justin's wording suggestion in
> https://postgr.es/m/20230118174919.GA9837%40telsasoft.com ?
> He didn't attach it as a patch, so you may have missed it.
> I was pretty happy with that.

I didn't pay too much attention as I tend to apply patches to obtain
the full context of the change.  Manually trying to apply a patch from
an email is not something I like to do.

> I think your first sentence it a bit clumsy and might be streamlined to
>
>   Partitioned tables do not directly store tuples and consequently do not
>   require autovacuum to perform any <command>VACUUM</command> operations.

That seems better than what I had.

> Also, I am a little bit unhappy about
>
> 1. Your paragraph states that partitioned table need no autovacuum,
>    but doesn't state unmistakably that they will never be treated
>    by autovacuum.

hmm. I assume the reader realises from the text that lack of any
tuples means VACUUM is not required.  The remaining part of what
autovacuum does not do is explained when the text goes on to say that
ANALYZE operations are also not performed on partitioned tables. I'm
not sure what is left that's mistakable there.

> 2. You make a distinction between table partitions and "normal tables",
>    but really there is no distiction.

We may have different mental models here. This relates to the part
that I wasn't keen on in your patch, i.e:

+    The partitions of a partitioned table are normal tables and get processed
+    by autovacuum

While I agree that the majority of partitions are likely to be
relkind='r', which you might ordinarily consider a "normal table", you
just might change your mind when you try to INSERT or UPDATE records
that would violate the partition constraint. Some partitions might
also be themselves partitioned tables and others might be foreign
tables. That does not really matter much when it comes to what
autovacuum does or does not do, but I'm not really keen to imply in
our documents that partitions are "normal tables".

David



Re: document the need to analyze partitioned tables

От
David Rowley
Дата:
On Wed, 25 Jan 2023 at 21:43, David Rowley <dgrowleyml@gmail.com> wrote:
> While I agree that the majority of partitions are likely to be
> relkind='r', which you might ordinarily consider a "normal table", you
> just might change your mind when you try to INSERT or UPDATE records
> that would violate the partition constraint. Some partitions might
> also be themselves partitioned tables and others might be foreign
> tables. That does not really matter much when it comes to what
> autovacuum does or does not do, but I'm not really keen to imply in
> our documents that partitions are "normal tables".

Based on the above, I'm setting this to waiting on author.

David



Re: document the need to analyze partitioned tables

От
Daniel Gustafsson
Дата:
> On 13 Jul 2023, at 00:21, David Rowley <dgrowleyml@gmail.com> wrote:
> 
> On Wed, 25 Jan 2023 at 21:43, David Rowley <dgrowleyml@gmail.com> wrote:
>> While I agree that the majority of partitions are likely to be
>> relkind='r', which you might ordinarily consider a "normal table", you
>> just might change your mind when you try to INSERT or UPDATE records
>> that would violate the partition constraint. Some partitions might
>> also be themselves partitioned tables and others might be foreign
>> tables. That does not really matter much when it comes to what
>> autovacuum does or does not do, but I'm not really keen to imply in
>> our documents that partitions are "normal tables".
> 
> Based on the above, I'm setting this to waiting on author.

Based on the above, and that the thread has been stalled for months, I'm
marking this returned with feedback.  Please feel free to resubmit a new
version of the patch to a future CF.

--
Daniel Gustafsson




Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
Sorry for dropping the ball on this; I'll add it to the next commitfest.

On Wed, 2023-01-25 at 21:43 +1300, David Rowley wrote:
> > I think your first sentence it a bit clumsy and might be streamlined to
> >
> >   Partitioned tables do not directly store tuples and consequently do not
> >   require autovacuum to perform any <command>VACUUM</command> operations.
>
> That seems better than what I had.

Ok, I went with it.

> > Also, I am a little bit unhappy about
> >
> > 1. Your paragraph states that partitioned table need no autovacuum,
> >    but doesn't state unmistakably that they will never be treated
> >    by autovacuum.
>
> hmm. I assume the reader realises from the text that lack of any
> tuples means VACUUM is not required.  The remaining part of what
> autovacuum does not do is explained when the text goes on to say that
> ANALYZE operations are also not performed on partitioned tables. I'm
> not sure what is left that's mistakable there.

I rewrote the paragraph a little so that it looks clearer to me.
I hope it is OK for you as well.

> > 2. You make a distinction between table partitions and "normal tables",
> >    but really there is no distiction.
>
> We may have different mental models here. This relates to the part
> that I wasn't keen on in your patch, i.e:
>
> +    The partitions of a partitioned table are normal tables and get processed
> +    by autovacuum
>
> While I agree that the majority of partitions are likely to be
> relkind='r', which you might ordinarily consider a "normal table", you
> just might change your mind when you try to INSERT or UPDATE records
> that would violate the partition constraint. Some partitions might
> also be themselves partitioned tables and others might be foreign
> tables. That does not really matter much when it comes to what
> autovacuum does or does not do, but I'm not really keen to imply in
> our documents that partitions are "normal tables".

Agreed, there are differences between partitions and normal tables.
And this is not the place in the documentation where we would like to
get into detail about the differences.

Attached is the next version of my patch.

Yours,
Laurenz Albe

Вложения

Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Wed, Sep  6, 2023 at 05:53:56AM +0200, Laurenz Albe wrote:
> > We may have different mental models here. This relates to the part
> > that I wasn't keen on in your patch, i.e:
> > 
> > +    The partitions of a partitioned table are normal tables and get processed
> > +    by autovacuum
> > 
> > While I agree that the majority of partitions are likely to be
> > relkind='r', which you might ordinarily consider a "normal table", you
> > just might change your mind when you try to INSERT or UPDATE records
> > that would violate the partition constraint. Some partitions might
> > also be themselves partitioned tables and others might be foreign
> > tables. That does not really matter much when it comes to what
> > autovacuum does or does not do, but I'm not really keen to imply in
> > our documents that partitions are "normal tables".
> 
> Agreed, there are differences between partitions and normal tables.
> And this is not the place in the documentation where we would like to
> get into detail about the differences.
> 
> Attached is the next version of my patch.

I adjusted your patch to be shorter and clearer, patch attached.  I am
planning to apply this back to PG 11.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Вложения

Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Fri, 2023-09-29 at 18:08 -0400, Bruce Momjian wrote:
> On Wed, Sep  6, 2023 at 05:53:56AM +0200, Laurenz Albe wrote:
> > > We may have different mental models here. This relates to the part
> > > that I wasn't keen on in your patch, i.e:
> > >
> > > +    The partitions of a partitioned table are normal tables and get processed
> > > +    by autovacuum
> > >
> > > While I agree that the majority of partitions are likely to be
> > > relkind='r', which you might ordinarily consider a "normal table", you
> > > just might change your mind when you try to INSERT or UPDATE records
> > > that would violate the partition constraint. Some partitions might
> > > also be themselves partitioned tables and others might be foreign
> > > tables. That does not really matter much when it comes to what
> > > autovacuum does or does not do, but I'm not really keen to imply in
> > > our documents that partitions are "normal tables".
> >
> > Agreed, there are differences between partitions and normal tables.
> > And this is not the place in the documentation where we would like to
> > get into detail about the differences.
> >
> > Attached is the next version of my patch.
>
> I adjusted your patch to be shorter and clearer, patch attached.  I am
> planning to apply this back to PG 11.

Thanks for looking at this.

I am mostly fine with your version, but it does not directly state that
autovacuum does not process partitioned tables.  I think this should be
clarified in the beginning.

Yours,
Laurenz Albe



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Sat, Sep 30, 2023 at 12:39:43AM +0200, Laurenz Albe wrote:
> On Fri, 2023-09-29 at 18:08 -0400, Bruce Momjian wrote:
> > On Wed, Sep  6, 2023 at 05:53:56AM +0200, Laurenz Albe wrote:
> > > > We may have different mental models here. This relates to the part
> > > > that I wasn't keen on in your patch, i.e:
> > > > 
> > > > +    The partitions of a partitioned table are normal tables and get processed
> > > > +    by autovacuum
> > > > 
> > > > While I agree that the majority of partitions are likely to be
> > > > relkind='r', which you might ordinarily consider a "normal table", you
> > > > just might change your mind when you try to INSERT or UPDATE records
> > > > that would violate the partition constraint. Some partitions might
> > > > also be themselves partitioned tables and others might be foreign
> > > > tables. That does not really matter much when it comes to what
> > > > autovacuum does or does not do, but I'm not really keen to imply in
> > > > our documents that partitions are "normal tables".
> > > 
> > > Agreed, there are differences between partitions and normal tables.
> > > And this is not the place in the documentation where we would like to
> > > get into detail about the differences.
> > > 
> > > Attached is the next version of my patch.
> > 
> > I adjusted your patch to be shorter and clearer, patch attached.  I am
> > planning to apply this back to PG 11.
> 
> Thanks for looking at this.
> 
> I am mostly fine with your version, but it does not directly state that
> autovacuum does not process partitioned tables.  I think this should be
> clarified in the beginning.

Very good point!   Updated patch attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Вложения

Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Fri, 2023-09-29 at 22:34 -0400, Bruce Momjian wrote:
> Very good point!   Updated patch attached.

Thanks!  Some small corrections:

> diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
> index 9cf9d030a8..be1c522575 100644
> --- a/doc/src/sgml/maintenance.sgml
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -861,10 +861,16 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
>     </para>
>  
>     <para>
> -    Partitioned tables are not processed by autovacuum.  Statistics
> -    should be collected by running a manual <command>ANALYZE</command> when it is
> -    first populated, and again whenever the distribution of data in its
> -    partitions changes significantly.
> +    Partitioned tables do not directly store tuples and consequently
> +    autovacuum does not <command>VACUUM</command> them.  (Autovacuum does

... does not <command>VACUUM</command> or <command>ANALYZE</command> them.

Perhaps it would be shorter to say "does not process them" like the
original wording.

> +    perform <command>VACUUM</command> on table partitions just like other

Just like *on* other tables, right?

> +    tables.)  Unfortunately, this also means that autovacuum doesn't
> +    run <command>ANALYZE</command> on partitioned tables, and this
> +    can cause suboptimal plans for queries that reference partitioned
> +    table statistics.  You can work around this problem by manually
> +    running <command>ANALYZE</command> on partitioned tables when they
> +    are first populated, and again whenever the distribution of data in
> +    their partitions changes significantly.
>     </para>
>  
>     <para>

Yours,
Laurenz Albe



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Mon, Oct  2, 2023 at 04:48:20AM +0200, Laurenz Albe wrote:
> On Fri, 2023-09-29 at 22:34 -0400, Bruce Momjian wrote:
> > Very good point!   Updated patch attached.
> 
> Thanks!  Some small corrections:
> 
> > diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
> > index 9cf9d030a8..be1c522575 100644
> > --- a/doc/src/sgml/maintenance.sgml
> > +++ b/doc/src/sgml/maintenance.sgml
> > @@ -861,10 +861,16 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
> >     </para>
> >  
> >     <para>
> > -    Partitioned tables are not processed by autovacuum.  Statistics
> > -    should be collected by running a manual <command>ANALYZE</command> when it is
> > -    first populated, and again whenever the distribution of data in its
> > -    partitions changes significantly.
> > +    Partitioned tables do not directly store tuples and consequently
> > +    autovacuum does not <command>VACUUM</command> them.  (Autovacuum does
> 
> ... does not <command>VACUUM</command> or <command>ANALYZE</command> them.
> 
> Perhaps it would be shorter to say "does not process them" like the
> original wording.
> 
> > +    perform <command>VACUUM</command> on table partitions just like other
> 
> Just like *on* other tables, right?

Good points, updated patch attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Вложения

Re: document the need to analyze partitioned tables

От
Laurenz Albe
Дата:
On Fri, 2023-10-06 at 12:20 -0400, Bruce Momjian wrote:
> Good points, updated patch attached.

That patch is good to go, as far as I am concerned.

Yours,
Laurenz Albe



Re: document the need to analyze partitioned tables

От
Bruce Momjian
Дата:
On Fri, Oct  6, 2023 at 06:49:05PM +0200, Laurenz Albe wrote:
> On Fri, 2023-10-06 at 12:20 -0400, Bruce Momjian wrote:
> > Good points, updated patch attached.
> 
> That patch is good to go, as far as I am concerned.

Patch applied back to PG 11, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.