Re: Row estimates for empty tables

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Row estimates for empty tables
Дата
Msg-id 629600.1595630085@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Row estimates for empty tables  (David Rowley <dgrowleyml@gmail.com>)
Re: Row estimates for empty tables  (Pavel Stehule <pavel.stehule@gmail.com>)
Список pgsql-hackers
[ redirecting to -hackers ]

I wrote:
> The core issue here is "how do we know whether the table is likely to stay
> empty?".  I can think of a couple of more or less klugy solutions:

> 1. Arrange to send out a relcache inval when adding the first page to
> a table, and then remove the planner hack for disbelieving relpages = 0.
> I fear this'd be a mess from a system structural standpoint, but it might
> work fairly transparently.

I experimented with doing this.  It's not hard to code, if you don't mind
having RelationGetBufferForTuple calling CacheInvalidateRelcache.  I'm not
sure whether that code path might cause any long-term problems, but it
seems to work OK right now.  However, this solution causes massive
"failures" in the regression tests as a result of plans changing.  I'm
sure that's partly because we use so many small tables in the tests.
Nonetheless, it's not promising from the standpoint of not causing
unexpected problems in the real world.

> 2. Establish the convention that vacuuming or analyzing an empty table
> is what you do to tell the system that this state is going to persist.
> That's more or less what the existing comments in plancat.c envision,
> but we never made a definition for how the occurrence of that event
> would be recorded in the catalogs, other than setting relpages > 0.
> Rather than adding another pg_class column, I'm tempted to say that
> vacuum/analyze should set relpages to a minimum of 1, even if the
> relation has zero pages.

I also tried this, and it seems a lot more promising: no existing
regression test cases change.  So perhaps we should do the attached
or something like it.

            regards, tom lane

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1bbc4598f7..243cdaa409 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -594,6 +594,19 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
     new_frozen_xid = scanned_all_unfrozen ? FreezeLimit : InvalidTransactionId;
     new_min_multi = scanned_all_unfrozen ? MultiXactCutoff : InvalidMultiXactId;

+    /*
+     * Another corner case is that if the relation is physically zero-length,
+     * we don't want to record relpages=reltuples=0 in pg_class; what we want
+     * to record is relpages=1, reltuples=0.  This tells the planner that the
+     * relation has been seen to be empty by VACUUM or ANALYZE, so it should
+     * not override a small measured table size.
+     */
+    if (new_rel_pages == 0)
+    {
+        Assert(new_live_tuples == 0);
+        new_rel_pages = 1;
+    }
+
     vac_update_relstats(onerel,
                         new_rel_pages,
                         new_live_tuples,
diff --git a/src/backend/access/table/tableam.c b/src/backend/access/table/tableam.c
index 4b2bb29559..0a92450e8a 100644
--- a/src/backend/access/table/tableam.c
+++ b/src/backend/access/table/tableam.c
@@ -585,11 +585,9 @@ table_block_relation_estimate_size(Relation rel, int32 *attr_widths,
      * doesn't happen instantaneously, and it won't happen at all for cases
      * such as temporary tables.)
      *
-     * We approximate "never vacuumed" by "has relpages = 0", which means this
-     * will also fire on genuinely empty relations.  Not great, but
-     * fortunately that's a seldom-seen case in the real world, and it
-     * shouldn't degrade the quality of the plan too much anyway to err in
-     * this direction.
+     * We test "never vacuumed" by seeing whether relpages = 0.  VACUUM and
+     * ANALYZE will never set relpages to less than 1, even if the relation is
+     * in fact zero length.
      *
      * If the table has inheritance children, we don't apply this heuristic.
      * Totally empty parent tables are quite common, so we should be willing
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 924ef37c81..f0247e350f 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -620,6 +620,16 @@ do_analyze_rel(Relation onerel, VacuumParams *params,

         visibilitymap_count(onerel, &relallvisible, NULL);

+        /*
+         * As in VACUUM, replace relpages=reltuples=0 with relpages=1,
+         * reltuples=0.
+         */
+        if (relpages == 0)
+        {
+            Assert(totalrows == 0);
+            relpages = 1;
+        }
+
         vac_update_relstats(onerel,
                             relpages,
                             totalrows,

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ranier Vilela
Дата:
Сообщение: Re: Improving connection scalability: GetSnapshotData()
Следующее
От: Andres Freund
Дата:
Сообщение: Re: hashagg slowdown due to spill changes