Reduce maximum error in tuples estimation after vacuum.

From: Kyotaro HORIGUCHI
Subject: Reduce maximum error in tuples estimation after vacuum.
Date:
Msg-id: 20130614.173528.146929812.horiguchi.kyotaro@lab.ntt.co.jp
Responses: Re: Reduce maximum error in tuples estimation after vacuum.  (Kyotaro HORIGUCHI <kyota.horiguchi@gmail.com>)
           Re: Reduce maximum error in tuples estimation after vacuum.  (Amit Kapila <amit.kapila@huawei.com>)
List: pgsql-hackers
Hello, 

PostgreSQL estimates the number of live tuples when a vacuum has left some
buffers unscanned. This estimation works well in most cases, but it can
produce a wildly wrong result when there is a strong imbalance of tuple
density within the table.

For example,

create table t (a int, b int);
insert into t (select a, (random() * 100000)::int from generate_series((select count(*) from t) + 1, 1000000) a);
update t set b = b + 1 where a <  (select count(*) from t) * 0.7;
vacuum t;
delete from t where a < (select count(*) from t) * 0.99;

After this, pg_stat_user_tables.n_live_tup shows 417670, which is about 41
times larger than the real number of rows, 10001. What makes it worse, with
the default settings neither autovacuum nor autoanalyze will run until
n_dead_tup grows to about 8 times the real number of tuples in the table.
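
(For reference: with the default autovacuum_vacuum_threshold = 50 and
autovacuum_vacuum_scale_factor = 0.2, the vacuum trigger point computed from
the inflated reltuples is about 50 + 0.2 * 417670 = 83584 dead tuples, which
is roughly 8.4 times the 10001 rows that are actually live.)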


| postgres=# select n_live_tup, n_dead_tup
|            from pg_stat_user_tables where relname='t';
|  n_live_tup | n_dead_tup 
| ------------+------------
|      417670 |          0
| 
| postgres=# select reltuples from pg_class where relname='t';
|  reltuples 
| -----------
|     417670
| 
| postgres=# select count(*) from t;
|  count 
| -------
|  10001

Using n_dead_tup as of just before the vacuum seems to make this better, but
I heard that that approach was abandoned for some reason I don't know.

So I've come up with another plan: use the FSM to estimate the tuple density
in the unscanned pages. The point is to make the estimation rely on the
uniformity of tuple length instead of the uniformity of tuple density. This
change seems to keep the error within a few times the real number of tuples.
The additional page reads for the FSM amount to about 1/4000
(SlotsPerFSMPage) of the number of skipped pages, which I suppose is
tolerable during vacuum.

Overall, the algorithm can be illustrated as below (a rough sketch in C
follows the list):

- Sum up, over all scanned pages: the used bytes, the max offnum
  (PageGetMaxOffsetNumber), the maximum free bytes for tuple data, and the
  free bytes left after the page is vacuumed.
- Sum up the free bytes reported by the FSM over all skipped pages.
- Calculate the mean tuple length from the overall used bytes, the sum of
  the max offnums, and the number of scanned pages.
- Guess the tuple density in the skipped pages using the overall free bytes
  from the FSM and the mean tuple length calculated above.
- Finally, feed the estimated number of live tuples BEFORE the vacuum into
  vac_estimate_reltuples.
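
As a rough sketch of the computation (not the patch itself; the function and
parameter names below are simplified stand-ins for the counters that
lazy_scan_heap accumulates in the attached patch), the estimate boils down
to:

/*
 * Rough sketch of the estimation above.  Inputs are sums accumulated while
 * scanning the heap, plus the FSM-reported free space of the skipped pages;
 * the result is handed to vac_estimate_reltuples as the number of tuples
 * that existed BEFORE this vacuum.
 */
static double
guess_tuples_before_vacuum(double scanned_pages,       /* pages actually read */
                           double skipped_pages,       /* pages left unscanned */
                           double scanned_used_bytes,  /* item data bytes, summed */
                           double scanned_freespace,   /* free bytes after page vacuum, summed */
                           double scanned_max_offnums, /* PageGetMaxOffsetNumber(), summed */
                           double skipped_freespace)   /* FSM free bytes over skipped pages */
{
    /* bytes per page that are available for tuple data */
    double  tuple_bytes_per_page =
        (scanned_used_bytes + scanned_freespace) / scanned_pages;

    /* mean tuple length, from used bytes and line pointer counts on scanned pages */
    double  mean_tup_len =
        tuple_bytes_per_page / (scanned_max_offnums / scanned_pages);

    /* tuples per skipped page: whatever the FSM does not report as free is tuples */
    double  tups_per_skipped_page =
        (tuple_bytes_per_page - skipped_freespace / skipped_pages) / mean_tup_len;

    /* extrapolate over the whole relation */
    return tups_per_skipped_page * (scanned_pages + skipped_pages);
}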

Of course this method is affected by an imbalance of tuple LENGTH instead,
but even then the error seems to stay within a few times the real number of
tuples (in the variable-length test #2 below, est/real ranges from about
0.58 to 2.6).

For rows of fixed length, the test against HEAD shows the following, where
"tups est" is pg_class.reltuples and "tups real" is count(*):

 del% | pages | n_live_tup | tups est | tups real | est/real | bufs 
------+-------+------------+----------+-----------+----------+------
  0.9 |  4425 |     100001 |   470626 |    100001 |    4.706 | 3985
 0.95 |  4425 |      50001 |   441196 |     50001 |    8.824 | 4206
 0.99 |  4425 |     417670 |   417670 |     10001 |   41.763 | 4383

and with the patch
0.9 |  4425 |     106169 |   106169 |    100001 |    1.062 | 3985
0.95 |  4425 |      56373 |    56373 |     50001 |    1.127 | 4206
0.99 |  4425 |      10001 |    16535 |     10001 |    1.653 | 4383


What do you think about this?

=====

The attached files are:
 - vacuum_est_improve_20130614.patch: the patch for this proposal
 - vactest.sql: SQL script to reproduce the situation
 - vactest.sh: test script to measure the errors related to this patch
 - test_result.txt: all of the test results for the various deletion
   ratios that the test script above yields

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d6d20fd..1e581c1 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1280,7 +1280,8 @@ acquire_sample_rows(Relation onerel, int elevel,
     *totalrows = vac_estimate_reltuples(onerel, true,
                                         totalblocks,
                                         bs.m,
-                                        liverows);
+                                        liverows,
+                                        onerel->rd_rel->reltuples);
     if (bs.m > 0)
         *totaldeadrows = floor((deadrows / bs.m) * totalblocks + 0.5);
     else
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 641c740..4bdf0c1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -501,10 +501,10 @@ double
 vac_estimate_reltuples(Relation relation, bool is_analyze,
                        BlockNumber total_pages,
                        BlockNumber scanned_pages,
-                       double scanned_tuples)
+                       double scanned_tuples,
+                       double old_rel_tuples)
 {
     BlockNumber old_rel_pages = relation->rd_rel->relpages;
-    double      old_rel_tuples = relation->rd_rel->reltuples;
     double      old_density;
     double      new_density;
     double      multiplier;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 7e46f9e..80304a6 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -396,7 +396,11 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
     double      num_tuples,
                 tups_vacuumed,
                 nkeep,
-                nunused;
+                nunused,
+                scanned_used_bytes,
+                scanned_possible_max_tups,
+                scanned_freespace,
+                total_freespace;
     IndexBulkDeleteResult **indstats;
     int         i;
     PGRUsage    ru0;
@@ -414,6 +418,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
     empty_pages = vacuumed_pages = 0;
     num_tuples = tups_vacuumed = nkeep = nunused = 0;
 
+    scanned_used_bytes = scanned_possible_max_tups = 0;
+    scanned_freespace = total_freespace = 0;
     indstats = (IndexBulkDeleteResult **)
         palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
@@ -487,6 +493,10 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
         bool        has_dead_tuples;
         TransactionId visibility_cutoff_xid = InvalidTransactionId;
 
+        /* FSM slot for the previous block should be well maintained */
+        if (blkno > 0)
+            total_freespace += GetRecordedFreeSpace(onerel, blkno - 1);
+
         if (blkno == next_not_all_visible_block)
         {
             /* Time to advance next_not_all_visible_block */
@@ -692,6 +702,7 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
         hastup = false;
         prev_dead_count = vacrelstats->num_dead_tuples;
         maxoff = PageGetMaxOffsetNumber(page);
+        scanned_possible_max_tups += maxoff;
 
         /*
          * Note: If you change anything in the loop below, also look at
@@ -892,6 +903,8 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
         }
 
         freespace = PageGetHeapFreeSpace(page);
+        scanned_freespace += freespace;
+        scanned_used_bytes += PageGetItemBytes(page);
 
         /* mark page all-visible, if appropriate */
         if (all_visible && !all_visible_according_to_vm)
@@ -969,15 +982,45 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
             RecordPageWithFreeSpace(onerel, blkno, freespace);
     }
 
+    total_freespace += GetRecordedFreeSpace(onerel, nblocks - 1);
+
     /* save stats for use later */
     vacrelstats->scanned_tuples = num_tuples;
     vacrelstats->tuples_deleted = tups_vacuumed;
 
-    /* now we can compute the new value for pg_class.reltuples */
-    vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
-                                                         nblocks,
-                                                  vacrelstats->scanned_pages,
-                                                         num_tuples);
+    {
+        double prev_ntuples = onerel->rd_rel->reltuples;
+
+        if (tups_vacuumed > num_tuples &&
+            (double) vacrelstats->scanned_pages / nblocks > 0.1)
+        {
+            /*
+             * When a certain amount of tuples has been vacuumed, there might
+             * be an unignorable imbalance in the number of tuples between
+             * scanned pages and skipped ones.  So we guess the number of tuples
+             * on the skipped pages assuming the uniformity of tuple size over
+             * the whole relation, and the correctness of the FSM at this point.
+             */
+            BlockNumber scanned_pages = vacrelstats->scanned_pages;
+            BlockNumber skipped_pages = nblocks - scanned_pages;
+            double buffer_max_bytes =
+                (scanned_used_bytes + scanned_freespace) / scanned_pages;
+            double skipped_freespace = total_freespace - scanned_freespace;
+            double mean_tup_len =
+                buffer_max_bytes / (scanned_possible_max_tups / scanned_pages);
+            double skipped_tups_per_blk =
+                (buffer_max_bytes - skipped_freespace / skipped_pages) /
+                mean_tup_len;
+            prev_ntuples = skipped_tups_per_blk * nblocks;
+        }
+
+        /* now we can compute the new value for pg_class.reltuples */
+        vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+                                            nblocks,
+                                            vacrelstats->scanned_pages,
+                                            num_tuples,
+                                            prev_ntuples);
+    }
 
     /*
      * Release any remaining pin on visibility map page.
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d8dd8b0..aedc48e 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -147,7 +147,8 @@ extern void vac_close_indexes(int nindexes, Relation *Irel, LOCKMODE lockmode);
 extern double vac_estimate_reltuples(Relation relation, bool is_analyze,
                        BlockNumber total_pages,
                        BlockNumber scanned_pages,
-                       double scanned_tuples);
+                       double scanned_tuples,
+                       double old_rel_tuples);
 extern void vac_update_relstats(Relation relation,
                     BlockNumber num_pages,
                     double num_tuples,
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index abcf8a0..aa067d9 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -233,6 +233,13 @@ typedef PageHeaderData *PageHeader;
     ((ItemId) (&((PageHeader) (page))->pd_linp[(offsetNumber) - 1]))
 
 /*
+ * PageGetItemBytes
+ *      Returns the size in bytes occupied by items data on the page
+ */
+#define PageGetItemBytes(page) \
+    (((PageHeader) (page))->pd_special - ((PageHeader) (page))->pd_upper)
+
+/*
  * PageGetContents
  *      To be used in case the page does not contain item pointers.
  *
drop table if exists t;
create table t (a int, b int);
insert into t (select a, (random() * 100000)::int from generate_series((select count(*) from t) + 1, 1000000) a);
update t set b = b + 1 where a <  (select count(*) from t) * 0.7;
vacuum t;
delete from t where a < (select count(*) from t) * 0.99;
vacuum t;
select c.relpages, s.n_live_tup, c.reltuples, (select count(*) from t) as tuples,
       reltuples::float / (select count(*) from t) as ratio
  from pg_stat_user_tables s, pg_class c where s.relname = 't' and c.relname = 't';
#! /bin/sh

dbname="postgres"
function insert_result() {
    psql ${dbname} -f - > /dev/null <<EOF
insert into result
  select $1, $2, $3, c.relpages, s.n_live_tup, c.reltuples, (select count(*) from t),
         reltuples::float / (select count(*) from t)
    from pg_stat_user_tables s, pg_class c where s.relname = 't' and c.relname = 't';
EOF
}

function vac_with_bufs() {
    psql ${dbname} -c "vacuum verbose t" |& egrep "INFO: *\"t\": found " | sed -e 's/^.* versions in \([0-9]*\) .*$/\1/'
}

function update_result_bufs() {
    local test_no=$1
    local delratio=$2
    local vac_no=$3
    local bufs=$4
    psql ${dbname} -c "update result set bufs=${bufs} where \"#\"=$test_no and \"del%\"=$delratio and \"##\"=$vac_no" > /dev/null
}

function store_result() {
    local test_no=$1
    local delratio=$2
    scanned_bufs=`vac_with_bufs`
    insert_result $test_no $delratio 1
    update_result_bufs $test_no $delratio 1 $scanned_bufs

#    scanned_bufs=`vac_with_bufs`
#    insert_result $test_no $delratio 2
#    update_result_bufs $test_no $delratio 2 $scanned_bufs
}


function test1() {
    local delratio=$1
    echo "test1 ratio = $delratio"
    psql ${dbname} -f - > /dev/null <<EOF
drop table if exists t;
create table t (a int, b int);
insert into t (select a, (random() * 100000)::int from generate_series((select count(*) from t) + 1, $nrows) a);
update t set b = b + 1 where a <  (select count(*) from t) * 0.7;
vacuum t;
delete from t where a < (select count(*) from t) * $delratio;
EOF
store_result 1 $delratio
}

function test2() {
    local delratio=$1
    echo "test2 ratio = $delratio"
    psql ${dbname} -f - > /dev/null <<EOF
drop table if exists t;
create table t (a int, b text);
insert into t (select a, 'abcdefg' from generate_series((select count(*) from t) + 1, $nrows) a);
update t set b = repeat('abcdefghij', 250) where a <  (select count(*) from t) * 0.7;
vacuum t;
delete from t where a < (select count(*) from t) * $delratio;
EOF
store_result 2 $delratio
}

psql ${dbname} -f - > /dev/null <<EOF
drop table if exists result;
create table result ("#" int, "del%" float, "##" int, pages int, n_live_tup int, "tups est" int, "tups real" int,
                     "est/real" numeric(10, 3), bufs int default 0);
EOF
nrows=1000000
test1 0.1
test1 0.2
test1 0.3
test1 0.4
test1 0.5
test1 0.6
test1 0.7
test1 0.8
test1 0.9
test1 0.95
test1 0.99

nrows=100000
test2 0.1
test2 0.2
test2 0.3
test2 0.4
test2 0.5
test2 0.6
test2 0.7
test2 0.8
test2 0.9
test2 0.95
test2 0.99

psql ${dbname} -c 'select * from result order by "#", "del%", "##"'
==== BEFORE the patch. # = 1 is fixed-size rows, # = 2 is variable-size rows
# | del% | pages | n_live_tup | tups est | tups real | est/real | bufs 
---+------+-------+------------+----------+-----------+----------+------
 1 |  0.1 |  7523 |     900001 |   941326 |    900001 |    1.046 |  444
 1 |  0.2 |  7523 |     800001 |   882465 |    800001 |    1.103 |  886
 1 |  0.3 |  7523 |     700001 |   823697 |    700001 |    1.177 | 1329
 1 |  0.4 |  7523 |     600001 |   764836 |    600001 |    1.275 | 1771
 1 |  0.5 |  7523 |     500001 |   706068 |    500001 |    1.412 | 2214
 1 |  0.6 |  7523 |     400001 |   647206 |    400001 |    1.618 | 2656
 1 |  0.7 |  4425 |     300001 |   588239 |    300001 |    1.961 | 3099
 1 |  0.8 |  4425 |     200001 |   529394 |    200001 |    2.647 | 3542
 1 |  0.9 |  4425 |     100001 |   470626 |    100001 |    4.706 | 3985
 1 | 0.95 |  4425 |      50001 |   441196 |     50001 |    8.824 | 4206
 1 | 0.99 |  4425 |     417670 |   417670 |     10001 |   41.763 | 4383
 2 |  0.1 |  1263 |     100000 |    91902 |     90001 |    1.021 |  104
 2 |  0.2 |  1263 |     100000 |    83737 |     80001 |    1.047 |  207
 2 |  0.3 |  1263 |     100000 |    75573 |     70001 |    1.080 |  310
 2 |  0.4 |  1263 |     100000 |    67409 |     60001 |    1.123 |  413
 2 |  0.5 |  1263 |     100000 |    59245 |     50001 |    1.185 |  516
 2 |  0.6 |  1263 |     100000 |    51099 |     40001 |    1.277 |  620
 2 |  0.7 |   541 |     100000 |    42855 |     30001 |    1.428 |  723
 2 |  0.8 |   541 |      38607 |    38607 |     20001 |    1.930 |  778
 2 |  0.9 |   541 |     100000 |    34321 |     10001 |    3.432 |  832
 2 | 0.95 |   541 |     100000 |    34930 |      5001 |    6.985 |  885
 2 | 0.99 |   541 |      30930 |    30930 |      1001 |   30.899 |  885


==== AFTER the patch
# | del% | pages | n_live_tup | tups est | tups real | est/real | bufs 
---+------+-------+------------+----------+-----------+----------+------
 1 |  0.1 |  7523 |     900001 |   941326 |    900001 |    1.046 |  444
 1 |  0.2 |  7523 |     800001 |   803449 |    800001 |    1.004 |  886
 1 |  0.3 |  7523 |     700001 |   703835 |    700001 |    1.005 | 1329
 1 |  0.4 |  7523 |     600001 |   604220 |    600001 |    1.007 | 1771
 1 |  0.5 |  7523 |     500001 |   504606 |    500001 |    1.009 | 2214
 1 |  0.6 |  7523 |     400001 |   404992 |    400001 |    1.012 | 2656
 1 |  0.7 |  4425 |     300001 |   305340 |    300001 |    1.018 | 3099
 1 |  0.8 |  4425 |     205758 |   205758 |    200001 |    1.029 | 3542
 1 |  0.9 |  4425 |     106169 |   106169 |    100001 |    1.062 | 3985
 1 | 0.95 |  4425 |      56373 |    56373 |     50001 |    1.127 | 4206
 1 | 0.99 |  4425 |      10001 |    16535 |     10001 |    1.653 | 4383
 2 |  0.1 |  1263 |     100000 |    91902 |     90001 |    1.021 |  104
 2 |  0.2 |  1263 |     100000 |    67589 |     80001 |    0.845 |  207
 2 |  0.3 |  1263 |     100000 |    57522 |     70001 |    0.822 |  310
 2 |  0.4 |  1263 |      60001 |    47489 |     60001 |    0.791 |  413
 2 |  0.5 |  1263 |     100000 |    37469 |     50001 |    0.749 |  516
 2 |  0.6 |  1263 |      40001 |    27455 |     40001 |    0.686 |  620
 2 |  0.7 |   541 |     100000 |    17432 |     30001 |    0.581 |  723
 2 |  0.8 |   541 |      20001 |    12894 |     20001 |    0.645 |  778
 2 |  0.9 |   541 |       7570 |     7570 |     10001 |    0.757 |  832
 2 | 0.95 |   541 |     100000 |     6605 |      5001 |    1.321 |  885
 2 | 0.99 |   541 |     100000 |     2598 |      1001 |    2.595 |  885
(44 rows)

