Обсуждение: wrong count estimation in query plan

Поиск
Список
Период
Сортировка

wrong count estimation in query plan

От
Rural Hunter
Дата:
Hi,

I have a table with very frequent updates while inserts and deletes are
very few. Recently I noticed the count estimation of the table in query
plan are far wrong. This is what I found:

# select count(*) from users;
  count
-------
   1116
(1 row)

testdb=# explain select * from users;
                           QUERY PLAN
---------------------------------------------------------------
  Seq Scan on users  (cost=0.00..34456.68 rows=73468 width=625)
(1 row)

testdb=# vacuum analyze users;
VACUUM
testdb=# explain select * from users;
                           QUERY PLAN
---------------------------------------------------------------
  Seq Scan on users  (cost=0.00..34705.32 rows=97332 width=624)
(1 row)

testdb=# analyze users;
ANALYZE
testdb=# explain select * from users;
                           QUERY PLAN
---------------------------------------------------------------
  Seq Scan on users  (cost=0.00..33857.43 rows=11743 width=625)
(1 row)

testdb=# analyze users;
ANALYZE
testdb=# explain select * from users;
                           QUERY PLAN
--------------------------------------------------------------
  Seq Scan on users  (cost=0.00..33765.88 rows=2288 width=648)
(1 row)

testdb=# analyze users;
\ANALYZE
testdb=# explain select * from users;
                           QUERY PLAN
--------------------------------------------------------------
  Seq Scan on users  (cost=0.00..33778.94 rows=1294 width=625)
(1 row)

testdb=# analyze users;
ANALYZE
testdb=# explain select * from users;
                           QUERY PLAN
--------------------------------------------------------------
  Seq Scan on users  (cost=0.00..33777.19 rows=1119 width=624)
(1 row)

My question is, why I had to run analyze so many times to make the
estimation be close with the actual count?


Re: wrong count estimation in query plan

От
Tom Lane
Дата:
Rural Hunter <ruralhunter@gmail.com> writes:
> My question is, why I had to run analyze so many times to make the
> estimation be close with the actual count?

If ANALYZE doesn't scan the entire table then it updates the rowcount
estimate using a moving-average approach, rather than assuming that
the blocks it did look at are necessarily completely representative.
That causes the estimate to converge over multiple passes.  Since
it took so many cycles to get close to reality, we can conclude that
the table is several times larger than the number of blocks ANALYZE
is willing to sample ... which, for a table with only 1000 rows in
it, says you've got a bad table-bloat problem.  VACUUM FULL or CLUSTER
might be indicated.

            regards, tom lane


Re: wrong count estimation in query plan

От
Rural Hunter
Дата:
在 2014/10/30 22:25, Tom Lane 写道:
> If ANALYZE doesn't scan the entire table then it updates the rowcount
> estimate using a moving-average approach, rather than assuming that
> the blocks it did look at are necessarily completely representative.
> That causes the estimate to converge over multiple passes. Since it
> took so many cycles to get close to reality, we can conclude that the
> table is several times larger than the number of blocks ANALYZE is
> willing to sample ... which, for a table with only 1000 rows in it,
> says you've got a bad table-bloat problem. VACUUM FULL or CLUSTER
> might be indicated. regards, tom lane
Got it. A vacuum solved the problem. Thanks.