Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.

Поиск
Список
Период
Сортировка
От David Gould
Тема Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuplesinaccurate.
Дата
Msg-id 20180302140037.00af305d@engels
обсуждение исходный текст
Ответ на Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Ответы Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Fri, 2 Mar 2018 18:47:44 +0300
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:

> The calculation I made for the first step applies to the next steps too, 
> with minor differences. So, the estimate increases at each step. Just 
> out of interest, I plotted the reltuples for 60 steps, and it doesn't 
> look like it's going to converge anytime soon (see attached).
> Looking at the formula, this overshoot term is created when we multiply 
> the old density by the new number of pages. I'm not sure how to fix 
> this. I think we could average the number of tuples, not the densities. 
> The attached patch demonstrates what I mean.

I'm confused at this point, I provided a patch that addresses this and a
test case. We seem to be discussing everything as if we first noticed the
issue. Have you reviewed the patch and and attached analysis and tested it?
Please commment on that?

Thanks.

Also, here is a datapoint that I found just this morning on a clients
production system:

INFO:  "staging_xyz": scanned 30000 of   pages, containing 63592 live rows and 964346 dead rows;
30000 rows in sample, 1959918155 estimated total rows

# select (50000953.0/30000*63592)::int as nrows;
   nrows          
-----------
 105988686

This tables reltuples is 18 times the actual row count. It will never converge
because with 50000953 pages analyze can only adjust reltuples by 0.0006 each time.

It will also almost never get vacuumed because the autovacuum threshold of
0.2 * 1959918155 = 391983631 about 3.7 times larger than the actual row count.

The submitted patch is makes analyze effective in setting reltuples to within
a few percent of the count(*) value.

-dg


-- 
David Gould                                   daveg@sonic.net
If simplicity worked, the world would be overrun with insects.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: heap_lock_updated_tuple_rec can leak a buffer refcount
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: Online enabling of checksums