Обсуждение: [BUGS] BUG #14858: Inaccurate estimations after bulk delete and bulk insert

Поиск
Список
Период
Сортировка

[BUGS] BUG #14858: Inaccurate estimations after bulk delete and bulk insert

От
stanislaw.skonieczny@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14858
Logged by:          Stanisław Skonieczny
Email address:      stanislaw.skonieczny@gmail.com
PostgreSQL version: 9.6.5
Operating system:   Ubuntu
Description:

In our product we have found that PG is choosing full scan on a big table
instead using indexes.
This was caused caused by very inaccurate estimations of total number of
rows in the table.
Estimations are few orders of magnitude lower than exact number rof
tuples.

We have extracted what our product does to simple script that illustrates
this behaviour. The scenario is:
1. load 1B tuples
2. delete most of the tuples (~1B)
3. autoanalyze is beeing run
4. load next 1B of tuples
5. autoanalyze is beeing run
In product numbers can differ and step can be interlaced with many other
queries.
Steps 1,2,4 can by run either in single transaction or span multiple
transactions.

It turns out that after step 5 planner estimates number of rows to 2.7M
instead of 1B. That is 370 times too low.
Output of analyze at step 5 is: INFO:  analyzing "public.test" INFO:  "test": scanned 30000 of 10810615 pages,
containing2807827 live 
rows and 2742296 dead rows; 30000 rows in sample, 2807827 estimated total
rows
In product there is much more pages, because real tuples are much longer
than test ones.

If analyze (at step 5) just simply extrapolated number of rows found it
would produce acceptable answer.
But it is being mislead by information stored by previous analyze (at step
3) that density of tuples is ~0, so it assumes
that all pages it did not see have ~0 tuples (but in real half of them ate
empty, half of them are full).

If analyze (at step 5) had taken information on tuples density not from
reltuples, but from n_live_tuples
it would also produce acceptable answer,
because n_live_tuples is incremented with ever insert and after step 4 it
correctly shows 1B live tuples.

Unfortunately, after step 5 there is no easy way to correct estimation. Next
analyze will not help much.
Each next analyze will just slightly raise estimations, because density of
tuples is considered as low.
We have checked that analyze must be run more that 200 times so that
estimations reach ~50% of reality.

What is interesting, if load is higher and there if 2B rows instead of 1B in
the scenario above,
estimations after step 5 shows very similar number of tuples, but problem is
worse,
because they are 2 times worse when compared to real number of tuples.
And analyze must be run much more times to raise it to reasonable level.

We could raise default_statistics_target, but we would have to raise it a
lot.
To reach a level when estimation is 50% off, we would have to make analyze
to read half of table (500M rows),
which would kill the performance.

We could run analyze manually in a loop until local minimum/maximum of
estimation is found, so that we can quickly get
to the place where estimations are good enough, but we believe there should
be some better solution on the DB level.



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14858: Inaccurate estimations after bulk delete andbulk insert

От
Alvaro Herrera
Дата:
stanislaw.skonieczny@gmail.com wrote:

> In our product we have found that PG is choosing full scan on a big table
> instead using indexes.
> This was caused caused by very inaccurate estimations of total number of
> rows in the table.

Stanislaw emailed the -owner address with the attached email.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Bug reference:      14858

I attach sample script to resproduce the problem and it's output.

Stanisław Skonieczny

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Вложения