Query runtime dependent on ANALYZE run

Поиск
Список
Период
Сортировка
От Viktor Rosenfeld
Тема Query runtime dependent on ANALYZE run
Дата
Msg-id 20120606194025.GA7266@kenny.fritz.box
обсуждение исходный текст
Список pgsql-general
Hi,

(I've sent a copy of this message to pgus-general in error. Sorry about
that.)

I've noticed that the selection of the executed query plan (and
therefore query runtime) is dependent on the statistics generated by
an ANALYZE run. As an demonstration, I chose the best runtime of 5
consecutive runs of the query linked below, regenerated the statistics
for the column node_annotation.value and re-ran the query. This
experiment was repeated a hundred times each for the statistics
targets 10, 100 (default), 1000, 10000. I've used PostgreSQL 9.1.3.

Query: http://www.informatik.hu-berlin.de/~rosenfel/analyze/query.sql
Schema: http://www.informatik.hu-berlin.de/~rosenfel/analyze/schema.pdf

The next table shows for each statistics target the number of distinct
plans generated in the experiment, how often the most common plan was
generated, the runtime (ms) of the best and worst non-unique plan, and
the number of timeouts where the query did not finish within 60 seconds.

| Statistics | # plans | most common plan | Best | Worst | Timeouts |
|------------+---------+------------------+------+-------+----------|
|         10 |      37 |               60 | 1876 |  2180 |        0 |
|        100 |      90 |                4 | 2225 |  7927 |       14 |
|       1000 |      75 |                6 | 2214 |  6329 |       22 |
|      10000 |       6 |               85 | 2195 |  2900 |        3 |

The distribution for each statistics target is linked below:

Distribution: http://www.informatik.hu-berlin.de/~rosenfel/analyze/histogram.pdf

As one can see, using the default value of 100 (and also 1000) there is
a considerable spread in the runtime of the query. The best plan is
chosen most often, but only about a quarter of the time and there are
also many timeouts. The best results can be achieved with a statistics
target of 10: The most stable query plan is the second best and there
are no timeouts. Using a statistics target of 10000 generates the most
stable plan selection, i.e. the same plan is chosen most often, but it
is almost a second slower than the best plan.

I would like to know how I can mitigate against these random results.

Cheers,
Viktor

В списке pgsql-general по дате отправления:

Предыдущее
От: Guillaume Lelarge
Дата:
Сообщение: Re: pg_database_size differs from df -s
Следующее
От: Frank Lanitz
Дата:
Сообщение: Re: pg_database_size differs from df -s