Logarithmic data frequency distributions and the query planner

From: Jerry Gamache
Subject: Logarithmic data frequency distributions and the query planner
Msg-id 4C34E998.3020000@idilia.com
Replies: Re: Logarithmic data frequency distributions and the query planner
List: pgsql-performance
On 8.1, I have a very interesting database where the distribution of
some values in a multi-million-row table is logarithmic (i.e. the most
frequent value is an order of magnitude more frequent than the next
ones). If I analyze the table, the statistics become extremely skewed
towards the most frequent values and this prevents the planner from
producing good plans for queries that do not target these entries.
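
To make the shape of such a distribution concrete, here is a minimal Python sketch. It assumes a purely geometric falloff (each value ten times rarer than the one before it), which is only one possible reading of the description above. The function names, the number of distinct values, and the sample size are hypothetical and chosen for illustration only.

```python
def geometric_frequencies(n_values, ratio=10.0):
    """Return the fraction of rows held by each value, most frequent first.

    Assumption: each value is `ratio` times rarer than the one before it.
    """
    weights = [ratio ** -i for i in range(n_values)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_sample_hits(frequencies, sample_rows):
    """Expected number of sampled rows carrying each value."""
    return [f * sample_rows for f in frequencies]


if __name__ == "__main__":
    freqs = geometric_frequencies(6)
    # sample_rows is a hypothetical sample size, not a value from the post.
    hits = expected_sample_hits(freqs, sample_rows=3000)
    for rank, (f, h) in enumerate(zip(freqs, hits), start=1):
        print(f"value #{rank}: {f:.6%} of rows, ~{h:.2f} rows in the sample")
```

Under these assumptions the top value alone covers roughly 90% of the rows, and values a few ranks down are expected to appear less than once in a sample of that size, so a sample-based statistics pass sees almost nothing but the head of the distribution.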

In a recent case, the planner estimated that the number of returned rows
would be ~13% of the table size and, from this bad assumption, generated a
very slow conservative plan that took days to execute. If I set the
statistics target to zero for that table, the planner uses a hardcoded ratio
(seems like 0.5%) for the number of returned rows and this helps it
generate a plan that executes in 3 minutes (still sub-optimal, but not
as bad).
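
As a rough illustration of how the row estimate depends on the statistics, the following Python sketch models an equality-selectivity estimate in a deliberately simplified way. It is not the planner's actual code: the function name, its parameters, and the example frequencies are hypothetical, and it does not try to reproduce the ~13% figure, whose cause is not given here. The only value taken from the discussion above is the 0.5% fallback used when no statistics are available.

```python
DEFAULT_EQ_SEL = 0.005  # fallback with no statistics (the ~0.5% noted above)


def estimate_eq_selectivity(value, mcv=None, n_distinct=None, null_frac=0.0):
    """Simplified estimate of the fraction of rows matching `column = value`.

    mcv: dict mapping most-common values to their sampled frequencies,
         or None when no statistics exist (hypothetical representation).
    n_distinct: estimated number of distinct values in the column.
    """
    if mcv is None or n_distinct is None:
        # No statistics: fall back to the fixed default ratio.
        return DEFAULT_EQ_SEL
    if value in mcv:
        # The value was seen often enough to have its own frequency.
        return mcv[value]
    # Otherwise spread the remaining rows evenly over the remaining values.
    remaining_rows = max(1.0 - sum(mcv.values()) - null_frac, 0.0)
    remaining_values = max(n_distinct - len(mcv), 1)
    return remaining_rows / remaining_values


if __name__ == "__main__":
    table_rows = 5_000_000  # hypothetical table size
    stats = {"a": 0.87, "b": 0.09}  # hypothetical sampled frequencies
    for label, sel in [
        ("no statistics", estimate_eq_selectivity("z")),
        ("common value", estimate_eq_selectivity("a", stats, 1000)),
        ("rare value", estimate_eq_selectivity("z", stats, 1000)),
    ]:
        print(f"{label}: selectivity {sel:.6f}, ~{sel * table_rows:.0f} rows")
```

In this model the estimate for a rare value depends entirely on how well the sampled frequencies and the distinct-value count describe the tail, which is exactly the part of a heavily skewed column that a sample captures worst.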

Generating partial indexes for the less frequent cases helps, but this
solution is not flexible enough for our needs as the number of complex
queries grows. We are mostly left with pre-generating a lot of temporary
tables whenever the planner overestimates the number of rows generated
by a subquery (query execution was trimmed from 3 minutes to 30 seconds
using this technique) or using the OFFSET 0 tweak, but it would be nice
if the planner could handle this on its own.

Am I missing something obvious? Setting the statistics for this table to
zero seems awkward even if it gives good results.
Jerry.


