Re: default_statistics_target WAS: max_wal_senders must die

From: Robert Haas
Subject: Re: default_statistics_target WAS: max_wal_senders must die
Date:
Msg-id: AANLkTimHJEE-0guj0j8-a7S_ioPWaMNKUQ9Y902K2CXJ@mail.gmail.com
In response to: Re: default_statistics_target WAS: max_wal_senders must die (Greg Stark <gsstark@mit.edu>)
List: pgsql-hackers
On Wed, Oct 20, 2010 at 7:13 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Wed, Oct 20, 2010 at 3:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Yes, I think a percentage of the table is going to break down either
>> at the high end or the low end.  Hand-waving (but based on
>> experience), for a 1000 row table a statistics target of 10 is
>> probably approximately right and 100 is too much and 1 is too little.
>> But for a 1,000,000 row table 10,000 is probably too much and even
>> 1,000 is pushing it.  So using a constant percentage of table rows
>> doesn't feel right.  I had a thought today that it might make sense to
>> use an exponential curve, like min(2 * N^(1/3), 10).  I can't really
>> justify that mathematically, but that doesn't mean it won't work well
>> in practice.
>
> Well we can analyze it but as you said later, it all depends on what
> queries you're running. If we want to aim for the same confidence
> interval at all times, ie that the estimated frequency is accurate to
> within +/- x% 95% of the time then:
>
> If we're querying ranges a..b which represent a constant percentage of
> the table we need a fixed number of buckets and a sample size that
> varies very little with respect to the size of the table (effectively
> constant).
>
> If we're querying ranges a..b which are constant sized and therefore
> represent a smaller percentage of the table as it grows then we need a
> number of buckets that's proportional to the size of the table. The
> sample size is proportional to the number of buckets (ie, it's a
> constant sized sample per bucket).
>
> If we're querying for a specific value which isn't one of the most
> common values then I'm not clear how to characterize the accuracy or
> precision of our current estimates let alone how they would vary if we
> changed our sample sizes.

I think that sums it up pretty well.  There's no one right formula.  I
think this problem needs an empirical approach rather than a
statistical analysis.  We know that it's impractical for the stats
target to be linear in the table size.  We also know that constant
values are excessive for small tables and sometimes inadequate for
large ones.  Therefore, we should pick something that grows, but
sublinearly.  Discuss.  :-)
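
[Editor's note: the sketch below is not part of the original thread.  It
shows how a cube-root rule of the kind Robert floated would behave across
table sizes.  Taken literally, min(2 * N^(1/3), 10) pins the target at 10
for every table over 125 rows, so the sketch assumes the clamp was meant
as a floor, i.e. max(10, 2 * N^(1/3)); that reading matches the hand-waved
numbers above (about 10-20 for a 1,000-row table, well under 1,000 for a
1,000,000-row table).]

```python
# Sketch: compare a sublinear (cube-root) statistics-target curve against
# a flat constant, across a range of table sizes.
#
# Assumption (not from the thread): the min() in the quoted formula is
# treated as max(), since a literal min() would cap the target at 10.

def cube_root_target(n_rows: int) -> int:
    """Sublinear target: grows with the cube root of the row count,
    with a floor of 10 for small tables."""
    return max(10, round(2 * n_rows ** (1 / 3)))

def constant_target(n_rows: int) -> int:
    """A flat default_statistics_target-style constant, for comparison."""
    return 100

for n in (1_000, 100_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} rows -> cube-root target {cube_root_target(n):>4}, "
          f"constant target {constant_target(n):>4}")
```

With this reading, a 1,000-row table gets a target of 20 and a
1,000,000-row table gets 200: growing, but sublinearly, as argued above.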

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
