Re: benchmarking the query planner

From: Tom Lane
Subject: Re: benchmarking the query planner
Msg-id: 5301.1229092509@sss.pgh.pa.us
In reply to: Re: benchmarking the query planner ("Robert Haas" <robertmhaas@gmail.com>)
Responses: Re: benchmarking the query planner ("Greg Stark" <stark@enterprisedb.com>)
           Re: benchmarking the query planner (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
"Robert Haas" <robertmhaas@gmail.com> writes:
> On Fri, Dec 12, 2008 at 4:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> The existing sampling mechanism is tied to solid statistics.
>> 
>> Sounds great, but it's not true. The sample size is not linked to data
>> volume, so how can it possibly give a consistent confidence range?

> It is a pretty well-known mathematical fact that for something like an
> opinion poll your margin of error does not depend on the size of the
> population but only on the size of your sample.

Right.  The solid math that Greg referred to concerns how big a sample
we need in order to have good confidence in the histogram results.
It doesn't speak to whether we get good results for ndistinct (or for
most-common-values, though in practice that seems to work fairly well).
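A small illustration of Robert's point, using the textbook margin of error for a sample proportion (an editorial sketch, not anything taken from ANALYZE): the half-width is z * sqrt(p(1-p)/n), times the finite population correction sqrt((N-n)/(N-1)). That correction only shrinks the margin and tends to 1 as the table size N grows, so for large tables the sample size n is what matters. The sample size of 3,000 rows and the 95% z-value of 1.96 are arbitrary example choices.

```python
import math
from typing import Optional


def margin_of_error(p: float, n: int, population: Optional[int] = None,
                    z: float = 1.96) -> float:
    """Half-width of an approximate confidence interval for a sample proportion.

    p: observed proportion, n: sample size,
    population: total rows N (None means treat N as infinite).
    """
    se = math.sqrt(p * (1.0 - p) / n)
    if population is not None:
        # Finite population correction; approaches 1 as N grows.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


if __name__ == "__main__":
    n = 3_000  # example sample size, not a PostgreSQL default
    for N in (100_000, 10_000_000, 1_000_000_000):
        print(f"N={N:>13,}  n={n:,}  margin={margin_of_error(0.5, n, N):.5f}")
    print(f"N=infinite     n={n:,}  margin={margin_of_error(0.5, n):.5f}")
```

Growing the table from ten million to a billion rows leaves the margin essentially unchanged. Quadrupling the sample halves it.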

AFAICS, marginal enlargements in the sample size aren't going to help
much for ndistinct --- you really need to look at most or all of the
table to be guaranteed anything about that.
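To see why a modest sample says little about ndistinct, consider two hypothetical tables of N = 10,000,000 rows. In table A every value is unique. In table B every value appears exactly twice, so the true ndistinct is half as large. A uniform sample of 3,000 rows from B contains, on average, only about n(n-1)/(2(N-1)), roughly 0.45, pairs of matching rows, so it often looks exactly like a sample from A. The sketch below feeds both samples to the Haas-Stokes estimator n*d / (n - f1 + f1*n/N), where d is the number of distinct values in the sample and f1 the number seen exactly once. As far as I know this is the formula ANALYZE uses, but treat the code as an illustration, not the backend's implementation. Table sizes, sample size, and seed are arbitrary example values.

```python
import random
from collections import Counter


def haas_stokes(sample_values, total_rows):
    """Haas-Stokes (Duj1) ndistinct estimate: n*d / (n - f1 + f1*n/N)."""
    n = len(sample_values)
    counts = Counter(sample_values)
    d = len(counts)                                 # distinct values in the sample
    f1 = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return n * d / (n - f1 + f1 * n / total_rows)


def sample_column(total_rows, sample_size, value_of_row, seed):
    """Uniform sample of row numbers (without replacement), mapped to column values."""
    rng = random.Random(seed)
    rows = rng.sample(range(total_rows), sample_size)
    return [value_of_row(r) for r in rows]


N = 10_000_000  # rows in each hypothetical table
n = 3_000       # hypothetical fixed sample size

# Table A: every row holds a different value (true ndistinct = N).
sample_a = sample_column(N, n, lambda r: r, seed=1)
# Table B: rows 2j and 2j+1 share value j (true ndistinct = N // 2).
sample_b = sample_column(N, n, lambda r: r // 2, seed=1)

for name, sample, true_d in (("A", sample_a, N), ("B", sample_b, N // 2)):
    print(f"table {name}: distinct in sample={len(set(sample))}, "
          f"estimate={haas_stokes(sample, N):,.0f}, true={true_d:,}")
```

When B's sample happens to contain no matching pair, both tables get the same estimate of N. A single matching pair instead pulls B's estimate far below its true value. Either way, the sample cannot reliably tell the two tables apart.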

But having said that, I have wondered whether we should consider
allowing the sample to grow to fill maintenance_work_mem, rather than
making it a predetermined number of rows.  One difficulty is that the
random-sampling code assumes it has a predetermined rowcount target;
I haven't looked at whether that'd be easy to change or whether we'd
need a whole new sampling algorithm.
        regards, tom lane
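Here is why a predetermined rowcount target is built into reservoir-style sampling. Below is a sketch of the textbook Algorithm R, which keeps a uniform sample of k rows from a scan of unknown length. It is not ANALYZE's actual code, which uses a more elaborate block-based scheme. The target k fixes both the size of the buffer and the probability k/(i+1) with which row i+1 replaces an existing entry.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Algorithm R: uniform sample of k rows from a stream of unknown length.

    k must be chosen before the scan starts: it is the reservoir size and
    also determines the replacement probability k/(i+1) for the (i+1)-th row.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)   # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)   # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = row  # keep this row with probability k/(i+1)
    return reservoir


if __name__ == "__main__":
    print(sorted(reservoir_sample(range(1_000_000), 10, seed=42)))
```

Letting the sample grow until some memory budget is exhausted would mean changing k partway through the scan. Rows already discarded under the smaller k cannot be recovered, so keeping the sample uniform would need different bookkeeping. That fits the question of whether a new sampling algorithm would be required.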

