Re: Multi-pass planner

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Multi-pass planner
Дата
Msg-id CAMkU=1y13w5oYaJ_mwJ-dPpZAugOUTWfffgAKRa=5Po5gbznCA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Multi-pass planner  (Greg Stark <stark@mit.edu>)
Ответы Re: Multi-pass planner  (Claudio Freire <klaussfreire@gmail.com>)
Список pgsql-hackers
On Wed, Apr 3, 2013 at 6:40 PM, Greg Stark <stark@mit.edu> wrote:

On Fri, Aug 21, 2009 at 6:54 PM, decibel <decibel@decibel.org> wrote:
Would it? Risk seems like it would just be something along the lines of the high-end of our estimate. I don't think confidence should be that hard either. IE: hard-coded guesses have a low confidence. Something pulled right out of most_common_vals has a high confidence.

I wouldn't be so sure of that.  I've run into cases where all of the frequencies pulled out of most_common_vals are off by orders of magnitude.  The problem is that if ANALYZE only samples 1/1000th of the table, and it sees a value twice, it assumes the value is present 2000 times in the table, even when it was only in the table twice.  Now, for any given value that occurs twice in the table, it is very unlikely for both of those to end up in the sample. But when you have millions of distinct values which each occur twice (or some low number of time), it is a near certainty that several of them are going to end with both instances in the sample.  Those few ones that get "lucky" are of course going to end up in the most_common_vals list.   

Since the hashjoin estimates cost depending on the frequency of the most common value, having this be systematically off by a factor of 1000 is rather unfortunate.

The problem here is that the sample size which is adequate for getting a good estimate of the histograms (which is what controls the sample size currently) is not adequate for getting a good estimate of most_common_vals.  Cranking up the statistics_target would give a better estimates of most_common_vals, but at the expense of having a needlessly large histogram, which slows down planning.  There is currently no knob to crank up the sample size for the sake of most common values, but then prune the histogram back down for storage.

Cheers,

Jeff

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Daniel Farina
Дата:
Сообщение: Re: confusing message about archive failures
Следующее
От: Claudio Freire
Дата:
Сообщение: Re: Multi-pass planner