Re: ANALYZE sampling is too good

Поиск

Список

Период

Сортировка

От	Mark Kirkwood
Тема	Re: ANALYZE sampling is too good
Дата	9 декабря 2013 г. 01:20:07
Msg-id	52A4F084.3070200@catalyst.net.nz обсуждение исходный текст
Ответ на	Re: ANALYZE sampling is too good (Josh Berkus <josh@agliodbs.com>)
Список	pgsql-hackers

Дерево обсуждения

On 09/12/13 08:03, Josh Berkus wrote:
>
> So there's a set of math designed to calculate for the skew introduced
> by reading *all* of the rows in each block.  That's what I meant by
> "block-based sampling"; you read, say, 400 pages, you compile statistics
> on *all* of the rows on those pages, you apply some algorithms to adjust
> for groupings of rows based on how grouped they are.  And you have a
> pretty good estimate of how grouped they are, because you just looked a
> complete sets of rows on a bunch of nonadjacent pages.
>
> Obviously, you need to look at more rows than you would with a
> pure-random sample.  Like I said, the 80%+ accurate point in the papers
> seemed to be at a 5% sample.  However, since those rows come from the
> same pages, the cost of looking at more rows is quite small, compared to
> the cost of looking at 64 times as many disk pages.
>
> My ACM subscription has lapsed, though; someone with a current ACM
> subscription could search for this; there are several published papers,
> with math and pseudocode.
>

This makes more sense! Unfortunately you had confused everyone (well me 
at least) by decreeing that we needed block based sampling - when we 
started using that in 2004 - there is more than one way to sample blocks 
it seems :-)

What you are actually suggesting is a way to do block based sampling 
that requires reading fewer blocks, and that is a great idea! Of course 
as Greg has just suggested - the details are gonna be the clincher. Can 
you supply authors or titles of papers?

Also with reference to Greenplum - their use of postgres is fairly 
special case, and an approach that works well for them is not always 
suitable for more general purpose use.

Regards

Mark

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Greg Stark
Дата: 09 декабря 2013 г., 00:16:01
Сообщение: Re: ANALYZE sampling is too good

Следующее

От: Jim Nasby
Дата: 09 декабря 2013 г., 02:38:09
Сообщение: Re: Why we are going to have to go DirectIO

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: ANALYZE sampling is too good

Предыдущее

Следующее