On Fri, Jan 06, 2006 at 01:24:41AM -0500, Greg Stark wrote:
> > 5% based on block-based sampling is reasonable; that means a straight 5% of
> > the on-disk size of the table, so 5gb for a 100gb table. With random-row
> > sampling, that would require as much as 25% of the table, making it easier
> > to just scan the whole thing.
>
> Postgres's current sample sizes are clearly geared towards the histograms
> where they are entirely realistic. All of the distinct estimates are clearly
> just ad hoc attempts based on the existing sampling.
>
> Is a mechanism that is only 5x faster than reading the whole table (assuming
> random_page_cost of 4) and is off by more than a factor of three 10% of the
> time really worth it?
Before we start debating the merits of proposals based on random reads, can
someone confirm that the sampling code actually does read randomly? I
looked at it yesterday; a comment states that the blocks to be scanned
are passed to the analyze function in physical order, and AFAICT the
function that chooses blocks does so strictly by applying a probability
function to block numbers as it increments a counter. It appears the
reading is actually sequential rather than random, which makes all the
random_page_cost hand-waving null and void.
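For illustration, here is a minimal sketch (in Python, with made-up names;
this is not PostgreSQL's actual code) of the kind of one-pass selection
sampling the comment describes: Knuth's Algorithm S, which decides for each
block number in turn whether to include it. Because the loop walks block
numbers in increasing order, the chosen blocks come out already sorted in
physical order, so a scan over them is essentially sequential:

```python
import random

def sample_block_numbers(total_blocks, sample_size, seed=None):
    """Knuth's Algorithm S (selection sampling): choose exactly
    `sample_size` block numbers out of `total_blocks`, visiting the
    block numbers in increasing order.  Each block is selected with
    probability needed/remaining, which yields a uniform random
    sample -- but one that is emitted in ascending physical order."""
    rng = random.Random(seed)
    chosen = []
    needed = sample_size
    for block in range(total_blocks):
        remaining = total_blocks - block
        # This block is kept with probability needed / remaining.
        if rng.random() * remaining < needed:
            chosen.append(block)
            needed -= 1
            if needed == 0:
                break
    return chosen

blocks = sample_block_numbers(1000, 50, seed=1)
print(len(blocks), blocks == sorted(blocks))
```

If the real code works this way, the sampled blocks are read in ascending
block-number order, i.e. a sparse forward scan of the file rather than a
series of random seeks.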
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461