Re: Improving N-Distinct estimation by ANALYZE

Поиск
Список
Период
Сортировка
От Jim C. Nasby
Тема Re: Improving N-Distinct estimation by ANALYZE
Дата
Msg-id 20060106160906.GH3902@pervasive.com
обсуждение исходный текст
Ответ на Re: Improving N-Distinct estimation by ANALYZE  (Greg Stark <gsstark@mit.edu>)
Ответы Re: Improving N-Distinct estimation by ANALYZE  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Improving N-Distinct estimation by ANALYZE  (Greg Stark <gsstark@mit.edu>)
Список pgsql-hackers
On Fri, Jan 06, 2006 at 01:24:41AM -0500, Greg Stark wrote:
> > 5% based on block-based sampling is reasonable; that means a straight 5% of 
> > the on-disk size of the table, so 5gb for a 100gb table.  With random-row 
> > sampling, that would require as much as 25% of the table, making it easier 
> > to just scan the whole thing.
> 
> Postgres's current sample sizes are clearly geared towards the histograms
> where they are entirely realistic. All of the distinct estimates are clearly
> just ad hoc attempts based on the existing sampling. 
> 
> Is a mechanism that is only 5x faster than reading the whole table (assuming
> random_page_cost of 4) and is off by more than a factor of three 10% of the
> time really worth it?

Before we start debating merits of proposals based on random reads, can
someone confirm that the sampling code actually does read randomly? I
looked at it yesterday; there is a comment that states that blocks to be
scanned are passed to the analyze function in physical order, and AFAICT
the function that chooses blocks does so based strictly on applying a
probability function to block numbers as it increments a counter. It
seems that any reading is actually sequential and not random, which
makes all the random_page_cost hand-waving null and void.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stefan Kaltenbrunner
Дата:
Сообщение: could not access status of transaction 0
Следующее
От: Tom Lane
Дата:
Сообщение: Re: could not access status of transaction 0