Re: COUNT(*) and index-only scans

From: Garick Hamlin
Subject: Re: COUNT(*) and index-only scans
Msg-id: 20111012183346.GA17896@isc.upenn.edu
In reply to: Re: COUNT(*) and index-only scans  (Greg Stark <stark@mit.edu>)
List: pgsql-hackers
On Wed, Oct 12, 2011 at 03:16:54PM +0100, Greg Stark wrote:
> On Wed, Oct 12, 2011 at 2:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I think it's overkill, and possibly unpleasantly unstable as well.
> > For the initial attack on this we should just have VACUUM and ANALYZE
> > count the number of all-visible blocks and store that in pg_class along
> > with the tuple-count statistics.  There's no reason to think that this
> > will be any worse than the way we deal with dead tuple counts, for
> > instance.
> 
> So I have a theory.
> 
> Assuming you're in a steady-state situation the amount of all-visible
> blocks will fluctuate from a high just after vacuum to a low just
> before the next vacuum. There are other ways a block can be marked
> all-visible but for the most part I would expect the fraction to go
> steadily down until vacuum comes along and cleans things up.
> 
> So if vacuum tracked the fraction of blocks marked all-visible
> *before* it processed them and the fraction it marked all-visible
> after processing we have an upper and lower bound. If we knew how long
> it's been since vacuum we could interpolate between those, or we could
> just take the mean, or we could take the lower bound as a conservative
> estimate.
> 
> > What I suggest as a first cut for that is: simply derate the visibility fraction as the fraction
> > of the table expected to be scanned gets smaller.
> 
> I think there's a statistically more rigorous way of accomplishing the
> same thing. If you treat the pages we estimate we're going to read as
> a random sample of the population of pages then your expected value is
> the fraction of the overall population that is all-visible but your
> 95th percentile confidence interval will be, uh, a simple formula we
> can compute but I don't recall off-hand.
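
For concreteness, both halves of that theory are short to write down.
Here is a minimal sketch in standalone C (not planner code; the helper
names, and expressing "time since vacuum" as a 0..1 cycle position,
are my own assumptions for illustration).  The bound treats the pages
the scan is expected to touch as a sample drawn without replacement,
so the binomial variance p(1-p)/n picks up the finite-population
correction (N-n)/(N-1):

    #include <math.h>

    /*
     * The interpolation: vacuum records the all-visible fraction it
     * saw before processing (the low-water mark) and the fraction it
     * left behind (the high-water mark).  cycle_position is how far
     * we think we are into the current vacuum cycle:
     * 0.0 = just vacuumed, 1.0 = due for vacuum.
     */
    static double
    interpolated_visible_fraction(double frac_after_vacuum,
                                  double frac_before_vacuum,
                                  double cycle_position)
    {
        return frac_after_vacuum
            + (frac_before_vacuum - frac_after_vacuum) * cycle_position;
    }

    /*
     * The confidence bound: with sample_pages drawn without
     * replacement from total_pages, the sampled proportion has
     * variance p(1-p)/n * (N-n)/(N-1).  z = 1.96 gives roughly the
     * 95th-percentile bound mentioned above.
     */
    static double
    visible_fraction_lower_bound(double p, double sample_pages,
                                 double total_pages, double z)
    {
        double fpc;     /* finite-population correction */
        double stddev;  /* std. dev. of the sampled proportion */

        if (sample_pages <= 0.0 || total_pages <= 1.0)
            return p;

        fpc = (total_pages - sample_pages) / (total_pages - 1.0);
        stddev = sqrt(p * (1.0 - p) / sample_pages * fpc);

        return fmax(0.0, p - z * stddev);  /* can't go negative */
    }

The limiting behaviour matches the derating Tom suggests: for a scan
expected to touch the whole table the correction vanishes and the
stored fraction is used as-is, while a scan expected to touch only a
few pages gets a much more conservative bound.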

Incidentally, I had a similar idea at PGCon relating to planning...

My idea was to compute not just the cost estimate for each plan
but also the sensitivity of that cost.  The sensitivity is the
derivative of the cost.  So, if the total cost were n^2, the
sensitivity would be 2n.

If you picked a tolerance (like 2 standard deviations of the
observed distribution), you could compare the expected cost and the
expected 'unlucky cost' for plans.

(Basically, this is parametric error propagation) 
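
To make that concrete, a toy example in standalone C (numbers made
up): plan A's cost grows like n^2 and plan B's like 100n, so at an
estimate of n = 80 rows plan A looks cheaper, but propagating a
2-standard-deviation error in n through the derivatives flips the
ranking.

    #include <math.h>
    #include <stdio.h>

    /*
     * First-order parametric error propagation: if the plan cost is
     * c(n) and the row estimate n has standard deviation sigma, the
     * "unlucky" cost at a tolerance of k standard deviations is
     * roughly c(n) + |c'(n)| * k * sigma.
     */
    static double
    unlucky_cost(double cost, double sensitivity, double k, double sigma)
    {
        return cost + fabs(sensitivity) * k * sigma;
    }

    int
    main(void)
    {
        double n = 80.0;      /* estimated rows */
        double sigma = 20.0;  /* std. dev. of the estimate */
        double k = 2.0;       /* tolerance: 2 standard deviations */

        double cost_a = n * n;      /* plan A: cost n^2 */
        double sens_a = 2.0 * n;    /* sensitivity dc/dn = 2n */
        double cost_b = 100.0 * n;  /* plan B: cost 100n */
        double sens_b = 100.0;      /* sensitivity dc/dn = 100 */

        printf("plan A: expected %.0f, unlucky %.0f\n",
               cost_a, unlucky_cost(cost_a, sens_a, k, sigma));
        printf("plan B: expected %.0f, unlucky %.0f\n",
               cost_b, unlucky_cost(cost_b, sens_b, k, sigma));
        return 0;
    }

That prints expected costs of 6400 vs 8000 but unlucky costs of
12800 vs 12000: plan B loses on the point estimate yet is the safer
pick at 2 sigma, which is the kind of flip this comparison is meant
to catch.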

I know very little about the planner...
I don't know how often it would lead to picking a better
plan (it might not be worth the cost to compute), but it seemed
like an interesting approach to me.

Garick


