Re: Correlation in cost_index()

Поиск
Список
Период
Сортировка
От Sean Chittenden
Тема Re: Correlation in cost_index()
Дата
Msg-id 20030807204419.GJ94710@perrin.int.nxad.com
обсуждение исходный текст
Ответ на Re: Correlation in cost_index()  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Correlation in cost_index()  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Correlation in cost_index()  (Manfred Koizar <mkoi-pg@aon.at>)
Список pgsql-hackers
> > AFAICS (part of) the real problem is in costsize.c:cost_index() where
> > IO_cost is calculated from min_IO_cost, pages_fetched,
> > random_page_cost, and indexCorrelation.  The current implementation
> > uses indexCorrelation^2 to interpolate between min_IO_cost and
> > max_IO_cost, which IMHO gives results that are too close to
> > max_IO_cost.
> 
> The indexCorrelation^2 algorithm was only a quick hack with no theory
> behind it :-(.  I've wanted to find some better method to put in there,
> but have not had any time to research the problem.

Could we "quick hack" it to a geometric mean instead since a mean
seemed to yield better results than indexCorrelation^2?

> > As nobody knows how each of these proposals performs in real life
> > under different conditions, I suggest to leave the current
> > implementation in, add all three algorithms, and supply a GUC variable
> > to select a cost function.
> 
> I don't think it's really a good idea to expect users to pick among
> multiple cost functions that *all* have no guiding theory behind them.
> I'd prefer to see us find a better cost function and use it.  Has anyone
> trawled the database literature on the subject?

Hrm, after an hour of searching and reading, I think one of the better
papers on the subject can be found here:

http://www.cs.ust.hk/faculty/dimitris/PAPERS/TKDE-NNmodels.pdf

Page 13, figure 3-12 is the ticket you were looking for Tom.  It's an
interesting read with a pretty good analysis and conclusion.  The
author notes that his formula begins to fall apart when the number of
dimensions reaches 10 and suggests the use of a high dimension
random cost estimate algo, but that the use of those comes at great
expense to the CPU (which is inline with a few other papers that I
read).  The idea of precomputing values piqued my interest and I
thought was very clever, albeit space intensive.  *shrug*

-sc


-- 
Sean Chittenden


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Marc G. Fournier"
Дата:
Сообщение: ignore this test
Следующее
От: Tom Lane
Дата:
Сообщение: Re: build on unixware 713