От: Jim C. Nasby
Тема: Re: Sort and index
Дата: ,
Msg-id: 20050511211516.GX31103@decibel.org
(см: обсуждение, исходный текст)
Ответ на: Re: Sort and index  (Manfred Koizar)
Ответы: Re: Sort and index  (Manfred Koizar)
Список: pgsql-performance

Скрыть дерево обсуждения

Sort and index  (Andrei Gaspar, )
 Re: Sort and index  ("Dave Held", )
  Re: Sort and index  (Andrei Gaspar, )
  Re: Sort and index  (Michael Fuhr, )
   Re: Sort and index  (Andrei Gaspar, )
  Re: Sort and index  ("Jim C. Nasby", )
   Re: Sort and index  (Tom Lane, )
    Re: Sort and index  ("Jim C. Nasby", )
     Re: Sort and index  ("Jim C. Nasby", )
      Re: Sort and index  (Tom Lane, )
       Re: Sort and index  ("Jim C. Nasby", )
        Re: Sort and index  (Tom Lane, )
         Re: Sort and index  ("Jim C. Nasby", )
          Re: Sort and index  (Manfred Koizar, )
           Re: Sort and index  ("Jim C. Nasby", )
            Re: Sort and index  (Manfred Koizar, )
             Re: Sort and index  ("Jim C. Nasby", )

First, I've got some updated numbers up at
http://stats.distributed.net/~decibel/

timing2.log shows that the planner actually under-estimates an index
scan by several orders of magnitude. Granted, random_page_cost is set to
an unrealistic 1.1 (otherwise I can't force the index scan), but that
alone isn't enough to explain the difference.

On Wed, May 11, 2005 at 05:59:10PM +0200, Manfred Koizar wrote:
> On Sun, 24 Apr 2005 17:01:46 -0500, "Jim C. Nasby" <>
> wrote:
> >> >> Feel free to propose better cost equations.
>
> I did.  More than once.
>
> >estimated index scan cost for (project_id, id, date) is
> >0.00..100117429.34 while the estimate for work_units is
> >0.00..103168408.62; almost no difference,
>
> ~3%
>
> > even though project_id correlation is .657
>
> This is divided by the number of index columns, so the index correlation
> is estimated to be 0.219.

That seems like a pretty bad assumption to make.

Is there any eta on having statistics for multi-column indexes?

> >you'll see that the cost of the index scan is way overestimated. Looking
> >at the code, the runcost is calculated as
> >
> >    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);
> >
> >where csquared is indexCorrelation^2. Why is indexCorrelation squared?
> >The comments say a linear interpolation between min_IO and max_IO is
> >used, but ISTM that if it was linear then instead of csquared,
> >indexCorrelation would just be used.
>
> In my tests I got much more plausible results with
>
>     1 - (1 - abs(correlation))^2

What's the theory behind that?

And I'd still like to know why correlation squared is used.

> Jim, are you willing to experiment with one or two small patches of
> mine?  What version of Postgres are you running?

It depends on the patches, since this is a production machine. Currently
it's running 7.4.*mumble*, though I need to upgrade to 8, which I was
intending to do via slony. Perhaps the best thing would be for me to get
that setup and we can experiment against version 8.0.3.
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


В списке pgsql-performance по дате сообщения:

От: Bruce Momjian
Дата:
Сообщение: Re: Intel SRCS16 SATA raid?
От: Alex Stapleton
Дата:
Сообщение: Re: Partitioning / Clustering