Statistics on key distribution (was: Re: order by and index path)

Поиск
Список
Период
Сортировка
От jwieck@debis.com (Jan Wieck)
Тема Statistics on key distribution (was: Re: order by and index path)
Дата
Msg-id m0zTowS-000EBRC@orion.SAPserv.Hamburg.dsh.de
обсуждение исходный текст
Ответ на Re: [HACKERS] Re: order by and index path  ("Thomas G. Lockhart" <lockhart@alumni.caltech.edu>)
Список pgsql-hackers
>
> >     We  do  not  have  this statistical information. So the whole
> >     thing is at this time academic.
>
> I recall that Commercial Ingres made the assumption that one row (or 1%
> of rows? My memory of Ingres is fading :) would be returned from a
> qualified query if no statistics were available to suggest otherwise.
>
> It did collect statistics on data distribution to try to help make those
> optimizer choices.
>
> It may be reasonable to assume that if there is an index, then using it
> with any qualified query would be a win. Since the alternative is to
> decide to _not_ use an index, a decision for which we have no support
> with existing statistics.

    It  may  be  also reasonable to collect statistic information
    and use that to quantify the cost of an index scan.

    The vacuum cleaner scans all indices on a  relation  vacuum'd
    completely.  And  at that time it already knows the number of
    pages and tuples in  the  heap  relation  (has  that  in  the
    vcrelstats).

    Based  on this it could decide to take every n'th index tuple
    while scanning and drop them somewhere where  other  backends
    can  find  them.   This  would be the statistical information
    needed by the optimizer to estimate the real cost of an index
    scan.  It  is  only of interest for big tables, where hopping
    from block to block will make an index scan a looser  against
    a  seqscan  in  a  many row matching scan.  So it's up to the
    optimizer do decide based on the # of  pages  if  statistical
    information is really required for cost calculation.

    Having   the  final  indexqual  along  with  the  statistical
    information it will be a little tricky to figure out how many
    rows it might return, but not impossible.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Brook Milligan
Дата:
Сообщение: Re: [PATCHES] TCL/TK configuration clean-up patches
Следующее
От: Oleg Bartunov
Дата:
Сообщение: small bug in src/interfaces/ecpg/lib/Makefile.in