Sorting costs (was Caution: tonight's commits force initdb)

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Sorting costs (was Caution: tonight's commits force initdb)
Дата
Msg-id 2084.935506730@sss.pgh.pa.us
обсуждение исходный текст
Ответ на RE: [HACKERS] Caution: tonight's commits force initdb  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
Список pgsql-hackers
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Hmm,Index scan is chosen to select all rows.
> AFAIK,sequential scan + sort is much faster than index scan in
> most cases.
>     cost of index scan < cost of sequential scan + cost of sort
> I have felt that the current cost estimation of index scan is too small,
> though I have no alternative.

Hmm.  Well, it's still a step forward that the system is able to
consider this query plan --- if it's choosing a plan that's actually
slower, then that indicates we have a problem with our cost estimates.

The current cost estimate for a sort (see optimizer/path/costsize.c) is
basically just P log P disk accesses (P being the estimated relation
size in pages) plus N log N tuple comparisons (N being the estimated
tuple count).  This is fairly bogus --- for one thing it does not
account for the fact that sorts smaller than SortMem kilobytes are done
in-memory without temp files.  I doubt that the amount of I/O for a
larger sort is quite right either.  We need to look at exactly what
usage psort.c makes of temp files and revise the I/O estimates
accordingly.

I am also suspicious that indexscan costs are underestimated.  The
cost of reading the index is probably not too far off, but the cost
of accessing the main table is bogus.  Worst case, for a table whose
tuples are thoroughly scattered, you would have a main-table page fetch
for *each* returned tuple.  In practice it's probably not anywhere near
that bad, since you may have some clustering of tuples (so that the same
page is hit several times in a row), and also the Postgres buffers and
the underlying Unix system's disk cache will save trips to disk if there
is any locality of reference at all.  I have no idea how to estimate
that effect --- anyone?  But cost_index is clearly producing a
ridiculously optimistic estimate at the moment.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Leon
Дата:
Сообщение: Re: [HACKERS] Lex and things...
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] vacuum process size