Re: [PERFORM] Slow query: bitmap scan troubles

From: Simon Riggs
Subject: Re: [PERFORM] Slow query: bitmap scan troubles
Date:
Msg-id: CA+U5nMLjf2-kTa4-AR-0XLKKwbc+=_fb4237i_UAWYzowW+-1Q@mail.gmail.com
In response to: Re: [PERFORM] Slow query: bitmap scan troubles  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [PERFORM] Slow query: bitmap scan troubles  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 6 January 2013 18:58, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 5 January 2013 22:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> No.  The argument is that if we don't have some such correction, the
>>> planner is liable to believe that different-sized indexes have *exactly
>>> the same cost*, if a given query would fetch the same number of index
>>> entries.
>
>> The only difference between a large and a small index is the initial
>> fetch, since the depth of the index may vary. After that the size of
>> the index is irrelevant to the cost of the scan, since we're just
>> scanning across the leaf blocks. (Other differences may exist but not
>> related to size).
>
> Right: except for the "fudge factor" under discussion, all the indexscan
> costs that we model come from accessing index leaf pages and leaf
> tuples.  So to the extent that the fudge factor has any principled basis
> at all, it's an estimate of index descent costs.  And in that role I
> believe that total index size needs to be taken into account.
>
>> Perhaps the cost of the initial fetch is what you mean by a
>> "correction"? In that case, why not use the index depth directly from
>> the metapage, rather than play with size?
>
> IIRC, one of my very first attempts to deal with this was to charge
> random_page_cost per level of index descended.  This was such a horrid
> overestimate that it never went anywhere.  I think that reflects that in
> practical applications, the upper levels of the index tend to stay in
> cache.  We could ignore I/O on that assumption and still try to model
> CPU costs of the descent, which is basically what Jeff is proposing.
> My objection to his formula is mainly that it ignores physical index
> size, which I think is important to include somehow for the reasons
> I explained in my other message.

Having a well-principled approach will help bring us towards a
realistic estimate.

I can well believe what you say about random_page_cost * index_depth
being an over-estimate.

Making a fudge factor be random_page_cost * ln(1 + index_pages/100000)
just seems to presume an effective cache of 8GB and a fixed
depth:size ratio, which it might not be. On a busy system, or with a
very wide index, that could also be wrong.
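
Just to make that objection concrete, here is a rough sketch of the
formula as I read it (mine, not the code in costsize.c; the function
and parameter names are made up, and random_page_cost and index_pages
stand in for the planner's values). The 100000 divisor is the part
that encodes the cache and depth:size assumptions rather than
anything measured from the index:

#include <math.h>

/*
 * Sketch only: the log-based fudge factor as described above, not the
 * actual planner code.  index_pages is the total index size in 8kB blocks.
 */
double
descent_fudge_log(double random_page_cost, double index_pages)
{
    /* The 100000 constant is an assumption, not derived from the index. */
    return random_page_cost * log(1.0 + index_pages / 100000.0);
}

For a 1GB index (about 131072 8kB pages) that works out to roughly
0.84 * random_page_cost, and it only grows logarithmically from there.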

I'd be more inclined to explicitly discount the first few levels by
using random_page_cost * max(index_depth - 3, 0), or, even better,
use a formula that includes the effective cache size and index width
to work out the likely number of tree levels cached for an index.
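
As a sketch only (the discount of 3 levels, the cached-level loop and
all the names below are illustrative assumptions, not proposed patch
code), the two alternatives would look something like this:

#include <math.h>

/*
 * (a) Discount the first few levels outright, assuming they stay cached.
 */
double
descent_cost_discounted(double random_page_cost, int index_depth)
{
    int levels_to_charge = index_depth - 3;    /* "3" is an assumption */

    if (levels_to_charge < 0)
        levels_to_charge = 0;
    return random_page_cost * levels_to_charge;
}

/*
 * (b) Work out how many upper levels are likely to fit in the cache,
 * given the index fanout (keys per page, i.e. a function of index
 * width), and charge only for the remaining levels.  All parameter
 * names here are hypothetical.
 */
double
descent_cost_cache_aware(double random_page_cost, int index_depth,
                         double effective_cache_pages, double fanout)
{
    int     cached_levels = 0;
    double  pages_so_far = 0.0;

    while (cached_levels < index_depth)
    {
        /* level 0 is the root: 1 page; each level below is fanout times larger */
        pages_so_far += pow(fanout, cached_levels);
        if (pages_so_far > effective_cache_pages)
            break;
        cached_levels++;
    }
    return random_page_cost * (index_depth - cached_levels);
}

With a fanout of a few hundred keys per page, the top three levels of
even a very large index come to roughly a gigabyte of pages or less,
which is about what the fixed discount in (a) assumes stays cached.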

Whatever we do, we must document that we are estimating the cache
effects on the cost of index descent, so we can pick that up in a
future study of caching effects.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


