Re: anti-join chosen even when slower than old plan

Поиск
Список
Период
Сортировка
От Vitalii Tymchyshyn
Тема Re: anti-join chosen even when slower than old plan
Дата
Msg-id 4CDD2DC1.80107@gmail.com
обсуждение исходный текст
Ответ на Re: anti-join chosen even when slower than old plan  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Ответы Re: anti-join chosen even when slower than old plan
Список pgsql-performance
12.11.10 12:56, Cédric Villemain написав(ла):
> I supposed it was an answer to my mail but not sure... please keep
> CC'ed people, it is easier to follow threads (at least for me)
>
OK
> 2010/11/12 Vitalii Tymchyshyn<tivv00@gmail.com>:
>
>> I'd say there are two Qs here:
>>
>> 1) Modify costs based on information on how much of the table is in cache.
>> It would be great  if this can be done, but I'd prefer to have it as admin
>> knobs (because of plan stability). May be both admin and automatic ways can
>> be followed with some parallel (disableable) process modify knobs on admin
>> behalf. In this case different strategies to automatically modify knobs can
>> be applied.
>>
> OS cache is usualy stable enough to keep your plans stable too, I think.
>
Not if it is on edge. There are always edge cases where data fluctuates
near some threshold.
>
>> 2) Modify costs for part of table retrieval. Then you need to define "part".
>> Current ways are partitioning and partial indexes. Some similar to partial
>> index thing may be created, that has only "where" clause and no data. But
>> has statistics and knobs (and may be personal bufferspace if they are
>> introduced). I don't like to gather data about "last X percents" or like,
>> because it works only in clustering and it's hard for optimizer to decide if
>> it will be enough to scan only this percents for given query.
>>
> Modifying random_page_cost and sequential_page_cost thanks to
> statistics about cached blocks can be improved if we know the
> distribution.
>
> It does not mean : we know we have last 15% in cache, and we are goign
> to request those 15%.
>

You mean *_cost for the whole table, don't you? That is case (1) for me.
Case (2) is when different cost values are selected based on what
portion of table is requested in the query. E.g. when we have data for
the whole day in one table, data for the last hour is cached and all the
other data is not. Optimizer then may use different *_cost for query
that requires all the data and for query that requires only last hour
data. But, as I've said, that is much more complex task then (1).

Best regards, Vitalii Tymchyshyn


В списке pgsql-performance по дате отправления:

Предыдущее
От: Cédric Villemain
Дата:
Сообщение: Re: anti-join chosen even when slower than old plan
Следующее
От: Cédric Villemain
Дата:
Сообщение: Re: anti-join chosen even when slower than old plan