Re: Erroneous cost estimation for nested loop join

Поиск

Список

Период

Сортировка

От	KAWAMICHI Ryoji
Тема	Re: Erroneous cost estimation for nested loop join
Дата	12 ноября 2015 г. 11:30:12
Msg-id	728495397.13043465.1447317006822.JavaMail.zimbra@tkl.iis.u-tokyo.ac.jp обсуждение исходный текст
Ответ на	Re: Erroneous cost estimation for nested loop join (Simon Riggs <simon@2ndQuadrant.com>)
Список	pgsql-hackers

Дерево обсуждения

<simon@2ndQuadrant.com> wrote:
>>
>> We guessed the cause of this error would be in the cost model of Postgres,
>> and investigated the source code of optimizer, and we found the cause of
>> this problem. It was in the index cost estimation process. On scanning
>> inner table, if loop count is greater than 1, its I/O cost is counted as
>> random access. In the case of Query2, in one loop (i.e. one inner table
>> scan) , much data is read sequentially with clustered index, so it seems to
>> be wrong that optimizer thinks its I/O workload is random access.
>>
>
> We don't have a clustered index in Postgres. We do store correlation stats
> for the index, which is used in some places to reduce cost.

Yes, postgres does not have a clustered index as you pointed. I meant an index
whose correlation is 1.0 by using word “clustered index”. In this case,
the index is primary key (records are physically ordered by this) and the index
was created just after the whole data was loaded. We’ve been assuming OLAP
workload for our experiments, so I think correlation = 1.0 is the basic case
for our experiments.

> Could you look some more at this and see if a model change that uses the
> correlation can improve this?

I cannot understand the question so let me clarify. Did you mean that I should
read the optimizer code more, and I can find the correlation is used to improve
cost estimation?

Of course I read them, and I know that correlation is used to determine
the value between the min cost and max cost. (The min cost is the best case
cost (i.e. correlation is 1.0), and the max cost is the worst case cost (i.e.
correlation is 0).

But in both case, I/O cost is counted as random access on scanning inner table.
I think I/O cost should be counted as sequential access when the correlation is
1.0, so I tried to modify the code as previous mail. But this modification is
just an example of solution. I’m not so familiar with optimizer code yet, so
I’m wondering this is the right way or not.

Thank you for your comment.
Ryoji

В списке pgsql-hackers по дате отправления:

Предыдущее

От: KAWAMICHI Ryoji
Дата: 12 ноября 2015 г., 11:22:04
Сообщение: Re: Erroneous cost estimation for nested loop join

Следующее

От: Amit Langote
Дата: 12 ноября 2015 г., 12:38:38
Сообщение: Re: pglogical_output - a general purpose logical decoding output plugin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Erroneous cost estimation for nested loop join

Предыдущее

Следующее