Re: Improve selectivity estimate for range queries

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: Improve selectivity estimate for range queries
Дата	21 декабря 2018 г. 16:50:28
Msg-id	28533.1545411028@sss.pgh.pa.us обсуждение исходный текст
Ответ на	RE: Improve selectivity estimate for range queries ("Yuzuko Hosoya" <hosoya.yuzuko@lab.ntt.co.jp>)
Ответы	Re: Improve selectivity estimate for range queries Re: Improve selectivity estimate for range queries
Список	pgsql-hackers

Дерево обсуждения

"Yuzuko Hosoya" <hosoya.yuzuko@lab.ntt.co.jp> writes:
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyotaro@lab.ntt.co.jp]
>> At Thu, 20 Dec 2018 17:21:29 +0900, "Yuzuko Hosoya" <hosoya.yuzuko@lab.ntt.co.jp> wrote in
>> <008701d4983d$02e731c0$08b59540$@lab.ntt.co.jp>
>>> To handle such cases I've thought up of an idea based on a previous
>>> commit[1] which solved a similar problem related to
>>> DEFAULT_NUM_DISTINCT.  Just like this modification, I added a flag
>>> which shows whether the selectivity

>> The commit [1] added almost no complexity, but this does. Affects many
>> functions to introduce tighter coupling between operator-selectivity
>> functions and clause selectivity functions. isdefault travells alone
>> too long just to bind remote functions. We might need more pricipled
>> way to handle that.

Yeah, I don't think that this approach is a reasonable tradeoff to
eliminate a narrow case.  In particular, what's been done here to
clause_selectivity is totally unprincipled and poorly thought out:
you can't add an output parameter that's set in only some cases.
Yet, there's no good way to decide how to set it in many cases.
For instance, what would you do for an AND or OR where some of
the branches are default estimates and some not?

>> Around the [1], there was a discussion about introducing the notion of
>> uncertainty to selectivity.  The isdefualt is a kind of uncertainty
>> value indicating '0/100% uncertain'. So my feeling is saying that
>> it's better that Selectivity has certinty component then building a
>> selectivity arithmetics involving uncertainty, though I don't have a
>> clear picture.

> I see.  Indeed if we would change Selectivity so that it includes
> information about default/non-default, such problems I mentioned
> would be solved.  But I'm afraid that this modification would lead
> to a big impact, since there are lots of functions that calculate
> selectivity.  I also don't have a concreate idea, so I will think
> about it.  Besides that, I'd like to ask community whether such
> modification would be acceptable or not.

The planner definitely has a lot of problems that could be addressed
with some sort of uncertainty framework ... but it'd be a huge
undertaking, which is why nobody's tried yet.

A smaller-footprint way to fix the immediate problem might be to
change the values of DEFAULT_INEQ_SEL and friends so that they're
even less likely to be matched by accident.  That is, instead of
0.3333333333333333, use 0.333333333333342 or some such.  It might
seem that that's just moving the problem around, but I think it
might be possible to show that such a value couldn't be computed
by scalarltsel given a histogram with no more than 10000 members.
(I haven't tried to actually prove that, but it seems intuitive
that the set of possible results would be quantized with no more
than about 5 digits precision.)

            regards, tom lane

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Improve selectivity estimate for range queries