Re: PoC/WIP: Extended statistics on expressions

From Tomas Vondra
Subject Re: PoC/WIP: Extended statistics on expressions
Msg-id 958870c8-65e0-31b1-4591-b0b10e807dd9@enterprisedb.com
In response to Re: PoC/WIP: Extended statistics on expressions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: PoC/WIP: Extended statistics on expressions  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
On 12/11/20 1:58 PM, Dean Rasheed wrote:
> On Tue, 8 Dec 2020 at 12:44, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>
>> Possibly. But I don't think it's worth the extra complexity. I don't
>> expect people to have a lot of overlapping stats, so the amount of
>> wasted space and CPU time is expected to be fairly limited.
>>
>> So I don't think it's worth spending too much time on this now. Let's
>> just do what you proposed, and revisit this later if needed.
>>
> 
> Yes, I think that's a reasonable approach to take. As long as the
> documentation makes it clear that building MCV stats also causes
> standard expression stats to be built on any expressions included in
> the list, then the user will know and can avoid duplication most of
> the time. I don't think there's any need for code to try to prevent
> that -- just as we don't bother with code to prevent a user building
> multiple indexes on the same column.
> 
> The only case where duplication won't be avoidable is where there are
> multiple MCV stats sharing the same expression, but that's probably
> quite unlikely in practice, and it seems acceptable to leave improving
> that as a possible future optimisation.
> 

OK. Attached is an updated version, reworking it this way.

I tried tweaking the grammar to differentiate these two syntax variants,
but that led to shift/reduce conflicts with the existing ones. I tried
fixing those, but in the end I handled the distinction in
CreateStatistics() instead.

The other thing is that we probably can't tie this to just MCV, because
functional dependencies need the per-expression stats too. So I simply
build expression stats whenever there's at least one expression.

I also decided to keep the "expressions" statistics kind - it's not
allowed to specify it in CREATE STATISTICS, but it's useful internally
as it allows deciding whether to build the stats in a single place.
Otherwise we'd need to do that every time we build the statistics, etc.
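
To illustrate the idea, here is a minimal standalone sketch, not the code
from the patch. All names in it (the KIND_* flags, resolve_kinds(),
should_build_expression_stats()) are hypothetical and exist only for this
example. It assumes the kinds are tracked as a bitmask: the user may name
only the regular kinds, and the internal "expressions" kind is added in
exactly one place whenever the definition contains at least one expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags for this sketch; not the definitions used by the patch. */
enum
{
    KIND_NDISTINCT    = 1 << 0,
    KIND_DEPENDENCIES = 1 << 1,
    KIND_MCV          = 1 << 2,
    KIND_EXPRESSIONS  = 1 << 3   /* internal only, never accepted from the user */
};

/* Map a user-supplied kind name to a flag; 0 means unknown or not allowed. */
int
parse_user_kind(const char *name)
{
    if (strcmp(name, "ndistinct") == 0)
        return KIND_NDISTINCT;
    if (strcmp(name, "dependencies") == 0)
        return KIND_DEPENDENCIES;
    if (strcmp(name, "mcv") == 0)
        return KIND_MCV;
    return 0;                    /* "expressions" ends up here on purpose */
}

/*
 * Decide once, at definition time, which kinds to build.
 * Returns a bitmask of KIND_* flags, or -1 if a requested kind is rejected.
 */
int
resolve_kinds(const char *const *names, int nnames, int nexprs)
{
    int kinds = 0;

    assert(nnames >= 0 && nexprs >= 0);

    for (int i = 0; i < nnames; i++)
    {
        int kind = parse_user_kind(names[i]);

        if (kind == 0)
            return -1;
        kinds |= kind;
    }

    /* Any expression at all implies per-expression stats, whatever the kinds. */
    if (nexprs > 0)
        kinds |= KIND_EXPRESSIONS;

    return kinds;
}

/* Later code only checks the stored flag instead of re-deriving the rule. */
bool
should_build_expression_stats(int kinds)
{
    return (kinds & KIND_EXPRESSIONS) != 0;
}
```

With this shape, a definition that asks only for "dependencies" but
includes an expression still gets the expression flag, and the code that
builds the statistics never has to inspect the expression list itself.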

I added a brief explanation to the sgml docs, not sure if that's good
enough - maybe it needs more details.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
