Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id 387341b0-e4bb-e27d-8c68-afaef8189c15@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers

On 03/26/2018 12:31 PM, Dean Rasheed wrote:
> On 18 March 2018 at 23:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> Attached is an updated version of the patch series, addressing issues
>> pointed out by Alvaro.
> 
> I'm just starting to look at this now, and I think I'll post 
> individual comments/questions as I get to them rather than trying to 
> review the whole thing, because it's quite a large patch. Apologies
> if some of this has already been discussed.
> 

Sure, works for me. And thanks for looking!

> Looking at the changes to UpdateStatisticsForTypeChange():
> 
> +   memset(nulls, 1, Natts_pg_statistic_ext * sizeof(bool));
> 
> why the "1" there -- is it just a typo?
> 

Yeah, that should be 0. It's not causing any issues, because the
"replaces" array is initialized to 0 so we're not really using the null
value except for individual entries like here:

    if (statext_is_kind_built(oldtup, STATS_EXT_MCV))
    {
        replaces[Anum_pg_statistic_ext_stxmcv - 1] = true;
        nulls[Anum_pg_statistic_ext_stxmcv - 1] = true;
    }

but that sets the "nulls" to true anyway.

> A wider concern I have is that I think this function is trying to be 
> too clever by only resetting selected stats. IMO it should just reset
> all stats unconditionally when the column type changes, which would
> be consistent with what we do for regular stats.
> 
> Consider, for example, what would happen if a column was changed from
> real to int -- all the data values will be coerced to integers, 
> losing precision, and any ndistinct and dependency stats would
> likely be completely wrong afterwards. IMO that's a bug, and should
> be back-patched independently of these new types of extended stats.
> 
> Thoughts?
> 

The argument a year ago was that it's more plausible that the semantics
remains the same. I think the question is how the type change affects
precision - had the type change in the opposite direction (int to real)
there would be no problem, because both ndistinct and dependencies would
produce the same statistics.

In my experience people are far more likely to change data types in a
way that preserves precision, so I think the current behavior is OK.

The other reason is that when reducing precision, it generally enforces
the dependency (you can't violate functional dependencies or break
grouping by merging values). So you will have stale stats with weaker
dependencies, but it's still better than not having any.

But that's mostly unrelated to this patch, of course - for MCV lists and
histograms we can't keep the stats anyway, because the stats actually do
contain the type values (unlike stats introduced in PG10).

Actually, to be more accurate - we now store OIDs of the data types in
the MCV/histogram stats, so perhaps we could keep those too. But that
would be way more work (if at all possible).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Damir Simunic
Дата:
Сообщение: Re: Proposal: http2 wire format
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: Parallel Aggregates for string_agg and array_agg