Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id add2f0d3-4fbc-af7c-e02e-0f2b2d6dedc3@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Список pgsql-hackers
Hi,

Attached is an updated patch, fixing all the issues pointed out so far.
Unless there are some objections, I plan to commit the 0001 part by the
end of this CF. Part 0002 is a matter for PG13, as previously agreed.


On 3/24/19 1:17 AM, David Rowley wrote:
> On Sun, 24 Mar 2019 at 12:41, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On 3/21/19 4:05 PM, David Rowley wrote:
>>> 11. In get_mincount_for_mcv_list() it's probably better to have the
>>> numerical literals of 0.0 instead of just 0.
>>
>> Why?
> 
> Isn't it what we do for float and double literals?
> 

OK, fixed.

>>
>>> 12. I think it would be better if you modified build_attnums_array()
>>> to add an output argument that sets the size of the array. It seems
>>> that most places you call this function you perform bms_num_members()
>>> to determine the array size.
>>
>> Hmmm. I've done this, but I'm not sure I like it very much - there's no
>> protection the value passed in is the right one, so the array might be
>> allocated either too small or too large. I think it might be better to
>> make it work the other way, i.e. pass the value out instead.
> 
> When I said "that sets the size", I meant "that gets set to the size",
> sorry for the confusion.  I mean, if you're doing bms_num_members()
> inside build_attnums_array() anyway, then this will save you from
> having to do that in the callers too.
> 

OK, I've done this now, and I'm fairly happy with it.

>>> 28. Can you explain what this is?
>>>
>>> uint32 type; /* type of MCV list (BASIC) */
>>>
>>> I see: #define STATS_MCV_TYPE_BASIC 1 /* basic MCV list type */
>>>
>>> but it's not really clear to me what else could exist. Maybe the
>>> "type" comment can explain there's only one type for now, but more
>>> might exist in the future?
>>>
>>
>> It's the same idea as for dependencies/mvdistinct stats, i.e.
>> essentially a version number for the data structure so that we can
>> perhaps introduce some improved version of the data structure in the future.
>>
>> But now that I think about it, it seems a bit pointless. We would only
>> do that in a major version anyway, and we don't keep statistics during
>> upgrades. So we could just as well introduce the version/flag/... if
>> needed. We can't do this for regular persistent data, but for stats it
>> does not matter.
>>
>> So I propose we just remove this thingy from both the existing stats and
>> this patch.
> 
> I see. I wasn't aware that existed for the other types.   It certainly
> gives some wiggle room if some mistakes were discovered after the
> release, but thinking about it, we could probably just change the
> "magic" number and add new code in that branch only to ignore the old
> magic number, perhaps with a WARNING to analyze the table again. The
> magic field seems sufficiently early in the struct that we could do
> that.  In the master branch we'd just error if the magic number didn't
> match, since we wouldn't have to deal with stats generated by the
> previous version's bug.
> 

OK. I've decided to keep the field for now, for sake of consistency with
the already existing statistic types. I think we can rethink that in the
future, if needed.

>>> 29. Looking at the tests I see you're testing that you get bad
>>> estimates without extended stats.  That does not really seem like
>>> something that should be done in tests that are meant for extended
>>> statistics.
>>>
>>
>> True, it might be a bit unnecessary. Initially the tests were meant to
>> show old/new estimates for development purposes, but it might not be
>> appropriate for regression tests. I don't think it's a big issue, it's
>> not like it'd slow down the tests significantly. Opinions?
> 
> My thoughts were that if someone did something to improve non-MV
> stats, then is it right for these tests to fail? What should the
> developer do in the case? update the expected result? remove the test?
> It's not so clear.
> 

I think such changes would affect a number of other places in regression
tests (changing plans, ...), so I don't see why fixing these tests would
be any different.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: [HACKERS] PATCH: multivariate histograms and MCV lists