Re: [HACKERS] multivariate statistics (v19)

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: [HACKERS] multivariate statistics (v19)
Дата	4 января 2017 г. 00:55:01
Msg-id	96973208-28b2-0444-ddfd-45a1f63911b6@2ndquadrant.com обсуждение исходный текст
Ответ на	Re: [HACKERS] multivariate statistics (v19) (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Список	pgsql-hackers

Дерево обсуждения

On 12/30/2016 02:12 PM, Petr Jelinek wrote:
> On 12/12/16 22:50, Tomas Vondra wrote:
>> On 12/12/2016 12:26 PM, Amit Langote wrote:
>>>
>>> Hi Tomas,
>>>
>>> On 2016/10/30 4:23, Tomas Vondra wrote:
>>>> Hi,
>>>>
>>>> Attached is v20 of the multivariate statistics patch series, doing
>>>> mostly
>>>> the changes outlined in the preceding e-mail from October 11.
>>>>
>>>> The patch series currently has these parts:
>>>>
>>>> * 0001 : (FIX) teach pull_varno about RestrictInfo
>>>> * 0002 : (PATCH) shared infrastructure and ndistinct coefficients
>
> Hi,
>
> I went over these two (IMHO those could easily be considered as minimal
> committable set even if the user visible functionality they provide is
> rather limited).
>

Yes, although I still have my doubts 0001 is the right way to make 
pull_varnos work. It's probably related to the bigger design question, 
because moving the statistics selection to an earlier phase could make 
it unnecessary I guess.

>> dropping statistics
>> -------------------
>>
>> The statistics may be dropped automatically using DROP STATISTICS.
>>
>> After ALTER TABLE ... DROP COLUMN, statistics referencing are:
>>
>>   (a) dropped, if the statistics would reference only one column
>>
>>   (b) retained, but modified on the next ANALYZE
>
> This should be documented in user visible form if you plan to keep it
> (it does make sense to me).
>

Yes, I plan to keep it. I agree it should be documented, probably on the 
ALTER TABLE page (and linked from CREATE/DROP statistics pages).

>> +   therefore perfectly correlated. Providing additional information about
>> +   correlation between columns is the purpose of multivariate statistics,
>> +   and the rest of this section thoroughly explains how the planner
>> +   leverages them to improve estimates.
>> +  </para>
>> +
>> +  <para>
>> +   For additional details about multivariate statistics, see
>> +   <filename>src/backend/utils/mvstats/README.stats</>. There are additional
>> +   <literal>READMEs</> for each type of statistics, mentioned in the following
>> +   sections.
>> +  </para>
>> +
>> + </sect1>
>
> I don't think this qualifies as "thoroughly explains" ;)
>

OK, I'll drop the "thoroughly" ;-)

>> +
>> +Oid
>> +get_statistics_oid(List *names, bool missing_ok)
>
> No comment?
>
>> +        case OBJECT_STATISTICS:
>> +            msg = gettext_noop("statistics \"%s\" does not exist, skipping");
>> +            name = NameListToString(objname);
>> +            break;
>
> This sounds somewhat weird (plural vs singular).
>

Ah, right - it should be either "statistic ... does not" or "statistics 
... do not". I think "statistics" is the right choice here, because (a) 
we have CREATE STATISTICS and (b) it may be a combination of statistics, 
e.g. histogram + MCV.

>> + * XXX Maybe this should check for duplicate stats. Although it's not clear
>> + * what "duplicate" would mean here (wheter to compare only keys or also
>> + * options). Moreover, we don't do such checks for indexes, although those
>> + * store tuples and recreating a new index may be a way to fix bloat (which
>> + * is a problem statistics don't have).
>> + */
>> +ObjectAddress
>> +CreateStatistics(CreateStatsStmt *stmt)
>
> I don't think we should check duplicates TBH so I would remove the XXX
> (also "wheter" is typo but if you remove that paragraph it does not matter).
>

Yes, I came to the same conclusion - we can only really check for exact 
matches (same set of columns, same choice of statistic types), but 
that's fairly useless. I'll remove the XXX.

>> +    if (true)
>> +    {
>
> Huh?
>

Yeah, that's a bit weird pattern. It's a remainder of copy-pasting the 
preceding block, which looks like this
    if (hasindex)    {        ...    }

But we've decided to not add similar flag for the statistics. I'll move 
the block to a separate function (instead of merging it directly into 
the function, which is already a bit largeish).

>> +
>> +List *
>> +RelationGetMVStatList(Relation relation)
>> +{
> ...
>> +
>> +void
>> +update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
>> +                int2vector *attrs, VacAttrStats **stats)
> ...
>> +static double
>> +ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
>> +                   int2vector *attrs, VacAttrStats **stats,
>> +                   int k, int *combination)
>> +{
>
>
> Again, these deserve comment.
>

OK, will add.

> I'll try to look at other patches in the series as time permits.

thanks

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] multivariate statistics (v19)