Re: [HACKERS] multivariate statistics (v19)

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: [HACKERS] multivariate statistics (v19)
Дата
Msg-id 777a9240-61a4-c09b-a4c7-ff0fdcad34f9@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] multivariate statistics (v19)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [HACKERS] multivariate statistics (v19)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On 12/12/16 22:50, Tomas Vondra wrote:
> On 12/12/2016 12:26 PM, Amit Langote wrote:
>>
>> Hi Tomas,
>>
>> On 2016/10/30 4:23, Tomas Vondra wrote:
>>> Hi,
>>>
>>> Attached is v20 of the multivariate statistics patch series, doing
>>> mostly
>>> the changes outlined in the preceding e-mail from October 11.
>>>
>>> The patch series currently has these parts:
>>>
>>> * 0001 : (FIX) teach pull_varno about RestrictInfo
>>> * 0002 : (PATCH) shared infrastructure and ndistinct coefficients

Hi,

I went over these two (IMHO those could easily be considered as minimal
committable set even if the user visible functionality they provide is
rather limited).

> dropping statistics
> -------------------
> 
> The statistics may be dropped automatically using DROP STATISTICS.
> 
> After ALTER TABLE ... DROP COLUMN, statistics referencing are:
> 
>   (a) dropped, if the statistics would reference only one column
> 
>   (b) retained, but modified on the next ANALYZE

This should be documented in user visible form if you plan to keep it
(it does make sense to me).

> +   therefore perfectly correlated. Providing additional information about
> +   correlation between columns is the purpose of multivariate statistics,
> +   and the rest of this section thoroughly explains how the planner
> +   leverages them to improve estimates.
> +  </para>
> +
> +  <para>
> +   For additional details about multivariate statistics, see
> +   <filename>src/backend/utils/mvstats/README.stats</>. There are additional
> +   <literal>READMEs</> for each type of statistics, mentioned in the following
> +   sections.
> +  </para>
> +
> + </sect1>

I don't think this qualifies as "thoroughly explains" ;)

> +
> +Oid
> +get_statistics_oid(List *names, bool missing_ok)

No comment?

> +        case OBJECT_STATISTICS:
> +            msg = gettext_noop("statistics \"%s\" does not exist, skipping");
> +            name = NameListToString(objname);
> +            break;

This sounds somewhat weird (plural vs singular).

> + * XXX Maybe this should check for duplicate stats. Although it's not clear
> + * what "duplicate" would mean here (wheter to compare only keys or also
> + * options). Moreover, we don't do such checks for indexes, although those
> + * store tuples and recreating a new index may be a way to fix bloat (which
> + * is a problem statistics don't have).
> + */
> +ObjectAddress
> +CreateStatistics(CreateStatsStmt *stmt)

I don't think we should check duplicates TBH so I would remove the XXX
(also "wheter" is typo but if you remove that paragraph it does not matter).

> +    if (true)
> +    {

Huh?

> +
> +List *
> +RelationGetMVStatList(Relation relation)
> +{
...
> +
> +void
> +update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
> +                int2vector *attrs, VacAttrStats **stats)
...
> +static double
> +ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
> +                   int2vector *attrs, VacAttrStats **stats,
> +                   int k, int *combination)
> +{


Again, these deserve comment.

I'll try to look at other patches in the series as time permits.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Petr Jelinek
Дата:
Сообщение: Re: [HACKERS] multivariate statistics (v19)
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] Potential data loss of 2PC files