Re: [HACKERS] multivariate statistics (v19)
| От | Tomas Vondra |
|---|---|
| Тема | Re: [HACKERS] multivariate statistics (v19) |
| Дата | |
| Msg-id | 96973208-28b2-0444-ddfd-45a1f63911b6@2ndquadrant.com обсуждение исходный текст |
| Ответ на | Re: [HACKERS] multivariate statistics (v19) (Petr Jelinek <petr.jelinek@2ndquadrant.com>) |
| Список | pgsql-hackers |
On 12/30/2016 02:12 PM, Petr Jelinek wrote:
> On 12/12/16 22:50, Tomas Vondra wrote:
>> On 12/12/2016 12:26 PM, Amit Langote wrote:
>>>
>>> Hi Tomas,
>>>
>>> On 2016/10/30 4:23, Tomas Vondra wrote:
>>>> Hi,
>>>>
>>>> Attached is v20 of the multivariate statistics patch series, doing
>>>> mostly
>>>> the changes outlined in the preceding e-mail from October 11.
>>>>
>>>> The patch series currently has these parts:
>>>>
>>>> * 0001 : (FIX) teach pull_varno about RestrictInfo
>>>> * 0002 : (PATCH) shared infrastructure and ndistinct coefficients
>
> Hi,
>
> I went over these two (IMHO those could easily be considered as minimal
> committable set even if the user visible functionality they provide is
> rather limited).
>
Yes, although I still have my doubts 0001 is the right way to make
pull_varnos work. It's probably related to the bigger design question,
because moving the statistics selection to an earlier phase could make
it unnecessary I guess.
>> dropping statistics
>> -------------------
>>
>> The statistics may be dropped automatically using DROP STATISTICS.
>>
>> After ALTER TABLE ... DROP COLUMN, statistics referencing are:
>>
>> (a) dropped, if the statistics would reference only one column
>>
>> (b) retained, but modified on the next ANALYZE
>
> This should be documented in user visible form if you plan to keep it
> (it does make sense to me).
>
Yes, I plan to keep it. I agree it should be documented, probably on the
ALTER TABLE page (and linked from CREATE/DROP statistics pages).
>> + therefore perfectly correlated. Providing additional information about
>> + correlation between columns is the purpose of multivariate statistics,
>> + and the rest of this section thoroughly explains how the planner
>> + leverages them to improve estimates.
>> + </para>
>> +
>> + <para>
>> + For additional details about multivariate statistics, see
>> + <filename>src/backend/utils/mvstats/README.stats</>. There are additional
>> + <literal>READMEs</> for each type of statistics, mentioned in the following
>> + sections.
>> + </para>
>> +
>> + </sect1>
>
> I don't think this qualifies as "thoroughly explains" ;)
>
OK, I'll drop the "thoroughly" ;-)
>> +
>> +Oid
>> +get_statistics_oid(List *names, bool missing_ok)
>
> No comment?
>
>> + case OBJECT_STATISTICS:
>> + msg = gettext_noop("statistics \"%s\" does not exist, skipping");
>> + name = NameListToString(objname);
>> + break;
>
> This sounds somewhat weird (plural vs singular).
>
Ah, right - it should be either "statistic ... does not" or "statistics
... do not". I think "statistics" is the right choice here, because (a)
we have CREATE STATISTICS and (b) it may be a combination of statistics,
e.g. histogram + MCV.
>> + * XXX Maybe this should check for duplicate stats. Although it's not clear
>> + * what "duplicate" would mean here (wheter to compare only keys or also
>> + * options). Moreover, we don't do such checks for indexes, although those
>> + * store tuples and recreating a new index may be a way to fix bloat (which
>> + * is a problem statistics don't have).
>> + */
>> +ObjectAddress
>> +CreateStatistics(CreateStatsStmt *stmt)
>
> I don't think we should check duplicates TBH so I would remove the XXX
> (also "wheter" is typo but if you remove that paragraph it does not matter).
>
Yes, I came to the same conclusion - we can only really check for exact
matches (same set of columns, same choice of statistic types), but
that's fairly useless. I'll remove the XXX.
>> + if (true)
>> + {
>
> Huh?
>
Yeah, that's a bit weird pattern. It's a remainder of copy-pasting the
preceding block, which looks like this
if (hasindex) { ... }
But we've decided to not add similar flag for the statistics. I'll move
the block to a separate function (instead of merging it directly into
the function, which is already a bit largeish).
>> +
>> +List *
>> +RelationGetMVStatList(Relation relation)
>> +{
> ...
>> +
>> +void
>> +update_mv_stats(Oid mvoid, MVNDistinct ndistinct,
>> + int2vector *attrs, VacAttrStats **stats)
> ...
>> +static double
>> +ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows,
>> + int2vector *attrs, VacAttrStats **stats,
>> + int k, int *combination)
>> +{
>
>
> Again, these deserve comment.
>
OK, will add.
> I'll try to look at other patches in the series as time permits.
thanks
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: