Re: [HACKERS] multivariate statistics (v19)
От | Tomas Vondra |
---|---|
Тема | Re: [HACKERS] multivariate statistics (v19) |
Дата | |
Msg-id | 96973208-28b2-0444-ddfd-45a1f63911b6@2ndquadrant.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] multivariate statistics (v19) (Petr Jelinek <petr.jelinek@2ndquadrant.com>) |
Список | pgsql-hackers |
On 12/30/2016 02:12 PM, Petr Jelinek wrote: > On 12/12/16 22:50, Tomas Vondra wrote: >> On 12/12/2016 12:26 PM, Amit Langote wrote: >>> >>> Hi Tomas, >>> >>> On 2016/10/30 4:23, Tomas Vondra wrote: >>>> Hi, >>>> >>>> Attached is v20 of the multivariate statistics patch series, doing >>>> mostly >>>> the changes outlined in the preceding e-mail from October 11. >>>> >>>> The patch series currently has these parts: >>>> >>>> * 0001 : (FIX) teach pull_varno about RestrictInfo >>>> * 0002 : (PATCH) shared infrastructure and ndistinct coefficients > > Hi, > > I went over these two (IMHO those could easily be considered as minimal > committable set even if the user visible functionality they provide is > rather limited). > Yes, although I still have my doubts 0001 is the right way to make pull_varnos work. It's probably related to the bigger design question, because moving the statistics selection to an earlier phase could make it unnecessary I guess. >> dropping statistics >> ------------------- >> >> The statistics may be dropped automatically using DROP STATISTICS. >> >> After ALTER TABLE ... DROP COLUMN, statistics referencing are: >> >> (a) dropped, if the statistics would reference only one column >> >> (b) retained, but modified on the next ANALYZE > > This should be documented in user visible form if you plan to keep it > (it does make sense to me). > Yes, I plan to keep it. I agree it should be documented, probably on the ALTER TABLE page (and linked from CREATE/DROP statistics pages). >> + therefore perfectly correlated. Providing additional information about >> + correlation between columns is the purpose of multivariate statistics, >> + and the rest of this section thoroughly explains how the planner >> + leverages them to improve estimates. >> + </para> >> + >> + <para> >> + For additional details about multivariate statistics, see >> + <filename>src/backend/utils/mvstats/README.stats</>. There are additional >> + <literal>READMEs</> for each type of statistics, mentioned in the following >> + sections. >> + </para> >> + >> + </sect1> > > I don't think this qualifies as "thoroughly explains" ;) > OK, I'll drop the "thoroughly" ;-) >> + >> +Oid >> +get_statistics_oid(List *names, bool missing_ok) > > No comment? > >> + case OBJECT_STATISTICS: >> + msg = gettext_noop("statistics \"%s\" does not exist, skipping"); >> + name = NameListToString(objname); >> + break; > > This sounds somewhat weird (plural vs singular). > Ah, right - it should be either "statistic ... does not" or "statistics ... do not". I think "statistics" is the right choice here, because (a) we have CREATE STATISTICS and (b) it may be a combination of statistics, e.g. histogram + MCV. >> + * XXX Maybe this should check for duplicate stats. Although it's not clear >> + * what "duplicate" would mean here (wheter to compare only keys or also >> + * options). Moreover, we don't do such checks for indexes, although those >> + * store tuples and recreating a new index may be a way to fix bloat (which >> + * is a problem statistics don't have). >> + */ >> +ObjectAddress >> +CreateStatistics(CreateStatsStmt *stmt) > > I don't think we should check duplicates TBH so I would remove the XXX > (also "wheter" is typo but if you remove that paragraph it does not matter). > Yes, I came to the same conclusion - we can only really check for exact matches (same set of columns, same choice of statistic types), but that's fairly useless. I'll remove the XXX. >> + if (true) >> + { > > Huh? > Yeah, that's a bit weird pattern. It's a remainder of copy-pasting the preceding block, which looks like this if (hasindex) { ... } But we've decided to not add similar flag for the statistics. I'll move the block to a separate function (instead of merging it directly into the function, which is already a bit largeish). >> + >> +List * >> +RelationGetMVStatList(Relation relation) >> +{ > ... >> + >> +void >> +update_mv_stats(Oid mvoid, MVNDistinct ndistinct, >> + int2vector *attrs, VacAttrStats **stats) > ... >> +static double >> +ndistinct_for_combination(double totalrows, int numrows, HeapTuple *rows, >> + int2vector *attrs, VacAttrStats **stats, >> + int k, int *combination) >> +{ > > > Again, these deserve comment. > OK, will add. > I'll try to look at other patches in the series as time permits. thanks -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: