Re: [HACKERS] PATCH: multivariate histograms and MCV lists
| От | David Rowley |
|---|---|
| Тема | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
| Дата | |
| Msg-id | CAKJS1f81TbFHJtfzYYVH19NWroxX4NOyZ2vvd5y+jb6VLr3WRg@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
| Ответы |
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
|
| Список | pgsql-hackers |
I've started looking over 0002. Here are a few things so far:
1. I think this should be pg_statistic_ext.stxhistogram?
Values of the <type>pg_histogram</type> can be obtained only from the
<literal>pg_statistic.stxhistogram</literal> column.
2. I don't think this bms_copy is needed anymore. I think it was
previously since there were possibly multiple StatisticExtInfo objects
per pg_statistic_ext row, but now it's 1 for 1.
+ info->keys = bms_copy(keys);
naturally, the bms_free() will need to go too.
3. I've not really got into understanding how the new statistics types
are applied yet, but I found this:
* If asked to build both MCV and histogram, first build the MCV part
* and then histogram on the remaining rows.
I guess that means we'll get different estimates with:
create statistic a_stats (mcv,histogram) on a,b from t;
vs
create statistic a_stats1 (mcv) on a,b from t;
create statistic a_stats2 (histogram) on a,b from t;
Is that going to be surprising to people?
4. I guess you can replace "(histogram == NULL);" with "false". The
compiler would likely do it anyway, but...
if (histogram != NULL)
{
/* histogram already is a bytea value, not need to serialize */
nulls[Anum_pg_statistic_ext_stxhistogram - 1] = (histogram == NULL);
values[Anum_pg_statistic_ext_stxhistogram - 1] = PointerGetDatum(histogram);
}
but, hmm. Shouldn't you serialize this, like you are with the others?
5. serialize_histogram() and statext_histogram_deserialize(), should
these follow the same function naming format?
6. IIRC some compilers may warn about this:
if (stat->kinds & requiredkinds)
making it:
if ((stat->kinds & requiredkinds))
should fix that.
UPDATE: Tried to make a few compilers warn about this and failed.
Perhaps I've misremembered.
7. Comment claims function has a parameter named 'requiredkind', but
it no longer does. The comment also needs updated to mention that it
finds statistics with any of the required kinds.
* choose_best_statistics
* Look for and return statistics with the specified 'requiredkind' which
* have keys that match at least two of the given attnums. Return NULL if
* there's no match.
*
* The current selection criteria is very simple - we choose the statistics
* object referencing the most of the requested attributes, breaking ties
* in favor of objects with fewer keys overall.
*
* XXX If multiple statistics objects tie on both criteria, then which object
* is chosen depends on the order that they appear in the stats list. Perhaps
* further tiebreakers are needed.
*/
StatisticExtInfo *
choose_best_statistics(List *stats, Bitmapset *attnums, int requiredkinds)
8. Looking at statext_clauselist_selectivity() I see it calls
choose_best_statistics() passing requiredkinds as STATS_EXT_INFO_MCV |
STATS_EXT_INFO_HISTOGRAM, do you think the function now needs to
attempt to find the best match plus the one with the most statistics
kinds?
It might only matter if someone had:
create statistic a_stats1 (mcv) on a,b from t;
create statistic a_stats2 (histogram) on a,b from t;
create statistic a_stats3 (mcv,histogram) on a,b from t;
Is it fine to just return a_stats1 and ignore the fact that a_stats3
is probably better? Or too corner case to care?
9. examine_equality_clause() assumes it'll get a Var. I see we should
only allow clauses that pass statext_is_compatible_clause_internal(),
so maybe it's worth an Assert(IsA(var, Var)) along with a comment to
mention anything else could not have been allowed.
10. Does examine_equality_clause need 'root' as an argument?
11. UINT16_MAX -> PG_UINT16_MAX
/* make sure we fit into uint16 */
Assert(count <= UINT16_MAX);
(Out of energy for today.)
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления: