Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id CAKJS1f9C+=PZ4Ei-OCYTVk5xo84d6LmFQDkJwp9wGNQ38Ajoag@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Список pgsql-hackers
On Thu, 17 Jan 2019 at 01:56, David Rowley <david.rowley@2ndquadrant.com> wrote:
> At this stage I'm trying to get to know the patch.  I read a lot of
> discussing between you and Dean ironing out how the stats should be
> used to form selectivities.  At the time I'd not read the patch yet,
> so most of it went over my head.
>
> I did note down a few things on my read.  I've included them below.
> Hopefully, they're useful.
>
> MCV list review

Part 2:


35. The evaluation order of this macro is wrong.

#define ITEM_SIZE(ndims) \
(ndims * (sizeof(uint16) + sizeof(bool)) + 2 * sizeof(double))

You'd probably want ITEM_SIZE(10) to return 170, but:

select (10 * (2 + 1) + 2 * 8);
 ?column?
----------
       46

Unsure why this does not cause a crash.

ndims should also have parenthesis around it in case someone does
ITEM_SIZE(x + y), likewise for the other ITEM_* macros.

36. Could do with some comments in get_mincount_for_mcv_list(). What's
magic about 0.04?

37. I think statext_mcv_build() needs some comments to detail out the
arguments.  For example can attrs be empty? Must it contain at least 2
members? etc.

38. Too many "it"s

* we simply treat it as a special item in the MCV list (it it makes it).

39. I don't see analyze_mcv_list() being used anywhere around this comment:

* If we can fit all the items onto the MCV list, do that. Otherwise use
* analyze_mcv_list to decide how many items to keep in the MCV list, just
* like for the single-dimensional MCV list.

40. The comment in the above item seems to indicate the condition for
when all items can fit in the number of groups, but the if condition
does not seem to allow for an exact match?

if (ngroups > nitems)

if you want to check if the number of items can fit in the number of
groups should it be: if (ngroups >= nitems) or if (nitems <= ngroups)
? Perhaps I've misunderstood. The comment is a little confusing as I'm
not sure where the "Otherwise" code is located.

41. I don't think palloc0() is required here. palloc() should be fine
since you're initialising each element in the loop.

mcvlist->items = (MCVItem * *) palloc0(sizeof(MCVItem *) * nitems);

for (i = 0; i < nitems; i++)
{
mcvlist->items[i] = (MCVItem *) palloc(sizeof(MCVItem));
mcvlist->items[i]->values = (Datum *) palloc(sizeof(Datum) * numattrs);
mcvlist->items[i]->isnull = (bool *) palloc(sizeof(bool) * numattrs);
}

I think I agree with the comment above that chunk about reducing the
number of pallocs, even if it's just allocating the initial array as
MCVItems instead of pointers to MCVItems


42. I don't think palloc0() is required in build_distinct_groups().
palloc() should be ok.

Maybe it's worth an Assert(j + 1 == ngroups) to ensure
count_distinct_groups got them all?

43. You're assuming size_t and long are the same size here.

elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length);

I know at least one platform where that's not true.

44. Should use DatumGetCString() instead of DatumGetPointer().

else if (info[dim].typlen == -2) /* cstring */
{
memcpy(data, DatumGetPointer(v), strlen(DatumGetPointer(v)) + 1);
data += strlen(DatumGetPointer(v)) + 1; /* terminator */
}

45. No need to set this to NULL.

Datum    *v = NULL;

Is "value" a better name than "v"?

46. What's the extra 'd' for in:

elog(ERROR, "invalid MCV magic %d (expected %dd)",

and

elog(ERROR, "invalid MCV type %d (expected %dd)",

47. Wondering about the logic behind the variation between elog() and
ereport() in statext_mcv_deserialize(). They all looks like "can't
happen" type errors.

48. format assumes size_t is the same size as long.

elog(ERROR, "invalid MCV size %ld (expected %ld)",
VARSIZE_ANY_EXHDR(data), expected_size);

49. palloc0() followed by memset(). Can just use palloc().

matches = palloc0(sizeof(char) * mcvlist->nitems);
memset(matches, (is_or) ? STATS_MATCH_NONE : STATS_MATCH_FULL,
   sizeof(char) * mcvlist->nitems);

50. The coding style in mcv_get_match_bitmap I think needs to be
postgresqlified. We normally declare all our variables in a chunk then
start setting them, unless the assignment is very simple. I don't
recall places in the code where have a comment when declaring a
variable, for example.

FmgrInfo gtproc;
Var    *var = (varonleft) ? linitial(expr->args) : lsecond(expr->args);
Const    *cst = (varonleft) ? lsecond(expr->args) : linitial(expr->args);
bool isgt = (!varonleft);

TypeCacheEntry *typecache
= lookup_type_cache(var->vartype, TYPECACHE_GT_OPR);

/* match the attribute to a dimension of the statistic */
int idx = bms_member_index(keys, var->varattno);

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: WIP: Avoid creation of the free space map for small tables
Следующее
От: Amit Langote
Дата:
Сообщение: Re: Delay locking partitions during query execution