Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id CAKJS1f_NLyMM7KXHt5+aK7zes-brCk0yRR9=0Us_n0fR6frNWA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (David Rowley <david.rowley@2ndquadrant.com>)
Ответы Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Wed, 23 Jan 2019 at 12:46, David Rowley <david.rowley@2ndquadrant.com> wrote:
> (Stopped in statext_mcv_build(). Need to take a break)

Continuing...

27. statext_mcv_build() could declare the int j,k variables in the
scope that they're required in.

28. "an array"

 * build array of SortItems for distinct groups and counts matching items

29. No need to set isnull to false in statext_mcv_load()

30. Wondering about the reason in statext_mcv_serialize() that you're
not passing the collation to sort the array.

You have:

ssup[dim].ssup_collation = DEFAULT_COLLATION_OID;

should it not be:

ssup[dim].ssup_collation = stats[dim]->attrcollid;
?

31. uint32 should use %u, not %d:

if (mcvlist->magic != STATS_MCV_MAGIC)
elog(ERROR, "invalid MCV magic %d (expected %d)",
mcvlist->magic, STATS_MCV_MAGIC);

and

if (mcvlist->type != STATS_MCV_TYPE_BASIC)
elog(ERROR, "invalid MCV type %d (expected %d)",
mcvlist->type, STATS_MCV_TYPE_BASIC);

and

ereport(ERROR,
(errcode(ERRCODE_DATA_CORRUPTED),
errmsg("invalid length (%d) item array in MCVList",
mcvlist->nitems)));

I don't think %ld is the correct format for VARSIZE_ANY_EXHDR. %u or
%d seem more suited. I see that value is quite often assigned to int,
so probably can't argue much with %d.

elog(ERROR, "invalid MCV size %ld (expected %zu)",
VARSIZE_ANY_EXHDR(data), expected_size);

32. I think the format is wrong here too:

elog(ERROR, "invalid MCV size %ld (expected %ld)",
VARSIZE_ANY_EXHDR(data), expected_size);

I'd expect "invalid MCV size %d (expected %zu)"

33. How do you allocate a single chunk non-densely?

* Allocate one large chunk of memory for the intermediate data, needed
* only for deserializing the MCV list (and allocate densely to minimize
* the palloc overhead).

34. I thought I saw a few issues with pg_stats_ext_mcvlist_items() so
tried to test it:

create table ab (a int, b int);
insert into ab select x,x from generate_serieS(1,10)x;
create statistics ab_ab_stat (mcv) on a,b from ab;
analyze ab;
select pg_mcv_list_items(stxmcv) from pg_Statistic_ext where stxmcv is not null;
ERROR:  cache lookup failed for type 2139062143

The issues I saw were:

You do:
appendStringInfoString(&itemValues, "{");
appendStringInfoString(&itemNulls, "{");

but never append '}' after building the string.

(can use appendStringInfoChar() BTW)

also:

if (i == 0)
{
appendStringInfoString(&itemValues, ", ");
appendStringInfoString(&itemNulls, ", ");
}

I'd have expected you to append the ", " only when i > 0.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Making all nbtree entries unique by having heap TIDs participatein comparisons
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [HACKERS] Block level parallel vacuum