Re: [HACKERS] PATCH: multivariate histograms and MCV lists

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Дата
Msg-id e410d526-ab5c-0ca3-672b-6a1876a42ea8@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Список pgsql-hackers
Hi,

Attached is an updated version of the patch series, fixing the issues
reported by Mark Dilger:

1) Fix fabs() issue in histogram.c.

2) Do not rely on extra_data being StdAnalyzeData, and instead lookup
the LT operator explicitly. This also adds a simple regression tests to
make sure ANALYZE on arrays works fine, but perhaps we should invent
some simple queries too.

3) I've removed / clarified some of the comments mentioned by Mark.

4) I haven't changed how the statistics kinds are defined in relation.h,
but I agree there should be a comment explaining how STATS_EXT_INFO_*
relate to StatisticExtInfo.kinds.

5) The most significant change happened histograms. There used to be two
structures for histograms:

  - MVHistogram - expanded (no deduplication etc.), result of histogram
    build and never used for estimation

  - MVSerializedHistogram - deduplicated to save space, produced from
    MVHistogram before storing in pg_statistic_ext and never used for
    estimation

So there wasn't really any reason to expose the "non-serialized" version
outside histogram.c. It was just confusing and unnecessary, so I've
moved MVHistogram to histogram.c (and renamed it to MVHistogramBuild),
and renamed MVSerializedHistogram. And same for the MVBucket stuff.

So now we only deal with MVHistogram everywhere, except in histogram.c.

6) I've also made MVHistogram to include a varlena header directly (and
be packed as a bytea), which allows us to store it without having to
call any serialization functions).

I guess if we should do (5) and (6) for the MCV lists too, it seems more
convenient than the current approach. And perhaps even for the
statistics added to 9.6 (it does not change the storage format).


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Erik Rijkers
Дата:
Сообщение: Re: Add RANGE with values and exclusions clauses to the WindowFunctions
Следующее
От: amul sul
Дата:
Сообщение: Re: [HACKERS] Parallel Append implementation