Re: Extended Statistics set/restore/clear functions.

From: Corey Huinker
Subject: Re: Extended Statistics set/restore/clear functions.
Date:
Msg-id: CADkLM=fv9sszLWJJvbG1meOvHEueN0fubHr2hGQZOYVfv5rwzQ@mail.gmail.com
In reply to: Re: Extended Statistics set/restore/clear functions.  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Extended Statistics set/restore/clear functions.
List: pgsql-hackers
Thanks for the new patch.  And FWIW I disagree with this approach:
cleanup and refactoring pieces make more sense if done first, as these
lead to less code churn in the final result.  So...  I've begun to put
my hands on the patch set.  The whole has been restructured a bit, as
per the attached.  Patches 0001 to 0004 feel OK here; these include two
code moves and the two output functions:
- Two new files for adt/, that I'm planning to apply soon as a
separate cleanup.
- New output functions, with keys added to a new header named
statistics_format.h, for frontend and backend consumption.

Agreed, 0001-0004 all look good.
 
Next come the input functions.  First, I am unhappy with the amount
of testing that has been put into ndistinct, the first and only input
facility I've looked at in detail for the moment.  I have quickly
spotted a few issues while testing buggy input, like this one that
crashes on a pointer dereference, which is obviously not good:
SELECT '[]'::pg_ndistinct;

- I put some work into more specific error messages for invalid values for both pg_ndistinct and pg_dependencies.
- The check for empty attribute lists and item lists now occurs in the array-end event handler; a few inputs these checks are meant to reject are sketched after this list.
- I also tried to standardize conventions between the two data types (switch statements, similar utility functions, etc.).
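
For illustration, these are the sorts of malformed inputs the new checks
are meant to reject with a clean error instead of a crash.  The literals
are only sketches: apart from the empty-array case quoted above, the key
names and the exact accepted text format are my guesses, not necessarily
what the patch accepts.

-- Empty array: previously crashed on a pointer dereference, should now error.
SELECT '[]'::pg_ndistinct;

-- Hypothetical item with an empty attribute list: rejected in the
-- array-end event handler.
SELECT '[{"attributes": [], "ndistinct": 4}]'::pg_ndistinct;

-- Empty item list for pg_dependencies, rejected the same way.
SELECT '[]'::pg_dependencies;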
 

These are checked in the patches that introduce the functions like
with pg_ndistinct_validate_items(), based on the list of stxkeys we
have.  However, I think that this is not enough by itself.  Shouldn't
we check that the list of items in the array is what we expect based
on the longest "attributes" array at least, even after the JSON has been
parsed?  That would be cheap to check in the output function itself,
at least as a first layer of checks before trying something with the
import function and cross-checking the list of attributes for the
extended statistics object.

I added tests both for duplicate attribute sequences and for making the first (longest) attribute sequence the template against which all later, shorter sequences are checked. I had been reluctant to add checks like this, because so many similar validations were removed from the earlier statistics code (histograms and the like).
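
To make that rule concrete (again using a hypothetical JSON-style text
format with "attributes"/"ndistinct" keys, not necessarily what the patch
accepts), these are the shapes of input it is meant to reject:

-- The longest item covers attributes 1, 2, 3, so an item referencing
-- attribute 4 is inconsistent and should raise an error.
SELECT '[{"attributes": [1, 2, 3], "ndistinct": 11},
         {"attributes": [1, 4], "ndistinct": 7}]'::pg_ndistinct;

-- A duplicate attribute sequence should likewise be rejected.
SELECT '[{"attributes": [1, 2], "ndistinct": 11},
         {"attributes": [1, 2], "ndistinct": 7}]'::pg_ndistinct;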
 

I suspect a similar family of issues with pg_dependencies, and it
would be nice to move the tests with the input function into a new
regression file, like the other one.

Did so.
 
Patches 0001-0004, 0007, and 0009 are unchanged.

Patch 0008 has significant modifications to the stats_import.sql regression tests to conform to the stricter datatype rules enacted in 0005 and 0006.
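
For what it's worth, the underlying idea those tests exercise is a text
round-trip through the new input functions, roughly like the sketch below.
This only works with the patch applied, and it assumes the input functions
accept what the output functions emit; the actual tests in stats_import.sql
are more involved.

-- Round-trip the stored values through text and back; with valid catalog
-- contents this should not error and should compare equal.
SELECT d.stxdndistinct::text::pg_ndistinct::text = d.stxdndistinct::text,
       d.stxddependencies::text::pg_dependencies::text = d.stxddependencies::text
FROM pg_statistic_ext_data AS d;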

