Re: Statistics Import and Export

Поиск
Список
Период
Сортировка
От Corey Huinker
Тема Re: Statistics Import and Export
Дата
Msg-id CADkLM=evyx0L96XxrLSUDVS+iXgHszLbWMrGwVbBnNVO1ejfJA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Statistics Import and Export  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Statistics Import and Export
Список pgsql-hackers
  * pg_set_relation_stats(): the warning: "cannot export statistics
prior to version 9.2" doesn't make sense because the function is for
importing. Reword.

+1
 
  * I really think there should be a transactional option, just another
boolean, and if it has a default it should be true. This clearly has
use cases for testing plans, etc., and often transactions will be the
right thing there. This should be a trivial code change, and it will
also be easier to document.

For it to have a default, the parameter would have to be at the end of the list, and it's a parameter list that will grow in the future. And when that happens we have a jumbled parameter list, which is fine if we only ever call params by name, but I know some people won't do that. Which means it's up front right after `version`. Since `version` is already in there, and we can't default that, I feel ok about moving it there, but alas no default.

If there was some way that the function could detect that it was in a binary upgrade, then we could use that to determine if it should update inplace or transactionally.

  * The return type is documented as 'void'? Please change to bool and
be clear about what true/false returns really mean. I think false means
"no updates happened at all, and a WARNING was printed indicating why"
whereas true means "all updates were applied successfully".

Good point, that's a holdover.
 
  * An alternative would be to have an 'error_ok' parameter to say
whether to issue WARNINGs or ERRORs. I think we already discussed that
and agreed on the boolean return, but I just want to confirm that this
was a conscious choice?

That had been discussed as well. If we're adding parameters, then we could add one for that too. It's making the function call progressively more unwieldy, but anyone who chooses to wield these on a regular basis can certainly write a SQL wrapper function to reduce the function call to their presets, I suppose.
 
  * tests should be called stats_import.sql; there's no exporting going
on

Sigh. True.
 
  * Aside from the above comments and some other cleanup, I think this
is a simple patch and independently useful. I am looking to commit this
one soon.

v24-0002:

  * Documented return type is 'void'

  * I'm not totally sure what should be returned in the event that some
updates were applied and some not. I'm inclined to say that true should
mean that all updates were applied -- otherwise it's hard to
automatically detect some kind of typo.

Me either. Suggestions welcome.

I suppose we could return two integers: number of stats input, and number of stats applied. But that could be confusing, as some parameter pairs form one stat ( MCV, ELEM_MCV, etc).

I suppose we could return a set of (param_name text, was_set boolean, applied boolean), without trying to organize them into their pairs, but that would get really verbose.

We should decide on something soon, because we'd want relation stats to follow a similar signature.
 

  * Can you describe your approach to error checking? What kinds of
errors are worth checking, and which should we just put into the
catalog and let the planner deal with?

1. When the parameters given make for something nonsensical Such as providing most_common_elems with no corresponding most_common_freqs, then you can't form an MCV stat, so you must throw out the one you did receive. That gets a warning.

2. When the data provided is antithetical to the type of statistic. For instance, most array-ish parameters can't have NULL values in them (there are separate stats for nulls (null-frac, empty_frac). I don't remember if doing so crashes the server or just creates a hard error, but it's a big no-no, and we have to reject such stats, which for now means a warning and trying to carry on with the stats that remain.

3. When the stats provided would overflow the data structure. We attack this from two directions: First, we eliminate stat kinds that are meaningless for the data type (scalars can't have most-common-elements, only ranges can have range stats, etc), issue warnings for those and move on with the remaining stats. If, however, the number of those statkinds exceeds the number of statkind slots available, then we give up because now we'd have to CHOOSE which N-5 stats to ignore, and the caller is clearly just having fun with us.

We let the planner have fun with other error-like things:

1. most-common-element arrays where the elements are not sorted per spec.

2. frequency histograms where the numbers are not monotonically non-increasing per spec.

3. frequency histograms that have corresponding low bound and high bound values embedded in the array, and the other values in that array must be between the low-high.
 

  * I'd check stakindidx at the time that it's incremented rather than
summing boolean values cast to integers.

Which means that we're checking that and potentially raising the same error in 3-4 places (and growing, unless we raise the max slots), rather than 1. That struck me as worse.
 

v24-0003:

  * I'm not convinced that we should continue when a stat name is not
text. The argument for being lenient is that statistics may change over
time, and we might have to ignore something that can't be imported from
an old version into a new version because it's either gone or the
meaning has changed too much. But that argument doesn't apply to a
bogus call, where the name/value pairs get misaligned or something.

I agree with that.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: jian he
Дата:
Сообщение: Re: Virtual generated columns
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Allow logical failover slots to wait on synchronous replication