Re: Add expressions to pg_restore_extended_stats()
| From | Corey Huinker |
|---|---|
| Subject | Re: Add expressions to pg_restore_extended_stats() |
| Msg-id | CADkLM=dHfy2YT1J57TNt4KViqxHGHiEU-72hKA_TX1gb2u==Ow@mail.gmail.com |
| In reply to | Re: Add expressions to pg_restore_extended_stats() (Michael Paquier <michael@paquier.xyz>) |
| Replies | Re: Add expressions to pg_restore_extended_stats() |
| List | pgsql-hackers |
> This one would not actually exist if using a 1-D text[] with elements
> made of key/value/key/value as the array deconstruction would take
> care of that, wouldn't it?
If it were 1-D we'd have to know where one expression ended and the next one started. If you're asking what I think you're asking, an array like:
array[
    array['null_frac','0.4','avg_width', '...', ...],
    array['null_frac','0.5','avg_width', '...', ...]
]

This would force us to either require an order for the keys (in which case, why bother?), or it would require a loop through each 1-D array, comparing each keyword against the list of known keywords so we can put the values in order (and resolve duplicates). It's not the cleanest thing, but I have done it before, and it eventually resulted in stats_fill_fcinfo_from_arg_pairs(), which could be adapted to handle this as well.
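Purely to illustrate the kind of matching that implies (this is not anything in the patch, and the key/value contents are made up), pairing up a flat key/value text[] looks roughly like this in SQL:

    WITH kv AS (
        SELECT elem, ord
        FROM unnest(array['null_frac','0.4','avg_width','4','n_distinct','-1'])
             WITH ORDINALITY AS t(elem, ord)
    )
    SELECT k.elem AS param, v.elem AS val
    FROM kv AS k
    JOIN kv AS v ON v.ord = k.ord + 1
    WHERE k.ord % 2 = 1;

The C side would be roughly the same walk that stats_fill_fcinfo_from_arg_pairs() already does over its name/value argument pairs, plus the keyword comparison and duplicate handling on top.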
> Hmm. We'd still need to deal with buggy inputs even if this
> conversion is enforced in pg_dump.
True, but we're already doing that in both the 2-D text array version and the json version; the only difference is the number of hoops we have to jump through to get an input to a *Safe() input function.
> - utility function for converting jbvString values to TextDatum.
> - utility function to simplify json iteration in a simple key/value object
> - deciding that unknown parameters aren't worth a warning
> Do you actually need check_valid_expr_argnames() at all?
No, not if we're okay with silently losing the "ok, with warning: extra exprs param" test. It seems like something we should report, but also something we shouldn't fail on.
> At the end of import_pg_statistic() we know that all the parameters
> have been found.
We don't, actually. Any un-found parameters would be left at their default values without error. It would be easy enough to set a flag to true for each one found and report on that, but I haven't implemented that yet.
> So we could just check on the numbers of values fetched and
> complain based on NUM_ATTRIBUTE_STATS_ELEMS? That would remove some
> code, not impact the safety. key_lookup_notnull() could be changed to
> return three possible states rather than a boolean: okay and NULL
> value, okay and not-NULL value, error if key not found in JSON blob or
> incorrect parsing of the input value (aka infinite, input function
> error, etc). Then increment the counter if one of the first two states
> is found as NULL can be valid for some of the key/value pairs.
In the situation you're describing, the *lack* of an expected key is itself grounds for a WARNING and for failing the whole import_expressions(), assuming that the version parameter indicated that servers of that vintage knew about that parameter. I'd be ok with it.
> jbv_get_float4_datum() and jbv_get_int4_datum() are alike, they could
> be refactored as a single function that takes a type name as input for
> the error message and an input function pointer. That would cut some
> code.
Yes, a bit. That's also true of their predecessors text_to_float4 and text_to_int4, so it's a savings, but not unique to this implementation.
> So...does this look better? Is it worth it?
> Well, it does to me. The pg_dump bits based on jsonb_build_object()
> are rather attractive in shape. This addresses the concerns I had
> about item ordering and unicity of the keys in a natural way through
> the data type filtering.
I had actually figured that the pg_dump part would turn you OFF of this approach.

The JSONB type is a bit of a saving grace in that, if you had duplicate input keys {"x": "y", "x": "z"}, it'll rewrite that as {"x": ["z","y"]}, and that *is* easy for us to catch in the existing tooling. That's one of the few areas where I liked what the jsonb API did for us.
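To make the shape concrete, the kind of per-expression aggregation I have in mind on the pg_dump side is roughly this (a sketch only, not the patch's actual query; the key list is trimmed, the anyarray columns are left out, and 'some_stat_object' is a placeholder):

    SELECT jsonb_agg(
               jsonb_build_object(
                   'null_frac',   null_frac,
                   'avg_width',   avg_width,
                   'n_distinct',  n_distinct,
                   'correlation', correlation,
                   'most_common_freqs', most_common_freqs))
    FROM pg_stats_ext_exprs
    WHERE statistics_schemaname = 'public'
      AND statistics_name = 'some_stat_object';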
> I don't see a need for key_lookup(), its sole caller is
> key_lookup_notnull().
A lot of this code was exhumed from the v1 patch of the relation/attribute stats functions [1], and they can be taken further to verify the json-type of the value (in our case we'd only want numeric/string test variants, or perhaps only the string-test variant).
> + 'exprs', '{{0,4,-0.75,"{1}","{0.5}","{-1,0}",-0.6,NULL,NULL,NULL},{0.25,4,-0.5,"{2}","{0.5}",NULL,1,NULL,NULL,NULL}}'::text[]);
> This is in the documentation, and would need fixing.
Lots of things need fixing if we're going this route. Part of this patch was to see if the results horrified you, and since they clearly haven't, the fixing is now worth it.
> +WARNING: invalid expression elements most_common_vals and most_common_freqs: conflicting NULL/NOT NULL
> For this warning in the regression tests, you should only need one
> element, reducing the number of input lines?
That's true IF we decide that missing expression keys are a thing that we allow, and per the conversation above it's not clear yet that we do.
> The patch has a query checking the contents of pg_stats_ext_exprs for
> test_mr_stat, no imports. This links to the multirange case mentioned
> in import_pg_statistic().
There *is* an import; we just have to check output from pg_stats_ext first, and then pg_stats_ext_exprs. I suppose I could combine the queries, but chose not to for simplicity.
As those comments mention, there are attribute-level stat types specific to ranges that are not reflected in extended stats via ANALYZE, so importing expressions on test_mr isn't all that interesting, but it's there for completeness.
> Something that looks missing here is that we do not attach to which
> negative attribute an item of "exprs" links to. It seems to me that
> this needs to include an extra key called "attribute" with a negative
> value associated with it, so that we can rely on the attnum instead of
> the order of the items restored.
Oof. So there is no such attribute number exposed in pg_stats_ext_exprs; we have to rely on the deterministic order in which they are fetched from the view. The "expr" text definition is essentially a node tree, which we can't guarantee stays stable across major versions, so that's no help. jsonb_agg() doesn't guarantee order on its own, but it can be enforced via an ORDINALITY-like trick. I've had to do this before, so I can excavate that code again. Thanks for the reminder.
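The pattern I'm thinking of is just telling the aggregate what order to use, with the ordinality coming from whatever stable source we settle on for the expressions. A toy example of the mechanism (not the real query):

    SELECT jsonb_agg(elem ORDER BY ord)
    FROM jsonb_array_elements('[{"null_frac": 0.5}, {"null_frac": 0.25}]'::jsonb)
         WITH ORDINALITY AS t(elem, ord);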
> (Many other corrections redacted).
+1 to all cases.
Since you're clearly on board with the jsonb idea, I'll make the next version all about cleanups and tightening up this method.
The outstanding questions from the above are, given a jsonb tree of [ { "null_frac": 0.5, ...}, {"avg_width": "1", ...} ], which for notation I'll call an array of expr1 and expr2:
1. If an expected key is missing from either expr1 or expr2, is that worth failing the whole import_expressions()?
2. Is an unexpected key in either expr1 or expr2 worth failing the whole import_expressions()?
3. If the answer to #2 is no, do we need to warn about the existence of the weird parameter at all, or just silently skip it?
4. Does casting the numeric scalar values (null_frac, correlation, avg_width, n_distinct) to text make sense, since we have to put them through type-specific input functions anyway? (There's a small illustration after my answers below.)
5. Do any of these jsonb-twiddling operations look generically useful enough to put in jsonb_utils.c?
(My answers would be: 1. no, 2. no, 3. we can't skip it, we have to issue a warning on the first weird param, but it doesn't fail anything, 4. yes, and 5. maybe jbvString to cstring and/or jbvString to TextDatum.)
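To put a concrete shape next to question 4, the two encodings would look something like this (values made up; either way the scalars go through the type-specific input functions on the restore side):

    SELECT '[{"null_frac": 0.5,   "avg_width": 4},
             {"null_frac": "0.5", "avg_width": "4"}]'::jsonb;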