Re: Add expressions to pg_restore_extended_stats()
| От | Michael Paquier |
|---|---|
| Тема | Re: Add expressions to pg_restore_extended_stats() |
| Дата | |
| Msg-id | aYFR_UqFhO4ZnoSq@paquier.xyz обсуждение исходный текст |
| Ответ на | Re: Add expressions to pg_restore_extended_stats() (Corey Huinker <corey.huinker@gmail.com>) |
| Ответы |
Re: Add expressions to pg_restore_extended_stats()
|
| Список | pgsql-hackers |
On Mon, Feb 02, 2026 at 03:21:35AM -0500, Corey Huinker wrote: > I had actually figured that the pg_dump part would actually turn you OFF of > this approach. Sorry, I guess. :) >> For this warning in the regression tests, you should only need one >> element, reducing the number of input lines? > > That's true IF we decide that missing expression keys are a thing that we > allow, and per conversation above it's not clear yet that we do. If ANALYZE can generate some partial data for a set of expression (aka generating some data for a portion of the expressions defined but skip some of them because of reason x_y_z), the restore function needs to cope with this policy (I'd need to double-check the code to be sure, not sure on top of my mind now and I cannot do that today, unfortunately). >> Something that looks missing here is that we do not attach to which >> negative attribute an item of "exprs" links to. It seems to me that >> this needs to include an extra key called "attribute" with a negative >> associated with it, so as we can rely on the attnum instead of the >> order of the items restored. > > Oof. So there is no such attribute number exposed in pg_stat_ext_exprs(), > we have to rely on the deterministic order that they are fetched from the > view. The "expr" text definition is essentially a node tree, which we can't > guarantee stays stable across major versions, so that's no help. > jsonb_agg() doesn't guarantee order on it's own, but it can be enforced via > an ORDINALITY-like trick. I had to do this before, can I excavate that code > again. Thanks for the reminder. Hmm. I would not rely on the ordering of the items as they are scanned, that seems like a recipe for disaster. We have a text representation of the expression, as of pg_stats_ext_exprs.expr. This could be mapped with the elements of hte text[] in pg_stats_ext.exprs. Here is a wilder idea: why not just putting the expression text itself in the data given in input of the restore function rather than a guessed argument number? For the case of manual stats injections, that kinds of makes things simpler. > Since clearly, you're on board with the jsonb idea, I'll make the next > version all about cleanups and tightening up this method. Yeah, this approach makes sense to me in terms of clarity and item handling for the expression parts. Now, as you have mentioned me offline, there was a discussion about applying this format to other parts of the arguments, which would be around here I guess: https://www.postgresql.org/message-id/CADkLM%3DfhVk3BJ8z20iQW2wGfuOk%2BZSgzNHUVGtLpmvzbQ9Ontg%40mail.gmail.com This area of the discussion focused on the catalog data types we have for dependencies and ndistinct, as well. > The outstanding questions from the above are, given an jsonb tree of [ { > "null_frac": 0.5, ...}, {"avg_width": "1", ...}], which for notation I'll > call an array of expr1 and expr2 > > 1. If an expected key is missing from either expr1 or expr2, is that worth > failing the whole import_expressions()? > 2. Is an unexpected key in either expr1 or expr2 worth failing the whole > import_expressions()? > 3. If the answer to #2 is no, do we need to warn about the existence of the > weird parameter at all, or just silently skip it? Yeah, it sounds to me that we should just set ok=false and give up rather than have a semi-filled set of numbers for a single extended stats object. There is an argument in favor of that: it can simplify the detection of missing stats for a single extended stats definition. I understand that you'd want to keep going with loading the data even if it's partial. My question is: is it possible for ANALYZE to fill in only a portion of the expressions and can these be partially skipped? If the answer to my question is yes, the restore function should do the same and my idea of the matter is wrong. If the answer to my question is no, then your idea on this matter is the right one. > 4. Does casting the numeric scalar values (null_frac, correlation, > avg_width, n_distinct) to text make sense, since we have to put them > through type-specific input functions anyway? In terms of JSON, it makes the use of a representation simpler. I don't think that we need to apply a strict casting when fetching them. > 5. Do any of these jsonb-twiddling operations look generically useful > enough to put in jsonb_utils.c? This one I cannot say yet. It depends on the patch as written. @Tomas (now added in CC for confirmation): would you see a problem against applying a JSONB data type to the argument for the restore of extended stats? This level of data serialization would be required when inserting data for what would show up into pg_stats_ext_exprs. -- Michael
Вложения
В списке pgsql-hackers по дате отправления: