Re: Add expressions to pg_restore_extended_stats()
| От | Michael Paquier |
|---|---|
| Тема | Re: Add expressions to pg_restore_extended_stats() |
| Дата | |
| Msg-id | aYRvI45hkV5oKvIC@paquier.xyz обсуждение исходный текст |
| Ответ на | Re: Add expressions to pg_restore_extended_stats() (Corey Huinker <corey.huinker@gmail.com>) |
| Список | pgsql-hackers |
On Wed, Feb 04, 2026 at 11:42:16PM -0500, Corey Huinker wrote: >> v5-0001 still requires a lot of changes at quick glance, as of: >> - Some diffs not required in some comment blocks. > > Kept the s/stxdexprs/stxdexpr, as that is the correct column name, the > other changes are reverted. As in the attribute of pg_statistic_ext_data. Fixed this one separately. > We did need to do this, and it has been implemented, along with a new test > on a new statistics object which has a to_tsvector(text_column) expression > in it. Thanks. I guess that makes more sense now. The coverage is good to have. > Multirange types in expressions generate exactly 2 > stakinds: STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM (6) > and STATISTIC_KIND_BOUNDS_HISTOGRAM (7), neither of which are exposed by > pg_stats_ext_exprs, so they can't be imported, so at present there is > nothing to test and we can skip the multirange step-down for now. > Inspecting the stavalues1 and stavalues2 values in pg_statistic_ext_data > gives me the impression that we don't need the step-down, as the values > appear to be regular range types, and the exiting function to get the > element type should suffice. > > We should probably modify the view to include these new-ish statistic > kinds, but that's for another patch. Hmm. Okay. I'll study more this point. > Moving to the found[] and vals[] arrays cleans up the logic a lot, I think, > which is all the more important now that we keep building the expressions > after components of that statistic fail. v6-0001 has less fuzz, thanks for cleaning up the whole. I am looking at the patch, and immediately noted two concepts that bump into my eyes and look rather non-reliable. + if (!sta_ok) + *exprs_is_perfect = false; + isnull = false; This bit looks incorrect to me? If !sta_ok (or !row_is_perfect in import_pg_statistic()), then why is a non-NULL value inserted into the resulting array? If we fail to parse even one field or miss one key, we should force NULL for this single expression in the worst case and attempt to move on with the rest. But it does not matter anyway because import_expressions() would fail the import. Why don't we just skip the rest of the expressions then? We know that the resulting array will go to the garbage bit and that the restore failed, issuing a WARNING for the statext object. I think that import_expressions() has its logic going backwards, by this I mean that intializing exprs_is_perfect to true could be risky. It seems to me that we should do the exact opposite: initialize it at false, and switch to true *if and only if* all the correct conditions we want are reached. I'd suggest set of gotos and a single exit path at the end of import_expressions() where exprs_is_perfect is set to true to let the caller know that the expression can be safely used. Remember import_mcv(), as one example. Similarly, the same concept should be applied to import_pg_statistic(). row_is_perfect should be false to start, and switched to true once we are sure that everything we want is valid. Also, I think that the format of the dump should be better when it comes to a set of expressions where some of them have invalid data. Even if the stats of an expression are invalid, pg_dump builds a JSON blob for the element of the expression with all the keys and their values set to NULL. I'd suggest to just publish a NULL for the element where invalid stats are found. That makes the dumps shorter as well. -- Michael
Вложения
В списке pgsql-hackers по дате отправления: