Discussion: Re: Extended Statistics set/restore/clear functions.
hi.
I reviewed 0001 only.
in src/backend/statistics/mvdistinct.c
there is no need for #include "nodes/pg_list.h", since
src/include/statistics/statistics.h already includes "nodes/pg_list.h" indirectly.
there is no need for #include "utils/palloc.h",
since #include "postgres.h"
already includes it.
select '[{"6, -32768,,": -11}]'::pg_ndistinct;
ERROR: malformed pg_ndistinct: "[{"6, -32768,,": -11}]"
LINE 1: select '[{"6, -32768,,": -11}]'::pg_ndistinct;
^
DETAIL: All ndistinct count values are scalar doubles.
imho, this errdetail message is not good.
select '{}'::pg_ndistinct ;
segfault
select '{"1,":"1"}'::pg_ndistinct ;
ERROR: malformed pg_ndistinct: "{"1,":"1"}"
LINE 1: select '{"1,":"1"}'::pg_ndistinct ;
^
DETAIL: All ndistinct attnum lists must be a comma separated list of attnums.
imho, this errdetail message is not good either. it would be better to say that
"length of list of attnums must be larger than 1".
select pt1.typnamespace, pt1.typarray, pt1.typcategory, pt1.typname
from pg_type pt1
where pt1.typname ~*'distinct';
typnamespace | typarray | typcategory | typname
--------------+----------+-------------+--------------
11 | 0 | Z | pg_ndistinct
typcategory (Z) is marked as "Internal-use types", and there is no
pg_ndistinct array type;
not sure whether this is fine.
in all of these error reports, one pair of the parentheses is unnecessary.
for example
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
errdetail("Must begin with \"{\"")));
can change to
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed pg_ndistinct: \"%s\"", parse->str),
errdetail("Must begin with \"{\""));
see https://www.postgresql.org/docs/current/error-message-reporting.html
hi.
select '{"1, 0B100101":"NaN"}'::pg_ndistinct;
pg_ndistinct
------------------------
{"1, 37": -2147483648}
(1 row)
this is not what we expect, is it?
For the VALUE part of pg_ndistinct, float8 has 3 special values: inf, -inf, NaN.
For the key part of pg_ndistinct, see example.
select '{"1, 16\t":"1"}'::pg_ndistinct;
here \t is not a tab character (ASCII 9); it's two characters: a backslash
and the character "t".
so it should error out here?
(the same applies to \n, \r, \b)
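A quick way to see the difference between the two literal forms, independent of pg_ndistinct (assuming the default standard_conforming_strings = on):

select octet_length('\t') as plain, octet_length(E'\t') as escaped;
 plain | escaped
-------+---------
     2 |       1
(1 row)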
pg_ndistinct_in(PG_FUNCTION_ARGS)
ending part should be:
freeJsonLexContext(lex);
if (result == JSON_SUCCESS)
{
......
}
else
{
ereturn(parse_state.escontext, (Datum) 0,
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed pg_ndistinct: \"%s\"", str),
errdetail("Must be valid JSON."));
PG_RETURN_NULL();
}
result will be either JSON_SUCCESS or some error, so the branch only needs to distinguish JSON_SUCCESS from everything else.
all these functions:
ndistinct_object_start, ndistinct_array_start,
ndistinct_object_field_start, ndistinct_array_element_start
have
ndistinctParseState *parse = state;
do we need to change it to
ndistinctParseState *parse = (ndistinctParseState *)state;
?
ndistinctParseState needs to be added to src/tools/pgindent/typedefs.list;
probably change the declaration to "typedef struct ndistinctParseState".
also, struct ndistinctParseState should be placed at the top of
src/backend/statistics/mvdistinct.c,
not at line 340?
On Tue, Jan 28, 2025 at 11:25 AM jian he <jian.universality@gmail.com> wrote:
hi.
I reviewed 0001 only.
in src/backend/statistics/mvdistinct.c
there is no need for #include "nodes/pg_list.h", since
src/include/statistics/statistics.h already includes "nodes/pg_list.h" indirectly.
there is no need for #include "utils/palloc.h",
since #include "postgres.h"
already includes it.
Noted.
select '[{"6, -32768,,": -11}]'::pg_ndistinct;
ERROR: malformed pg_ndistinct: "[{"6, -32768,,": -11}]"
LINE 1: select '[{"6, -32768,,": -11}]'::pg_ndistinct;
^
DETAIL: All ndistinct count values are scalar doubles.
imho, this errdetail message is not good.
What error message do you think is appropriate in that situation?
select '{}'::pg_ndistinct ;
segfault
Mmm, gotta look into that!
select '{"1,":"1"}'::pg_ndistinct ;
ERROR: malformed pg_ndistinct: "{"1,":"1"}"
LINE 1: select '{"1,":"1"}'::pg_ndistinct ;
^
DETAIL: All ndistinct attnum lists must be a comma separated list of attnums.
imho, this errdetail message is not good either. it would be better to say that
"length of list of attnums must be larger than 1".
That sounds better.
typcategory (Z) is marked as "Internal-use types", and there is no
pg_ndistinct array type;
not sure whether this is fine.
I think it's probably ok for now. The datatype currently has no utility other than extended statistics, and I'm doubtful that it ever will.
On Wed, Jan 29, 2025 at 2:50 AM jian he <jian.universality@gmail.com> wrote:
hi.
select '{"1, 0B100101":"NaN"}'::pg_ndistinct;
pg_ndistinct
------------------------
{"1, 37": -2147483648}
(1 row)
I think my initial reaction is to just refuse those special values, but I'll look into the parsing code to see what can be done.
this is not what we expect, is it?
For the VALUE part of pg_ndistinct, float8 has 3 special values: inf, -inf, NaN.
For the key part of pg_ndistinct, see example.
select '{"1, 16\t":"1"}'::pg_ndistinct;
here \t is not a tab character (ASCII 9); it's two characters: a backslash
and the character "t".
so it should error out here?
(the same applies to \n, \r, \b)
I don't have a good answer as to what should happen here. Special cases like this make Tomas' suggestion to change the in/out format more attractive.
pg_ndistinct_in(PG_FUNCTION_ARGS)
ending part should be:
freeJsonLexContext(lex);
if (result == JSON_SUCCESS)
{
......
}
else
{
ereturn(parse_state.escontext, (Datum) 0,
errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("malformed pg_ndistinct: \"%s\"", str),
errdetail("Must be valid JSON."));
PG_RETURN_NULL();
}
result will be either JSON_SUCCESS or some error, so the branch only needs to distinguish JSON_SUCCESS from everything else.
all these functions:
ndistinct_object_start, ndistinct_array_start,
ndistinct_object_field_start, ndistinct_array_element_start
have
ndistinctParseState *parse = state;
do we need to change it to
ndistinctParseState *parse = (ndistinctParseState *)state;
?
The compiler isn't complaining so far, but I see no harm in it.
select '{"1, 0B100101":"NaN"}'::pg_ndistinct;
pg_ndistinct
------------------------
{"1, 37": -2147483648}
(1 row)

I think my initial reaction is to just refuse those special values, but I'll look into the parsing code to see what can be done.
I noticed that the output function for pg_ndistinct casts that value to an integer before formatting it with %d, so it's being treated as an integer even if it is not stored as one. After some consultation with Tomas, it made the most sense to just replicate this on the input side as well, and that is addressed in the patches below.
I've updated and rebased the patches.
The existing pg_ndistinct and pg_dependencies formats were kept as-is. The formats are clumsy, and more processing-friendly formats would be easier to work with, but the need for such processing is minimal bordering on theoretical, so there is little impact in keeping the historical format.
There are now checks to ensure that the pg_ndistinct or pg_dependencies value assigned to an extended statistics object actually makes sense for that object. What this amounts to is checking that, for every attnum cited, the positive attnums are ones found in the stxkeys of the pg_statistic_ext tuple, and the negative attnums do not exceed the number of expressions in the statistics object. In other words, if the stats object has no expressions in it, then no negative numbers will be accepted; if it has 2 expressions, then any value of -3 or lower will be rejected, etc.
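As a concrete (hypothetical) example, given a statistics object like

CREATE TABLE t (a int, b int, c int);
CREATE STATISTICS s (ndistinct) ON a, b, (c % 10) FROM t;

stxkeys is {1, 2} and there is one expression, so an imported value may reference attnums 1, 2, and -1, while 3 (not in stxkeys) or -2 (there is only one expression) would be rejected.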
All patches rebased to 71f17823ba010296da9946bd906bb8bcad6325bc.
A rebasing, and a few changes
* regnamespace and name parameters changed to statistics_schemaname as text and statistics_name as text, so that there's one less thing that can potentially fail in an upgrade
* schema lookup and stat name lookup failures now issue a warning and return false, rather than ERROR
* elevel replaced with hardcoded WARNING most everywhere, as has been done with relation/attribute stats
On Mon, Mar 31, 2025 at 1:10 AM Corey Huinker <corey.huinker@gmail.com> wrote:
Just rebasing.
At pgconf.dev this year, the subject of changing the formats of pg_ndistinct and pg_dependencies came up again.
To recap: presently these datatypes have no working input function, but would need one for statistics import to work on extended statistics. The existing input formats are technically JSON, but the keys themselves are a comma-separated list of attnums, so they require additional parsing. That parsing is already done in the patches in this thread, but overall the format is terrible for any sort of manipulation, such as translating the values for a table with a different column order (say, after a restore of a table that had dropped columns), or doing query planner experiments.
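To illustrate the sort of extra parsing the current format forces on you, the attnums have to be split back out of the object keys; a rough sketch with a made-up value:

select string_to_array(key, ', ')::int2[] as attnums, value::float8 as ndistinct
from jsonb_each_text('{"2, 3": 4, "2, -1": 4}'::jsonb);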
Because the old formats don't have a corresponding input function, there is no risk of the output not matching required inputs, but there will be once we add new input functions, so this is our last chance to change the format to something we like better.
The old format can be trivially translated via functions posted earlier in this thread back in January (pg_xlat_ndistinct_to_attnames, pg_xlat_dependencies_to_attnames) as well as the reverse (s/_to_/_from_/), so dumping values from older versions will not be difficult.
I believe that we should take this opportunity to make the change. While we don't have a pressing need to manipulate these structures now, we might in the future and failing to do so now makes a later change much harder.
With that in mind, I'd like people to have a look at the proposed format change for pg_ndistinct (the changes to pg_dependencies are similar), to see if they want to make any improvements or comments. As you can see, the new format is much less compact (about 3x as large), which could get bad if the number of elements grew by a lot, but the number of elements is tied to the number of attributes in the extended statistics object (N choose N, then N choose N-1, etc., excluding choose 1), so this can't get too out of hand.
Existing format (newlines/formatting added by me to make head-to-head comparison easier):
'{"2, 3": 4,
'{"2, 3": 4,
"2, -1": 4,
"2, -2": 4,
"3, -1": 4,
"3, -2": 4,
"-1, -2": 3,
"2, 3, -1": 4,
"2, 3, -2": 4,
"2, -1, -2": 4,
"3, -1, -2": 4}'::pg_ndistinct
Proposed new format (again, all formatting here is just for ease of human reading):
' [ {"attributes" : [2,3], "ndistinct" : 4},
{"attributes" : [2,-1], "ndistinct" : 4},
{"attributes" : [2,-2], "ndistinct" : 4},
{"attributes" : [3,-1], "ndistinct" : 4},
{"attributes" : [3,-2], "ndistinct" : 4},
{"attributes" : [-1,-2], "ndistinct" : 3},
{"attributes" : [2,3,-1], "ndistinct" : 4},
{"attributes" : [2,3,-2], "ndistinct" : 4},
{"attributes" : [2,-1,-2], "ndistinct" : 4},
{"attributes" : [3,-1,-2], "ndistinct" : 4}]'::pg_ndistinct
The pg_dependencies structure is only slightly more complex:
An abbreviated example:
{"2 => 1": 1.000000, "2 => -1": 1.000000, ..., "2, -2 => -1": 1.000000, "3, -1 => 2": 1.000000},
Becomes:
[ {"attributes": [2], "dependency": 1, "degree": 1.000000},
{"attributes": [2], "dependency": -1, "degree": 1.000000},
{"attributes": [2, -2], "dependency": -1, "degree": 1.000000},
...,
...,
{"attributes": [2, -2], "dependency": -1, "degree": 1.000000},
{"attributes": [3, -1], "dependency": 2, "degree": 1.000000}]
{"attributes": [3, -1], "dependency": 2, "degree": 1.000000}]
Any thoughts on using/improving these structures?
Hearing no objections, here is the latest patchset.
0001 - Changes input/output functions of pg_ndistinct to the format described earlier.
0002 - Changes input/output functions of pg_dependencies to the format described earlier.
0003 - Makes some previously internal/static attribute stats functions visible to extended_stats.c, because the exprs attribute is basically an array of partially filled-out pg_statistic rows.
0004 - Adds pg_restore_extended_stats(), pg_clear_extended_stats(), in the pattern of their relation/attribute brethren.
0005 - Adds the dumping and restoring of extended statistics back to v10. No command line flag changes needed.
Attachments
- v4-0001-Refactor-output-format-of-pg_ndistinct-and-add-wo.patch
- v4-0002-Refactor-output-format-of-pg_dependencies-and-add.patch
- v4-0003-Expose-attribute-statistics-functions-for-use-in-.patch
- v4-0004-Add-extended-statistics-support-functions.patch
- v4-0005-Include-Extended-Statistics-in-pg_dump.patch
Rebased. Enjoy.
Attachments
- v5-0001-Refactor-output-format-of-pg_ndistinct-and-add-wo.patch
- v5-0002-Refactor-output-format-of-pg_dependencies-and-add.patch
- v5-0003-Expose-attribute-statistics-functions-for-use-in-.patch
- v5-0004-Add-extended-statistics-support-functions.patch
- v5-0005-Include-Extended-Statistics-in-pg_dump.patch
Rebased.
Attachments
- v6-0001-Refactor-output-format-of-pg_ndistinct-and-add-wo.patch
- v6-0002-Refactor-output-format-of-pg_dependencies-and-add.patch
- v6-0003-Expose-attribute-statistics-functions-for-use-in-.patch
- v6-0004-Add-extended-statistics-support-functions.patch
- v6-0005-Include-Extended-Statistics-in-pg_dump.patch
And rebased again to conform to 688dc6299 and 4bd919129.
Attachments
- v7-0001-Refactor-output-format-of-pg_ndistinct-and-add-wo.patch
- v7-0002-Refactor-output-format-of-pg_dependencies-and-add.patch
- v7-0003-Expose-attribute-statistics-functions-for-use-in-.patch
- v7-0004-Add-extended-statistics-support-functions.patch
- v7-0005-Include-Extended-Statistics-in-pg_dump.patch
On Sat, Oct 18, 2025 at 08:27:58PM -0400, Corey Huinker wrote:
> And rebased again to conform to 688dc6299 and 4bd919129.

The format redesign for extended stats is pretty nice, as done in
0001. I think that this patch should be split in two, actually, as it
tackles two issues:
- One patch for the format change.
- Second patch for the introduction of the input function, useful on
its own to allow more regression tests paths for the format generated.

Similar split comment for 0002 regarding pg_dependencies. The format
change is one thing. The input function is cool to have for input
validation, still not absolutely mandatory for the core part of the
format change.

The functions exposed in 0003 should be renamed to match more with the
style of the rest, aka it is a bit hard to figure out what they do at
first sight. Presumably, these should be prefixed with some
"statext_", except text_to_stavalues() which could still be named the
same.

Do you have some numbers regarding the increase in size this generates
for the catalogs?

0004 has been designed following the same model as the relation and
attribute stats. That sounds OK here.

+enum extended_stats_argnum
[...]
+enum extended_stats_exprs_element

It would be nice to document why such things are around. That would
be less guessing for somebody reading the code.

Reusing this small sequence from your pg_dump patch, executed on a v14
backend:
create schema dump_test;
CREATE TABLE dump_test.has_ext_stats
  AS SELECT g.g AS x, g.g / 2 AS y FROM generate_series(1,100) AS g(g);
CREATE STATISTICS dump_test.es1 ON x, (y % 2) FROM dump_test.has_ext_stats;
ANALYZE dump_test.has_ext_stats;

Then pg_dump fails:
pg_dump: error: query failed: ERROR: column e.inherited does not exist
LINE 2: ...hemaname = $1 AND e.statistics_name = $2 ORDER BY e.inherite...

+ * TODO: Until v18 is released the master branch has a
+ * server_version_num of 180000. We will update this to 190000 as soon
+ * as the master branch updates.

This part has not been updated.

+ Assert(item.nattributes > 0); /* TODO: elog? */
[...]
+ Assert(dependency->nattributes > 1); /* TODO: elog? */

Yes and yes. It seems like it should be possible to craft some input
that triggers these..

+void
+free_pg_dependencies(MVDependencies *dependencies);

Double declaration of this routine in dependencies.c.

Perhaps some of the regression tests could use some jsonb_pretty() in
the outputs generated. Some of the results generated are very hard to
parse, something that would become harder in the buildfarm. This
comment starts with 0001 for stxdndistinct.

I have mixed feelings about 0005, FWIW. I am wondering if we should
not lift the needle a bit here and only support the dump of extended
statistics when dealing with a backend of at least v19. This would
mean that we would only get the full benefit of this feature once
people upgrade to v20 or dump from a pg_dump with --statistics from at
least v19, but with the long-term picture in mind this would also make
the dump/restore picture of the patch dead simple (spoiler: I like
simple).

Tomas, what is your take about the format changes and my argument
about the backward requirements of pg_dump (about not dumping these
stats if connecting to a server older than v18, included)?
--
Michael
On Tue, Oct 21, 2025 at 02:48:37PM +0900, Michael Paquier wrote:
> Tomas, what is your take about the format changes and my argument
> about the backward requirements of pg_dump (about not dumping these
> stats if connecting to a server older than v18, included)?

By the way, the patch is still needed in the previous CF of September:
https://commitfest.postgresql.org/patch/5517/
You may want to move it.
--
Michael
The functions exposed in 0003 should be renamed to match more with the
style of the rest, aka it is a bit hard to figure out what they do at
first sight. Presumably, these should be prefixed with some
"statext_", except text_to_stavalues() which could still be named the
same.
Do you have some numbers regarding the increase in size this generates
for the catalogs?

Sorry, I don't understand. There shouldn't be any increase inside the catalogs as the internal storage of the datatypes hasn't changed, so I can only conclude that you're referring to something else.
0004 has been designed following the same model as the relation and
attribute stats. That sounds OK here.
+enum extended_stats_argnum
[...]
+enum extended_stats_exprs_element
It would be nice to document why such things are around. That would
be less guessing for somebody reading the code.

The equivalent structures in attribute_stats.c will need documenting too.
Reusing this small sequence from your pg_dump patch, executed on a v14
backend:
create schema dump_test;
CREATE TABLE dump_test.has_ext_stats
AS SELECT g.g AS x, g.g / 2 AS y FROM generate_series(1,100) AS g(g);
CREATE STATISTICS dump_test.es1 ON x, (y % 2) FROM dump_test.has_ext_stats;
ANALYZE dump_test.has_ext_stats;
Then pg_dump fails:
pg_dump: error: query failed: ERROR: column e.inherited does not exist
LINE 2: ...hemaname = $1 AND e.statistics_name = $2 ORDER BY e.inherite...
Noted.
+ * TODO: Until v18 is released the master branch has a
+ * server_version_num of 180000. We will update this to 190000
as soon
+ * as the master branch updates.
This part has not been updated.
+ Assert(item.nattributes > 0); /* TODO: elog? */
[...]
+ Assert(dependency->nattributes > 1); /* TODO: elog? */
Yes and yes. It seems like it should be possible to craft some input
that triggers these..
+1
+void
+free_pg_dependencies(MVDependencies *dependencies);
Double declaration of this routine in dependencies.c.
Perhaps some of the regression tests could use some jsonb_pretty() in
the outputs generated. Some of the results generated are very hard to
parse, something that would become harder in the buildfarm. This
comment starts with 0001 for stxdndistinct.
I have mixed feelings about 0005, FWIW. I am wondering if we should
not lift the needle a bit here and only support the dump of extended
statistics when dealing with a backend of at least v19. This would
mean that we would only get the full benefit of this feature once
people upgrade to v20 or dump from a pg_dump with --statistics from at
least v19, but with the long-term picture in mind this would also make
the dump/restore picture of the patch dead simple (spoiler: I like
simple).
I also like simple.
Right now we have a situation where the vast majority of databases can carry forward all of their stats via pg_upgrade, except for those databases that have extended stats. The trouble is, most customers don't know if their database uses extended statistics or not, and those that do are in for some bad query plans if they haven't run vacuumdb --missing-stats-only. Explaining that to customers is complicated, especially when most of them do not know what extended stats are, let alone whether they have them. It would be a lot simpler to just say "all stats are carried over on upgrade", and vacuumdb becomes unnecessary, making upgrades one step simpler as well.
Given that, I think that the admittedly ugly transformation is worth it, and sequestering it inside pg_dump is the smallest footprint it can have. Earlier in this thread I posted some functions that did the translation from the existing formats to the proposed new formats. We could include those as new system functions, and that would make the dump code very simple. Having said that, I don't know that there would be use for those functions except inside pg_dump, hence the decision to do the transforms right in the dump query.
If the format translation is a barrier to fetching existing extended stats, then I'd be more inclined to keep the existing pg_ndistinct and pg_dependencies data formats as they are now.
On Wed, Oct 22, 2025 at 02:55:31PM +0300, Corey Huinker wrote:
>> Do you have some numbers regarding the increase in size this generates
>> for the catalogs?
>
> Sorry, I don't understand. There shouldn't be any increase inside the
> catalogs as the internal storage of the datatypes hasn't changed, so I can
> only conclude that you're referring to something else.

The new format meant more characters, perhaps I've just missed
something while quickly testing the patch.. Anyway, that's OK at this
stage.

> The equivalent structures in attribute_stats.c will need documenting too.

Right. This sounds like a separate patch to me, impacting HEAD.

> Right now we have a situation where the vast majority of databases can
> carry forward all of their stats via pg_upgrade, except for those databases
> that have extended stats. The trouble is, most customers don't know if
> their database uses extended statistics or not, and those that do are in
> for some bad query plans if they haven't run vacuumdb --missing-stats-only.
> Explaining that to customers is complicated, especially when most of them
> do not know what extended stats are, let alone whether they have them. It
> would be a lot simpler to just say "all stats are carried over on upgrade",
> and vacuumdb becomes unnecessary, making upgrades one step simpler as well.

Okay.

> Given that, I think that the admittedly ugly transformation is worth it,
> and sequestering it inside pg_dump is the smallest footprint it can have.
> Earlier in this thread I posted some functions that did the translation
> from the existing formats to the proposed new formats. We could include
> those as new system functions, and that would make the dump code very
> simple. Having said that, I don't know that there would be use for those
> functions except inside pg_dump, hence the decision to do the transforms
> right in the dump query.

I'd prefer the new format. One killer pushing in favor of the new
format that you are making upthread in favor of is that it makes much
easier the viewing, editing and injecting of these stats. It's the
part of the patch where we would need Tomas' input on the matter
before deciding anything, I guess, as primary author of the original
facilities. My view of the problem is just one opinion.

> If the format translation is a barrier to fetching existing extended stats,
> then I'd be more inclined to keep the existing pg_ndistinct and
> pg_dependencies data formats as they are now.

Not necessarily, it can be possible to also take that in multiple
steps rather than a single one:
- First do the basics in v19 with the new format.
- Raise the bar to older versions.
--
Michael
On Thu, Oct 23, 2025 at 08:46:27AM +0900, Michael Paquier wrote:
> I'd prefer the new format. One killer pushing in favor of the new
> format that you are making upthread in favor of is that it makes much
> easier the viewing, editing and injecting of these stats.

This is missing an "argument", as in "One killer argument pushing in favor.." :D
--
Michael