Re: Statistics Import and Export

Поиск
Список
Период
Сортировка
От Corey Huinker
Тема Re: Statistics Import and Export
Дата
Msg-id CADkLM=chb8YT+OD40aDDj1MGz52FMw6-SW3atgEpsw6Rp-3mhg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Statistics Import and Export  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
The "restore" use case is the primary point of your patch, and that
should be as simple and future-proof as possible. The parameters should
be name/value pairs and there shouldn't be any "control" parameters --
it's not the job of pg_dump to specify whether the restore should be
transactional or in-place, it should just output the necessary stats.

That restore function might be good enough to satisfy the "ad-hoc stats
hacking" use case as well, but I suspect we want slightly different
behavior. Specifically, I think we'd want the updates to be
transactional rather than in-place, or at least optional.

Point well taken.

Both function pairs now call a generic internal function.

Which is to say that pg_set_relation_stats and pg_restore_relation_stats both accept parameters in their own way, and both call 
an internal function relation_statistics_update(), each with their own defaults.

pg_set_relation_stats always leaves "version" NULL, does transactional updates, and treats any data quality issue as an ERROR. This is is in line with a person manually tweaking stats to check against a query to see if the plan changes.

pg_restore_relation_stats does in-place updates, and steps down all errors to warnings. The stats may not write, but at least it won't fail the pg_upgrade for you.

pg_set_attribute_stats is error-maximalist like pg_set_relation_stats. pg_restore_attribute_stats never had an in-place option to begin with.


 

> The leading OUT parameters tell us the rel/attribute/inh affected (if
> any), and which params had to be rejected for whatever reason. The
> kwargs is the variadic key-value pairs that we were using for all
> stat functions, but now we will be using it for all parameters, both
> statistics and control, the control parameters will be:

I don't like the idea of mixing statistics and control parameters in
the same list.

There's no way around it, at least now we need never worry about a confusing order for the parameters in the _restore_ functions because they can now be in any order you like. But that speaks to another point: there is no "you" in using the restore functions, those function calls will almost exclusively be generated by pg_dump and we can all live rich and productive lives never having seen one written down. I kid, but they're actually not that gross.

Here is a -set function taken from the regression tests:

SELECT pg_catalog.pg_set_attribute_stats(
    relation => 'stats_import.test'::regclass::oid,
    attname => 'arange'::name,
    inherited => false::boolean,
    null_frac => 0.5::real,
    avg_width => 2::integer,
    n_distinct => -0.1::real,
    range_empty_frac => 0.5::real,
    range_length_histogram => '{399,499,Infinity}'::text
    );
 pg_set_attribute_stats
------------------------
 
(1 row)


and here is a restore function

-- warning: mcv cast failure
SELECT *
FROM pg_catalog.pg_restore_attribute_stats(
    'relation', 'stats_import.test'::regclass::oid,
    'attname', 'id'::name,
    'inherited', false::boolean,
    'version', 150000::integer,
    'null_frac', 0.5::real,
    'avg_width', 2::integer,
    'n_distinct', -0.4::real,
    'most_common_vals', '{2,four,3}'::text,
    'most_common_freqs', '{0.3,0.25,0.05}'::real[]
    );
WARNING:  invalid input syntax for type integer: "four"
 row_written |          stats_applied           |            stats_rejected            | params_rejected
-------------+----------------------------------+--------------------------------------+-----------------
 t           | {null_frac,avg_width,n_distinct} | {most_common_vals,most_common_freqs} |
(1 row)


There's a few things going on here:

1. An intentionally bad, impossible to write, value was put in 'most_common_vals'. 'four' cannot cast to integer, so the value fails, and we get a warning
2. Because most_common_values failed, we can no longer construct a legit STAKIND_MCV, so we have to throw out most_common_freqs with it.
3. Those failures aren't enough to prevent us from writing the other stats, so we write the record, and report the row written, the stats we could write, the stats we couldn't, and a list of other parameters we entered that didn't make sense and had to be rejected (empty).

Overall, I'd say the format is on the pedantic side, but it's far from unreadable, and mixing control parameters (version) with stats parameters isn't that big a deal.


I do like the idea of returning a set, but I think it should be the
positive set (effectively a representation of what is now in the
pg_stats view) and any ignored settings would be output as WARNINGs.

Displaying the actual stats in pg_stats could get very, very big. So I wouldn't recommend that.

What do you think of the example presented earlier?

Attached is v25.

Key changes:
- Each set/restore function pair now each call a common function that does the heavy lifting, and the callers mostly marshall parameters into the right spot and form the result set (really just one row).
- The restore functions now have all parameters passed in via a variadic any[].
- the set functions now error out on just about any discrepancy, and do not have a result tuple.
- test cases simplified a bit. There's still a lot of them, and I think that's a good thing.
- Documentation to reflect significant reorganization.
- pg_dump modified to generate new function signatures.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Fix overflow in pg_size_pretty
Следующее
От: Joseph Koshakow
Дата:
Сообщение: Re: Fix overflow in pg_size_pretty