Re: Statistics Import and Export

From: Corey Huinker
Subject: Re: Statistics Import and Export
Date:
Msg-id: CADkLM=cercsx5t7SoznVBgnqFxO1U5ZE4pb-cJ6mMRRutHqg=A@mail.gmail.com
In response to: Re: Statistics Import and Export (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List: pgsql-hackers


On Sun, Mar 10, 2024 at 11:57 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> On Fri, Mar 8, 2024 at 12:06 PM Corey Huinker <corey.huinker@gmail.com> wrote:
> >
> > Anyway, here's v7. Eagerly awaiting feedback.

> Thanks for working on this. It looks useful to have the ability to
> restore the stats after upgrade and restore. But, the exported stats
> are valid only until the next ANALYZE is run on the table. IIUC,
> postgres collects stats during VACUUM, autovacuum and ANALYZE, right?
> Perhaps there are other ways to collect stats. I'm thinking what
> problems does the user face if they are just asked to run ANALYZE on
> the tables (I'm assuming ANALYZE doesn't block concurrent access to
> the tables) instead of automatically exporting stats.

Correct. These are just as temporary as any other ANALYZE of the table: a later ANALYZE, probably via autovacuum, will overwrite these values. This is designed to QUICKLY get stats into a table so the database is operational sooner, which is especially important after an upgrade/restore, when all stats were wiped out. Other uses could include adapting this for postgres_fdw, so that we don't have to do table sampling on the remote table, and of course statistics injection to test the query planner.
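As a concrete illustration of that post-restore priming (using the pg_set_relation_stats signature shown in the regression test later in this mail; table name and values are purely illustrative):

```sql
-- Hypothetical post-restore step: copy the planner stats captured from the
-- old cluster so the table is usable before the first autovacuum/ANALYZE runs.
SELECT pg_set_relation_stats('public.orders'::regclass,
                             1000000::float4,  -- reltuples from the old cluster
                             8334);            -- relpages from the old cluster
```

A subsequent ANALYZE (manual or autovacuum) simply replaces these values, as noted above.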
 
> 2.
> +        they are replaced by the next auto-analyze. This function is used by
> +        <command>pg_upgrade</command> and <command>pg_restore</command> to
> +        convey the statistics from the old system version into the new one.
> +       </para>
>
> Is there any demonstration of pg_set_relation_stats and
> pg_set_attribute_stats being used either in pg_upgrade or in
> pg_restore? Perhaps, having them as 0002, 0003 and so on patches might
> show real need for functions like this. It also clarifies how these
> functions pull stats from tables on the old cluster to the tables on
> the new cluster.

That code was adapted from do_analyze(), and yes, there is a patch for pg_dump, but as I noted earlier it is on hold pending feedback.
 

> 3. pg_set_relation_stats and pg_set_attribute_stats seem to be writing
> to pg_class and might affect the plans as stats can get tampered. Can
> we REVOKE the execute permissions from the public out of the box in
> src/backend/catalog/system_functions.sql? This way one can decide who
> to give permissions to.

You'd have to be the table owner to alter the stats. I can envision these functions getting a special role, but they could also be fine as superuser-only.
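For what the suggested tightening could look like, here is a sketch (the exact function signature and default grants are still open questions in this thread, so treat the argument types as assumptions):

```sql
-- Sketch for src/backend/catalog/system_functions.sql: withhold EXECUTE
-- from PUBLIC so only superusers, or roles explicitly granted, may set stats.
-- The (regclass, float4, integer) signature is illustrative.
REVOKE EXECUTE ON FUNCTION pg_set_relation_stats(regclass, float4, integer) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION pg_set_relation_stats(regclass, float4, integer) TO some_stats_admin_role;
```

Note that even with EXECUTE granted, the ownership check mentioned above would still apply.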
 

> 4.
> +SELECT pg_set_relation_stats('stats_export_import.test'::regclass,
> 3.6::float4, 15000);
> + pg_set_relation_stats
> +-----------------------
> + t
> +(1 row)
> +
> +SELECT reltuples, relpages FROM pg_class WHERE oid =
> 'stats_export_import.test'::regclass;
> + reltuples | relpages
> +-----------+----------
> +       3.6 |    15000
>
> Isn't this test case showing a misuse of these functions? Table
> actually has  no rows, but we are lying to the postgres optimizer on
> stats.

Consider this case: you want to know at what point the query planner will start using a given index. You can generate dummy data for a thousand, a million, a billion rows and wait for that to complete, or you can simply tell the table "I say you have a billion rows, twenty million pages, etc." and see when the plan changes.
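That experiment might look like this (a sketch; table and column names are made up, and in practice you would likely also want attribute stats for realistic selectivity estimates):

```sql
-- Sketch: probe the planner's index-scan threshold without generating data.
CREATE TABLE t (id int PRIMARY KEY, payload text);

-- Claim a billion rows across twenty million pages.
SELECT pg_set_relation_stats('t'::regclass, 1000000000::float4, 20000000);

-- The planner now costs this query as if t really held a billion rows.
EXPLAIN SELECT * FROM t WHERE id = 42;
```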

But again, in most cases, you're setting the values to the same values the table had on the old database just before the restore/upgrade. 
 
> I think altering stats of a table mustn't be that easy for the
> end user.

Only easy for the end users that happen to be the table owner or a superuser.
 
> As mentioned in comment #3, permissions need to be
> tightened. In addition, we can also mark the functions pg_upgrade only
> with CHECK_IS_BINARY_UPGRADE, but that might not work for pg_restore
> (or I don't know if we have a way to know within the server that the
> server is running for pg_restore).

I think they will have usage both in postgres_fdw and for tuning.
 

> 5. In continuation to the comment #2, is pg_dump supposed to generate
> pg_set_relation_stats and pg_set_attribute_stats statements for each
> table? When pg_dump does that, pg_restore can automatically load the
> stats.

The current plan is to have one TOC entry in the post-data section with a dependency on the table/index/matview. That lets us leverage existing filters. The TOC entry will contain a series of statements: one pg_set_relation_stats() call, plus one pg_set_attribute_stats() call per attribute.
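A rough sketch of what one such TOC entry's contents might look like (the table name and values are illustrative, and the pg_set_attribute_stats argument list is deliberately elided since its signature is still being settled in this thread):

```sql
-- Hypothetical post-data TOC entry for one table; values come from the
-- old cluster's pg_class and pg_statistic rows.
SELECT pg_set_relation_stats('public.orders'::regclass, 1000000::float4, 8334);
-- One call per attribute, carrying that attribute's pg_statistic values:
-- SELECT pg_set_attribute_stats('public.orders'::regclass, 'id', ...);
-- SELECT pg_set_attribute_stats('public.orders'::regclass, 'total', ...);
```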
 
> 9. Isn't it good to add a test case where the plan of a query on table
> after exporting the stats would remain the same as that of the original
> table from which the stats are exported? IMO, this is more realistic
> than just comparing pg_statistic of the tables because this is what an
> end-user wants eventually.

I'm sure we can add something like that, but query plan formats change a lot and are greatly dependent on database configuration, so maintaining such a test would be a lot of work.
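One way such a test could limit that volatility (a sketch, not something in the current patch set) is to compare plan shapes rather than full plans:

```sql
-- COSTS OFF suppresses cost and row estimates in the output, so a regression
-- test compares only the plan shape (node types, join order), which is less
-- sensitive to configuration differences. The query is illustrative.
EXPLAIN (COSTS OFF)
SELECT * FROM stats_export_import.test WHERE id < 100;
```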

In pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: pg_dump include/exclude fails to error
Next
From: Heikki Linnakangas
Date:
Subject: Re: Failures in constraints regression test, "read only 0 of 8192 bytes"