Re: Autovacuum on partitioned table (autoanalyze)
От | Alvaro Herrera |
---|---|
Тема | Re: Autovacuum on partitioned table (autoanalyze) |
Дата | |
Msg-id | 202108112233.xpz74i5pkgkr@alvherre.pgsql обсуждение исходный текст |
Ответ на | Re: Autovacuum on partitioned table (autoanalyze) (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: Autovacuum on partitioned table (autoanalyze)
(Andres Freund <andres@anarazel.de>)
|
Список | pgsql-hackers |
After thinking about the described issues for a while, my proposal is to completely revamp the way this feature works. See below. Now, the proposal seems awfully invasive, but it's *the* way I see to avoid the pgstat traffic. For pg14, maybe we can live with it, and just use the smaller patches that Horiguchi-san and I have posted, which solve the other issues; also, Euler Taveira suggested that we could add a reloption to turn the feature off completely for some tables (maybe make it off by default and have a reloption to turn it on for specific partition hierarchies), so that it doesn't cause unduly pain for people with large partitioning hierarchies. * PgStat_StatTabEntry gets a new "Oid reportAncestorOid" member. This is the OID of a single partitioned ancestor, to which the changed-tuple counts are propagated up. Normally this is the topmost ancestor; but if the user wishes some intermediate ancestor to receive the counts they can use ALTER TABLE the_intermediate_ancestor SET (autovacuum_enabled=on). * Corollary 1: for the normal case of single-level partitioning, the parent partitioned table behaves as currently. * Corollary 2: for multi-level partitioning with no especially configured intermediate ancestors, only the leaf partitions and the top-level partitioned table will be analyzed. Intermediate ancestors are ignored by autovacuum. * Corollary 3: for multi-level partitioning with some intermediate ancestor(s) marked as autovacuum_enabled=on, that ancestor will receive all the counts from all of its partitions, so it will get analyzed itself; and it'll also forward those counts up to its report-ancestor. * On ALTER TABLE .. ATTACH PARTITION or CREATE TABLE PARTITION AS, we send a message to collector with the analyze-ancestor OID. * Backends running manual ANALYZE as well as autovacuum will examine each table's "relispartition" flag and its pgstat table entry; if it is a partition and doesn't have reportAncestorOid set, determine which ancestor should analyze counts be reported to; include this OID in the regular PgStat_MsgAnalyze. This fixes the situation after a crash or other stats reset. Also, it's not unduly expensive to do, because it's only in the rare case that the value sent by ATTACH was lost. * Possible race condition in the previous step may cause multiple backends to send the same info. Not a serious problem so we don't try to handle it. * When tuple change counts for a partition are received by pgstat_recv_tabstat, they are propagated up to the indicated parent table in addition to being saved in the table itself. (Bonus points: when a table is attached or detached as a partition, the live tuples count is propagated to the newly acquired parent.) What do people think of this? -- Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Thomas MunroДата:
Сообщение: Re: Worth using personality(ADDR_NO_RANDOMIZE) for EXEC_BACKEND on linux?
Следующее
От: Tomas VondraДата:
Сообщение: Re: Use extended statistics to estimate (Var op Var) clauses