Discussion: Vacuum statistics
Hello, everyone!

I think we don't have enough information to analyze vacuum functionality. Needless to say, vacuum is one of the most important processes in a database system: it prevents problems such as table and index bloat and emergency freezing when a transaction ID wraparound is approaching, and it keeps the visibility map up to date. On the other hand, incorrectly tuned aggressive autovacuum settings can consume a lot of computing resources, making all queries in the system run longer.

Currently vacuum gathers statistical information about tables, but this information is important not only for the optimizer. Because vacuum is an automated process, there are many settings that determine how aggressively it works on the objects of the database. Moreover, sometimes it is important to set a parameter specifically for a given table because of how dynamically it changes. A database administrator needs to tune autovacuum to strike a balance between the usefulness of vacuum's work in the system on the one hand, and the overhead of its workload on the other. However, it is not enough to judge vacuum behavior from the count of vacuum passes over tables plus the operational data in pg_stat_progress_vacuum, because the latter is available only while a vacuum is running and provides no strategic overview over a longer period.

To sum up, autovacuum behaves strategically: the frequency of its runs and its resource consumption depend on the workload on the database. Its load on the database is minimal for an append-only table and maximal for a table with high-frequency updates. Furthermore, the vacuum load depends heavily on the number and size of indexes.
Because there is no visibility map for indexes, vacuum scans each index completely, and the worst case is when it has to do so for a bloated index on a small table.

I suggest gathering information about vacuum resource consumption for processing indexes and tables, and storing it with the table and index relations (for example, in a structure like PgStat_StatTabEntry, as is done for the usual statistics). This would allow us to determine how well vacuum is configured and to evaluate its overhead on the system at the strategic level. Vacuum already gathers this information; it just doesn't store this valuable data anywhere.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 30.05.2024 10:33, Alena Rybakina wrote:
> I suggest gathering information about vacuum resource consumption for
> processing indexes and tables and storing it in the table and index
> relationships (for example, PgStat_StatTabEntry structure like it has
> realized for usual statistics). It will allow us to determine how well
> the vacuum is configured and evaluate the effect of overhead on the
> system at the strategic level, the vacuum has gathered this
> information already, but this valuable information doesn't store it.

My colleagues and I have prepared a patch that can help to solve this problem. We are open to feedback.

--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
Hi,

On Thu, 30/05/2024 at 10:33 -0700, Alena Rybakina wrote:
> I suggest gathering information about vacuum resource consumption for
> processing indexes and tables and storing it in the table and index
> relationships (for example, PgStat_StatTabEntry structure like it has
> realized for usual statistics). It will allow us to determine how well
> the vacuum is configured and evaluate the effect of overhead on the
> system at the strategic level, the vacuum has gathered this information
> already, but this valuable information doesn't store it.

It seems a little unclear to me, so let me explain the point of the proposal a little.

Since the vacuum process is a backend, it has workload instrumentation. We have all the basic counters available, such as the number of blocks read, hit and written, time spent on I/O, WAL stats and so on. Also, we can easily get some statistics specific to vacuum activity, i.e. the number of tuples removed, the number of blocks removed, the number of VM bits set and, of course, the most important metric: time spent on the vacuum operation.

All those statistics should be stored by the Cumulative Statistics System on a per-relation basis, i.e. individual cumulative counters for every table and every index in the database. Such counters would give us a clear view of the vacuum workload on individual database objects, providing a means to measure the efficiency of vacuum fine-tuning.

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, May 30, 2024 at 11:57 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
> My colleagues and I have prepared a patch that can help to solve this
> problem.
>
> We are open to feedback.

I was reading through the patch; here are some initial comments.

--
+typedef struct LVExtStatCounters
+{
+    TimestampTz time;
+    PGRUsage ru;
+    WalUsage walusage;
+    BufferUsage bufusage;
+    int64 VacuumPageMiss;
+    int64 VacuumPageHit;
+    int64 VacuumPageDirty;
+    double VacuumDelayTime;
+    PgStat_Counter blocks_fetched;
+    PgStat_Counter blocks_hit;
+} LVExtStatCounters;

I noticed that you are storing both pgBufferUsage and VacuumPage(Hit/Miss/Dirty) stats. Aren't these essentially the same? It seems they both exist in the system because some code, like heap_vacuum_rel(), uses pgBufferUsage, while do_analyze_rel() still relies on the old counters. And there is already a patch to remove those old counters.

--
+static Datum
+pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
+{

I don't think you need this last parameter (ncolumns); we can fetch that from the tupledesc anyway, so adding an additional parameter just for checking doesn't look good to me.
--
+    /* Tricky turn here: enforce pgstat to think that our database us dbid */
+
+    MyDatabaseId = dbid;

Typo: /think that our database us dbid/think that our database has dbid/. Also, remove the blank line between the comment and the next code block that is related to that comment.

--
     VacuumPageDirty = 0;
+    VacuumDelayTime = 0.;

There is an extra "." after 0.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Hi! Thank you for your interest in this topic!
On Thu, May 30, 2024, Dilip Kumar wrote:
> I noticed that you are storing both pgBufferUsage and
> VacuumPage(Hit/Miss/Dirty) stats. Aren't these essentially the same? It
> seems they both exist in the system because some code, like
> heap_vacuum_rel(), uses pgBufferUsage, while do_analyze_rel() still
> relies on the old counters. And there is already a patch to remove
> those old counters.

I agree with you and I have fixed it.
> +static Datum
> +pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type, int ncolumns)
> +{
>
> I don't think you need this last parameter (ncolumns) we can anyway
> fetch that from tupledesc, so adding an additional parameter just for
> checking doesn't look good to me.

To be honest, I'm not sure whether ncolumns should be deleted at all, because the pg_stats_vacuum function is used to display three different types of statistics: for tables, indexes, and databases. We use this parameter to pass the number of columns (that is, how many statistics we expect) depending on the type of statistics. For example, table vacuum statistics contain 27 columns, while indexes and databases contain 19 and 15 columns, respectively. You can see that the pg_stats_vacuum function contains an Assert that checks that the expected number of tupledesc attributes matches the actual number:
Assert(tupdesc->natts == ncolumns);
Perhaps I can convert it to a local variable and determine its value inside the function, for example:
static Datum
pg_stats_vacuum(FunctionCallInfo fcinfo, ExtVacReportType type)
{
    int     ncolumns = 0;

    switch (type)
    {
        case PGSTAT_EXTVAC_HEAP:
            ncolumns = EXTVACHEAPSTAT_COLUMNS;
            break;
        case PGSTAT_EXTVAC_INDEX:
            ncolumns = EXTVACINDEXSTAT_COLUMNS;
            break;
        case PGSTAT_EXTVAC_DB:
            ncolumns = EXTVACDBSTAT_COLUMNS;
            break;
    }
    ...
}
What do you think?
> --
> +    /* Tricky turn here: enforce pgstat to think that our database us dbid */
> +
> +    MyDatabaseId = dbid;
>
> typo /think that our database us dbid/think that our database has dbid
> Also, remove the blank line between the comment and the next code block
> that is related to that comment.
>
> --
>     VacuumPageDirty = 0;
> +    VacuumDelayTime = 0.;
>
> There is an extra "." after 0
Thank you, I fixed it.
In addition to these changes, I fixed a problem with displaying vacuum statistics for databases: I found an error in the definition of the pg_stats_vacuum_database system view. I also rewrote the tests and converted them into a regression test, and I split the test of vacuum statistics output into two tests: one checks the functionality for tables and databases, and the other for indexes. This split is caused by a problem with vacuum behavior when the table contains an index. You can find more information about this here: [0] and [1].
I attached the diff to this letter.
[0] https://www.postgresql.org/message-id/d1ca3a1d-7ead-41a7-bfd0-5b66ad97b1cd%40yandex.ru
I am currently working on dividing this patch into three parts to simplify the review process: one of them will contain the code for collecting vacuum statistics on tables, the second on indexes, and the last on databases. I am also writing the documentation.
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
Hi!
On 08.06.2024 09:30, Alena Rybakina wrote:
> I am currently working on dividing this patch into three parts to
> simplify the review process: one of them will contain code for
> collecting vacuum statistics on tables, the second on indexes and the
> last on databases.

I have divided the patch into three: the first patch contains the code for collecting and storing vacuum statistics for tables, the second one for indexes and the last one for databases.
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
I have written the documentation and attached the patch.
> I am currently working on dividing this patch into three parts to
> simplify the review process: one of them will contain code for
> collecting vacuum statistics on tables, the second on indexes and the
> last on databases. I also write the documentation.
--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments
On Fri, May 31, 2024 at 4:19 AM Andrei Zubkov <zubkov@moonset.ru> wrote:
> As the vacuum process is a backend it has a workload instrumentation.
> We have all the basic counters available such as a number of blocks
> read, hit and written, time spent on I/O, WAL stats and so on. Also,
> we can easily get some statistics specific to vacuum activity i.e.
> number of tuples removed, number of blocks removed, number of VM marks
> set and, of course the most important metric - time spent on vacuum
> operation.

I've not reviewed the patch closely, but it sounds helpful for users. I would like to add a statistic: the high-water mark of memory usage for dead tuple TIDs. Since the amount of memory used by TidStore is hard to predict, I think showing the high-water mark would help users predict how much memory to set for maintenance_work_mem.

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
Hello!

On Thu, 27/06/2024 at 10:39 +0900, Masahiko Sawada wrote:
> I've not reviewed the patch closely but it sounds helpful for users. I
> would like to add a statistic, the high-water mark of memory usage of
> dead tuple TIDs. Since the amount of memory used by TidStore is hard
> to predict, I think showing the high-water mark would help users to
> predict how much memory they set to maintenance_work_mem.

Thank you for your interest in this patch. I understand your idea. Its obvious goal is to avoid expensive multiple index passes during vacuum of the heap. The statistics provided in the patch contain the index_vacuum_count counter for each table, which can be compared to pg_stat_all_tables.vacuum_count to detect multiple index passes for a specific relation. The previous setting of maintenance_work_mem is known. Usage of TidStore should be proportional to the amount of dead-tuple vacuum workload on the table, so as a first estimate we can take the number of index passes per heap pass as a maintenance_work_mem multiplier.

But there is a better way. Once we have detected index multiprocessing, we can lower the vacuum workload threshold for the heap pass, making vacuum a little more aggressive for this particular relation. I mean that in such a case increasing maintenance_work_mem is not the only option.
The suggested high-water mark can't be used as a cumulative statistic: any high-water mark, being a maximum-like statistic, is valid only for a certain time period and thus should be reset on some kind of schedule. Without resets it would reach 100% once under heavy load and stay there forever. That said, such a high-water mark seems a little unclear and complicated for the DBA, and it seems redundant to me right now. The main value of such a statistic, as I see it, is to avoid setting maintenance_work_mem too large, but I can't see really dramatic consequences of that. Maybe I've missed something.

--
Andrei Zubkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company