Re: Add statistics to pg_stat_wal view for wal related parameter tuning
От | Fujii Masao |
---|---|
Тема | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
Дата | |
Msg-id | e28c5f5e-363e-70c1-a200-28a8798ed8e2@oss.nttdata.com обсуждение исходный текст |
Ответ на | Re: Add statistics to pg_stat_wal view for wal related parameter tuning (Masahiro Ikeda <ikedamsh@oss.nttdata.com>) |
Ответы |
Re: Add statistics to pg_stat_wal view for wal related parameter tuning
(Fujii Masao <masao.fujii@oss.nttdata.com>)
Re: Add statistics to pg_stat_wal view for wal related parameter tuning (Masahiro Ikeda <ikedamsh@oss.nttdata.com>) |
Список | pgsql-hackers |
On 2020/11/06 10:25, Masahiro Ikeda wrote: > On 2020-10-30 11:50, Fujii Masao wrote: >> On 2020/10/29 17:03, Masahiro Ikeda wrote: >>> Hi, >>> >>> Thanks for your comments and advice. I updated the patch. >>> >>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote: >>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda >>>> <ikedamsh@oss.nttdata.com> wrote in >>>>> On 2020-10-20 12:46, Amit Kapila wrote: >>>>> > I see that we also need to add extra code to capture these stats (some >>>>> > of which is in performance-critical path especially in >>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might >>>>> > be that it is all fine as it is very important to collect these stats >>>>> > at cluster-level in spite that the same information can be gathered at >>>>> > statement-level to help customers but I don't see a very strong case >>>>> > for that in your proposal. >>>> >>>> We should avoid that duplication as possible even if the both number >>>> are important. >>>> >>>>> Also about performance, I thought there are few impacts because it >>>>> increments stats in memory. If I can implement to reuse pgWalUsage's >>>>> value which already collects these stats, there is no impact in >>>>> XLogInsertRecord. >>>>> For example, how about pg_stat_wal() calculates the accumulated >>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's >>>>> value? >>>> >>>> I don't think that works, but it would work that pgstat_send_wal() >>>> takes the difference of that values between two successive calls. >>>> >>>> WalUsage prevWalUsage; >>>> ... >>>> pgstat_send_wal() >>>> { >>>> .. >>>> /* fill in some values using pgWalUsage */ >>>> WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes; >>>> WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records; >>>> WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi; >>>> ... >>>> pgstat_send(&WalStats, sizeof(WalStats)); >>>> >>>> /* remember the current numbers */ >>>> prevWalUsage = pgWalUsage; >>> >>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage >>> which is already defined and eliminates the extra overhead. >> >> + /* fill in some values using pgWalUsage */ >> + WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes; >> + WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records; >> + WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi; >> >> It's better to use WalUsageAccumDiff() here? > > Yes, thanks. I fixed it. > >> prevWalUsage needs to be initialized with pgWalUsage? >> >> + if (AmWalWriterProcess()){ >> + WalStats.m_wal_write_walwriter++; >> + } >> + else >> + { >> + WalStats.m_wal_write_backend++; >> + } >> >> I think that it's better not to separate m_wal_write_xxx into two for >> walwriter and other processes. Instead, we can use one m_wal_write_xxx >> counter and make pgstat_send_wal() send also the process type to >> the stats collector. Then the stats collector can accumulate the counters >> per process type if necessary. If we adopt this approach, we can easily >> extend pg_stat_wal so that any fields can be reported per process type. > > I'll remove the above source code because these counters are not useful. > > > On 2020-10-30 12:00, Fujii Masao wrote: >> On 2020/10/20 11:31, Masahiro Ikeda wrote: >>> Hi, >>> >>> I think we need to add some statistics to pg_stat_wal view. >>> >>> Although there are some parameter related WAL, >>> there are few statistics for tuning them. >>> >>> I think it's better to provide the following statistics. >>> Please let me know your comments. >>> >>> ``` >>> postgres=# SELECT * from pg_stat_wal; >>> -[ RECORD 1 ]-------+------------------------------ >>> wal_records | 2000224 >>> wal_fpi | 47 >>> wal_bytes | 248216337 >>> wal_buffers_full | 20954 >>> wal_init_file | 8 >>> wal_write_backend | 20960 >>> wal_write_walwriter | 46 >>> wal_write_time | 51 >>> wal_sync_backend | 7 >>> wal_sync_walwriter | 8 >>> wal_sync_time | 0 >>> stats_reset | 2020-10-20 11:04:51.307771+09 >>> ``` >>> >>> 1. Basic statistics of WAL activity >>> >>> - wal_records: Total number of WAL records generated >>> - wal_fpi: Total number of WAL full page images generated >>> - wal_bytes: Total amount of WAL bytes generated >>> >>> To understand DB's performance, first, we will check the performance >>> trends for the entire database instance. >>> For example, if the number of wal_fpi becomes higher, users may tune >>> "wal_compression", "checkpoint_timeout" and so on. >>> >>> Although users can check the above statistics via EXPLAIN, auto_explain, >>> autovacuum and pg_stat_statements now, >>> if users want to see the performance trends for the entire database, >>> they must recalculate the statistics. >>> >>> I think it is useful to add the sum of the basic statistics. >>> >>> >>> 2. WAL segment file creation >>> >>> - wal_init_file: Total number of WAL segment files created. >>> >>> To create a new WAL file may have an impact on the performance of >>> a write-heavy workload generating lots of WAL. If this number is reported high, >>> to reduce the number of this initialization, we can tune WAL-related parameters >>> so that more "recycled" WAL files can be held. >>> >>> >>> >>> 3. Number of when WAL is flushed >>> >>> - wal_write_backend : Total number of WAL data written to the disk by backends >>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter >>> - wal_sync_backend : Total number of WAL data synced to the disk by backends >>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwrite >>> >>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions. >>> If the number of WAL is flushed is high, users can know "synchronous_commit" is useful for the workload. >> >> I just wonder how useful these counters are. Even without these counters, >> we already know synchronous_commit=off is likely to cause the better >> performance (but has the risk of data loss). So ISTM that these counters are >> not so useful when tuning synchronous_commit. > > Thanks, my understanding was wrong. > I agreed that your comments. > > I merged the statistics of *_backend and *_walwriter. > I think the sum of them is useful to calculate the average per write/sync time. > For example, per write time is equals wal_write_time / wal_write. Understood. Thanks for updating the patch! patching file src/include/catalog/pg_proc.dat Hunk #1 FAILED at 5491. 1 out of 1 hunk FAILED -- saving rejects to file src/include/catalog/pg_proc.dat.rej I got this failure when applying the patch. Could you update the patch? - Number of times WAL data was written to the disk because WAL buffers got full + Total number of times WAL data written to the disk because WAL buffers got full Isn't "was" necessary between "data" and "written"? + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>wal_bytes</structfield> <type>bigint</type> Shouldn't the type of wal_bytes be numeric because the total number of WAL bytes can exceed the range of bigint? I think that the type of pg_stat_statements.wal_bytes is also numeric for the same reason. + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>wal_write_time</structfield> <type>bigint</type> Shouldn't the type of wal_xxx_time be double precision, like pg_stat_database.blk_write_time? Even when fsync is set to off or wal_sync_method is set to open_sync, wal_sync is incremented. Isn't this behavior confusing? + Total amount of time that has been spent in the portion of + WAL data was written to disk by backend and walwriter, in milliseconds + (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero) With the patch, track_io_timing controls both IO for data files and WAL files. But we may want to track only either of them. So it's better to extend track_io_timing so that we can specify the tracking target in the parameter? For example, we can make track_io_timing accept data, wal and all. Or we should introduce new GUC for WAL, e.g., track_wal_io_timing? Thought? I'm afraid that "by backend and walwriter" part can make us thinkg incorrectly that WAL writes by other processes like autovacuum are not tracked. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
В списке pgsql-hackers по дате отправления:
Следующее
От: Dilip KumarДата:
Сообщение: Re: logical streaming of xacts via test_decoding is broken