Обсуждение: Metadata and record block access stats for indexes
Hi, For the purpose of writing a blog post I was checking the index stats recorded for a workload, but found them rather confusing. Following along the code with the debugger it eventually made sense, and I could eventually understand what's counted. Looking around a bit, I discovered an older discussion [1] in the mailing lists and learned that the issue is known. The proposal in that thread is to start counting separate metadata and record stats depending on what type of index block is retrieved. I realized those would have helped me better understand the collected index stats, so I started working on a patch to add these in the system views. Attached is a WIP patch file with partial coverage of the B-Tree index code. The implementation follows the existing stats collection approach and the naming convention proposed in [1]. Let me know if what I'm doing is feasible and if there's any concerns I could address. Next steps would be to replace all places where I currently pass in NULL with proper counting, as well as update tests and docs. Looking forward to your feedback! Thanks! Cheers, Mircea [1]: https://www.postgresql.org/message-id/flat/CAH2-WzmdZqxCS1widYzjDAM%2BZ-Jz%3DejJoaWXDVw9Qy1UsK0tLA%40mail.gmail.com
Вложения
Hi, Just attaching the complete patch which now covers all index types, docs and tests. You can run the following to see it in action: create table test (id serial primary key); insert into test select * from generate_series(0,30000); select pg_stat_reset(); select * from test where id=3000; select * from pg_statio_all_indexes where indexrelname = 'test_pkey'; This will show that there were 2 index blocks read from shared buffers (hit): 1 metadata and one record. Cheers, Mircea On 28/02/2025 21:58, Mircea Cadariu wrote: > Hi, > > For the purpose of writing a blog post I was checking the index stats > recorded for a workload, but found them rather confusing. Following > along the code with the debugger it eventually made sense, and I could > eventually understand what's counted. Looking around a bit, I > discovered an older discussion [1] in the mailing lists and learned > that the issue is known. The proposal in that thread is to start > counting separate metadata and record stats depending on what type of > index block is retrieved. > > I realized those would have helped me better understand the collected > index stats, so I started working on a patch to add these in the > system views. Attached is a WIP patch file with partial coverage of > the B-Tree index code. The implementation follows the existing stats > collection approach and the naming convention proposed in [1]. Let me > know if what I'm doing is feasible and if there's any concerns I could > address. Next steps would be to replace all places where I currently > pass in NULL with proper counting, as well as update tests and docs. > > Looking forward to your feedback! Thanks! > > Cheers, > Mircea > > [1]: > https://www.postgresql.org/message-id/flat/CAH2-WzmdZqxCS1widYzjDAM%2BZ-Jz%3DejJoaWXDVw9Qy1UsK0tLA%40mail.gmail.com >
Вложения
On 7/4/25 18:00, Mircea Cadariu wrote:
> Just attaching v2 of the patch.
Hi Mircea,
Your patch applies cleanly and seems to work well.
IIUC, the index hit ratio should be computed with the following formula:
(idx_blks_hit - idx_metadata_blks) / (idx_blks_hit - idx_metadata_blks +
idx_blks_read)
because most of the index non-leaf pages should be in the cache. Right?
This should probably be documented somewhere?
Here is my testing:
# select tree_level, internal_pages, leaf_pages from
pgstatindex('pgbench_accounts_pkey');
tree_level | internal_pages | leaf_pages
------------+----------------+------------
2 | 20 | 5465
(1 row)
# SELECT DISTINCT pg_buffercache_evict(bufferid)
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('pgbench_accounts_pkey');
pg_buffercache_evict
----------------------
(t,f)
(1 row)
# SELECT pg_stat_reset();
pg_stat_reset
---------------
(1 row)
# SELECT max(abalance) FROM pgbench_accounts WHERE aid = 100;
max
-----
0
(1 row)
# select idx_blks_read, idx_blks_hit, idx_metadata_blks from
pg_statio_all_indexes where indexrelname = 'pgbench_accounts_pkey';
idx_blks_read | idx_blks_hit | idx_metadata_blks
---------------+--------------+-------------------
3 | 0 | 2
(1 row)
--> 3 pages: the root of the tree, one internal page and one leaf
#
\q
fyhuel@framework:~$ psql bench
psql (19devel)
Type "help" for help.
# SELECT max(abalance) FROM pgbench_accounts WHERE aid = 100;
max
-----
0
(1 row)
primary sleaf bench [42323] # select idx_blks_read, idx_blks_hit,
idx_metadata_blks from pg_statio_all_indexes where indexrelname =
'pgbench_accounts_pkey';
idx_blks_read | idx_blks_hit | idx_metadata_blks
---------------+--------------+-------------------
4 | 3 | 5
--> 4 more pages: same as before, already in cache, plus the index meta
page, read outside shared buffers because we started a new session?
Hi Frédéric, Thanks a lot for trying out my (first) patch! Much appreciated. On 20/07/2025 21:54, Frédéric Yhuel wrote: > Your patch applies cleanly and seems to work well. Cool! > because most of the index non-leaf pages should be in the cache. Right? Yes indeed, it's an assumption in the implementation, that the non-leaf pages will roughly always be in the cache. > This should probably be documented somewhere? I'm still familiarising myself about what to document where, whether things should be in the official docs or separate blog posts. In the patch I only documented the new column next to the existing ones for now. > --> 3 pages: the root of the tree, one internal page and one leaf Yes, this is correct. > primary sleaf bench [42323] # select idx_blks_read, idx_blks_hit, > idx_metadata_blks from pg_statio_all_indexes where indexrelname = > 'pgbench_accounts_pkey'; > idx_blks_read | idx_blks_hit | idx_metadata_blks > ---------------+--------------+------------------- > 4 | 3 | 5 > > --> 4 more pages: same as before, already in cache, plus the index > meta page, read outside shared buffers because we started a new session? Yes, that's my understanding too. Thanks! Kind regards, Mircea Cadariu
Rebased and dusted off this patch.
Вложения
Hi Mircea, > Rebased and dusted off this patch. Thanks for the patch. Here are my two cents. IMO it would be helpful if you could come up with a few more practical use cases. This change is going to affect pretty much everyone. If only a few users will benefit from it once in several years, the value of the patch is arguably low. As an example, can you think of how the new counters can be used for debugging, checking index integrity, writing new access methods or perhaps writing property-based tests? Just several examples that came to my mind first. Also I'm a bit concerned about the performance impact. It's probably next to nothing, but if you could measure it on a relatively large amount of data that would be great. Note that it's not uncommon to have dozens of different indexes for a single table. -- Best regards, Aleksander Alekseev