Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
| From | Bertrand Drouvot |
|---|---|
| Subject | Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes |
| Date | |
| Msg-id | aWCz5D3vkbhIlCpX@ip-10-97-1-34.eu-west-3.compute.internal |
| In response to | Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
| List | pgsql-hackers |
Hi,

On Tue, Nov 12, 2024 at 12:41:19PM +0900, Michael Paquier wrote:
> On Mon, Nov 11, 2024 at 11:06:43AM -0500, Robert Haas wrote:
> > But it is unclear to me what sort of tuning we would do based on knowing how many of the scans on a certain table or a certain index were parallel vs non-parallel. I have not fully reviewed the threads linked in the original post; but I did look at them briefly and did not immediately see discussion of the specific counters proposed here. I also don't see anything in this thread that clearly explains why we should want this exact thing. I don't want to make it sound like I know that this is useless; I'm sure that Guillaume probably has lots of hands-on tuning experience with this stuff that I lack. But the reasons aren't clearly spelled out as far as I can see, and I'm having some trouble imagining what they are.
>
> Thanks for the summary. My main worry is that these are kind of hard to act on for tuning when aggregated at relation level (Guillaume, feel free to counter-argue!). The main point that comes into mind is that single table scans would be mostly involved with OLTP workloads or simple joins, where parallel workers are of little use. That could be much more interesting for analytical-ish workloads with more complex plan pattern where one or more Gather or GatherMerge nodes are involved. Still, even in this case I suspect that most users will finish by looking at plan patterns, and that these counters added for index or tables would have a limited impact at the end.

While working on flushing stats outside of transaction boundaries (patch not shared yet but linked to [1]), I realized that parallel workers could lead to incomplete and misleading statistics. Indeed, they update "their" relation stats during their shutdown regardless of the "main" transaction status.

It means that, for example, stats like seq_scan, last_seq_scan and seq_tup_read are updated by the parallel workers during their shutdown while the main transaction has not finished. The stats are then somehow incomplete because the main worker has not updated its stats yet.

I think that could lead to misleading stats that a patch like this one could help to address. For example, parallel workers could update parallel_* dedicated stats and leave the non parallel_* stats update responsibility to the main worker when the transaction finishes. That would make the non parallel_* stats consistent whether parallel workers are used or not.

Thoughts?

[1]: https://www.postgresql.org/message-id/aVvgJu0BhnmzBWZ1@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
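
The timing issue described in this message can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: the names `SharedRelStats` and `run_parallel_seq_scan`, and the counters `parallel_seq_scan` and `parallel_seq_tup_read`, are hypothetical, and it assumes each process reports one scan and its own tuple count. Workers flush at shutdown, the leader flushes at transaction end, and a snapshot is taken in between.

```python
from collections import Counter


class SharedRelStats:
    """Toy stand-in for the shared per-relation counters (hypothetical)."""

    def __init__(self):
        self.counters = Counter()

    def flush(self, pending):
        # Add a process's pending counts to the shared counters.
        self.counters.update(pending)


def run_parallel_seq_scan(shared, n_workers, tuples_per_process, split_parallel):
    """Simulate one parallel seq scan.

    Returns (mid, final): the shared counters after the workers shut down
    but before the leader's transaction ends, and after it ends.
    """
    # Assumption: with split_parallel, workers write only parallel_* counters.
    prefix = "parallel_" if split_parallel else ""
    for _ in range(n_workers):
        # Each worker flushes at shutdown, regardless of the leader's status.
        shared.flush({
            prefix + "seq_scan": 1,
            prefix + "seq_tup_read": tuples_per_process,
        })
    mid = dict(shared.counters)
    # The leader flushes its own counts when the transaction finishes.
    shared.flush({"seq_scan": 1, "seq_tup_read": tuples_per_process})
    return mid, dict(shared.counters)


# Behavior described in the message: seq_scan moves before the leader is done.
mid_now, final_now = run_parallel_seq_scan(SharedRelStats(), 2, 100, False)

# Sketch of the suggestion: workers touch only the parallel_* counters.
mid_new, final_new = run_parallel_seq_scan(SharedRelStats(), 2, 100, True)
```

In the first run the intermediate snapshot already shows `seq_scan` and `seq_tup_read` changing while the leader has reported nothing. In the second run those two counters stay untouched until the leader flushes, and the worker activity is visible only under the `parallel_*` names.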