Re: Potential bug introduced in PG17 with query parallelization - plan flip
| От | David Rowley | 
|---|---|
| Тема | Re: Potential bug introduced in PG17 with query parallelization - plan flip | 
| Дата | |
| Msg-id | CAApHDvqgT-xap-+GZAxBUmvJOoqGE4jqf1TibpSrew0zMBqXAQ@mail.gmail.com обсуждение исходный текст  | 
		
| Ответ на | Potential bug introduced in PG17 with query parallelization - plan flip (Jon Jenkins <jjenkins@gitlab.com>) | 
| Список | pgsql-hackers | 
On Fri, 31 Oct 2025 at 10:40, Jon Jenkins <jjenkins@gitlab.com> wrote: > -> Index Scan using index_on_merge_requests_for_latest_diffs > on public.merge_requests (cost=0.57..1011.36 rows=773 width=4) > (actual time=0.015..10.901 rows=9271 loops=1) > Index Cond: (merge_requests.target_project_id = 16567575) > However, on 17, a "more expensive" plan is proposed that does a gather > between workers, but largely performs the exact same: > -> Index Scan using > index_merge_requests_on_target_project_id_and_created_at_and_id > (cost=0.57..19664.54 rows=8262 width=4) (actual time=0.021..7.890 > rows=4498 loops=2) In the two fragments above, you'll notice the row estimates are quite different. I suspect this is the reason v17 went with the parallel plan. What could cause this would be just a difference in sampled rows from ANALYZE. If you're still in a testing environment, do you see the plan change back to the non-parallel version if you run ANALYZE on the merge_requests table and then run the query again? Try it a few times. You might find that the rows that are sampled sometimes leads to better or worse estimates. It's a bit of a fact of life if you have large skewed data as it's not practical to remember how many of each and every value exist in the statistics. You could increase the number by increasing the statistics targets for that column, per [1]. That might still not be enough, however. If you can't go high enough, you might be able to do something by setting it to not gather stats for this column and then tuning n_distinct for the column to some value that suits. The drawback there is that doing that is fraught with the possible difficulties of having the planner always thinking there's the same number of rows for every target_project_id. You might end up doing an Index Scan when a Seq Scan is better (which it could be if you had a particular target_project_id that dominated the table). > I don't know enough about planner internals to specifically track this > code, but I did see that this patch has been added to 17 that deals > with parallelizing: > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e08d74ca1 This isn't related to what you're seeing. David [1] https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-DESC-SET-STATISTICS
В списке pgsql-hackers по дате отправления: