Re: simple patch for discussion
From | David Rowley |
---|---|
Subject | Re: simple patch for discussion |
Date | |
Msg-id | CAApHDvpnV9a5hOh6b+NjuQPy6cvu0nfLs+mW+QQd_ebHXv9T6g@mail.gmail.com |
In reply to | Re: simple patch for discussion (Andres Freund <andres@anarazel.de>) |
Responses | simple patch for discussion |
List | pgsql-hackers |
On Fri, 18 Jul 2025 at 05:03, Andres Freund <andres@anarazel.de> wrote:
> Right now we basically assume that the benefit of parallelism reduces
> substantially with every additional parallel worker, but for things like
> seqscans that's really not true. I've seen reasonably-close-to-linear
> scalability for parallel seqscans up to 48 workers (the CPUs in the system I
> tested on). Given that our degree-of-parallelism logic doesn't really make
> sense.

What you're saying is true, but the problem with doing what's proposed is that giving so many more workers to one query just increases the chances that some other plan being executed gets no workers because they're all in use. That matters because the disadvantage of a parallel plan getting zero workers is far greater than the advantage of a parallel plan getting additional workers.

The reason for this is that the planner doesn't parallelise the cheapest serial plan. It picks the cheapest plan on the assumption that whatever number of workers compute_parallel_worker() calculates will be available during execution. The larger the number that function returns, the greater the chance of getting a parallel plan, because the planner divides the Path costs by the calculated number of parallel workers. So when a parallel plan ends up with no workers at execution time, the leader alone runs a plan that was only chosen because its cost had been divided across workers it never got.

I could imagine there being room to add more configuration to how compute_parallel_worker() calculates its return value. I don't think there's any room to swap it out for something as aggressive as what's being proposed without giving users some way to get something more conservative, like what's there today.

There is already a complaint in [1] saying we should be trying to reduce the number of concurrent backends in order to reduce context switching. What's being proposed here just makes that problem worse.
I'd suggest to Greg that he might want to come up with a way to make this configurable, including a setting that gives something close to what we get today, and make that the default. Any proposed algorithm would also be easier to follow with some example output. I used the attached .c file to produce that. There's quite a jump in workers with the proposed algorithm, e.g.:

- Table Size = 1024 MB: old workers = 5, new workers = 12
- Table Size = 1048576 MB: old workers = 11, new workers = 363

So I kinda doubt we could get away with such a radical change without upsetting people who run more than one concurrent parallelisable query on their server.

David

[1] https://www.postgresql.org/message-id/a5916f83-de79-4a40-933a-fb0d9ba2f5a0@app.fastmail.com