I can reproduce this problem with the query below.
explain (costs on) select * from tenk1 order by twenty;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Gather Merge  (cost=772.11..830.93 rows=5882 width=244)
   Workers Planned: 1
   ->  Sort  (cost=772.10..786.80 rows=5882 width=244)
         Sort Key: twenty
         ->  Parallel Seq Scan on tenk1  (cost=0.00..403.82 rows=5882 width=244)
(5 rows)
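Note that the Gather Merge shows rows=5882, i.e. the same per-worker estimate
as the Parallel Seq Scan beneath it, rather than the table's ~10000 rows. A
rough sketch of the arithmetic involved (mirroring get_parallel_divisor's
leader-contribution model in costsize.c; the Python names here are
illustrative, not the actual patch):

```python
# Sketch of the per-worker row estimate, assuming get_parallel_divisor's
# model: divisor = workers + leader contribution, where the leader's
# contribution shrinks by 0.3 per planned worker.

def parallel_divisor(workers, leader_participates=True):
    divisor = float(workers)
    if leader_participates:
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor

total_rows = 10000        # rows in tenk1
workers = 1
per_worker = int(total_rows / parallel_divisor(workers))
print(per_worker)         # 5882 -- matches the partial path in the plan
```

The Gather Merge should scale the per-worker estimate back up by the same
divisor instead of reusing it verbatim.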
On Tue, Jul 16, 2024 at 3:56 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> The initial goal was to use the source tuples if available and avoid
> possible rounding errors. Though I realise that the difference would
> be minimal. For example, 200K tuples and 3 workers would yield
> int(int(200000 / 2.4) * 2.4)=199999. That is probably not worth the
> additional complexity, I've updated the patch to just use
> gather_rows_estimate.
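The quoted round-trip loss can be made concrete. A divisor of 2.4 corresponds,
under get_parallel_divisor's model, to two workers plus a 0.4 leader
contribution; the numbers below just illustrate the int/scale round trip, not
the patch itself:

```python
# Round-tripping a row count through the parallel divisor loses at most
# a row or so to truncation -- the "minimal" difference mentioned above.

source_rows = 200000
divisor = 2.4

per_worker = int(source_rows / divisor)   # estimate stored on the partial path
round_trip = int(per_worker * divisor)    # re-scaled at the Gather

print(per_worker)   # 83333
print(round_trip)   # 199999 -- off by one from the original 200000
```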
I wonder if the changes in create_ordered_paths should also be reduced
to 'total_groups = gather_rows_estimate(path);'.
> I've also realised from the comments in optimizer.h that
> nodes/pathnodes.h should not be included there and fixed it.
I think perhaps it's better to declare gather_rows_estimate() in
cost.h rather than optimizer.h.
(BTW, I wonder if compute_gather_rows() would be a better name?)
I noticed another issue in generate_useful_gather_paths() -- *rowsp is left
uninitialized (and so contains a garbage value) if override_rows is true and
we use incremental sort for gather merge. I think we should fix this too.
Thanks
Richard