Re: Possible incorrect row estimation for Gather paths

Поиск
Список
Период
Сортировка
От Richard Guo
Тема Re: Possible incorrect row estimation for Gather paths
Дата
Msg-id CAMbWs49xxNQZ+j+XU7paZnB20tqj9W-V+R11n-_g7KU31xKbQw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Possible incorrect row estimation for Gather paths  (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>)
Ответы Re: Possible incorrect row estimation for Gather paths
Список pgsql-hackers
On Mon, Jul 22, 2024 at 4:47 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> On Wed, Jul 17, 2024 at 3:59 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > I can reproduce this problem with the query below.
> >
> > explain (costs on) select * from tenk1 order by twenty;
> >                                    QUERY PLAN
> > ---------------------------------------------------------------------------------
> >  Gather Merge  (cost=772.11..830.93 rows=5882 width=244)
> >    Workers Planned: 1
> >    ->  Sort  (cost=772.10..786.80 rows=5882 width=244)
> >          Sort Key: twenty
> >          ->  Parallel Seq Scan on tenk1  (cost=0.00..403.82 rows=5882 width=244)
> > (5 rows)
> I was looking for a test to add in the regress checks that wouldn't
> rely on explain cost since it is disabled. However, I've realised I
> could do something similar to what's done in stats_ext with the
> check_estimated_rows function. I've added the get_estimated_rows test
> function that extracts the estimated rows from the top node and uses
> it to check the gather nodes' estimates. get_estimated_rows uses a
> simple explain compared to check_estimated_rows which relies on an
> explain analyze.

Hmm, I'm hesitant about adding the tests that verify the number of
estimated rows in this patch.  The table 'tenk1' isn't created with
autovacuum_enabled = off, so we may have unstable results from
auto-analyze happening.  I think the plan change in join_hash.out is
sufficient to verify the changes in this patch.

> > I wonder if the changes in create_ordered_paths should also be reduced
> > to 'total_groups = gather_rows_estimate(path);'.
>
> It should already be the case with the patch v2. It does create
> rounding errors that are visible in the tests but they shouldn't
> exceed +1 or -1.
>
> > I think perhaps it's better to declare gather_rows_estimate() in
> > cost.h rather than optimizer.h.
> > (BTW, I wonder if compute_gather_rows() would be a better name?)
>
> Good point, I've moved the declaration and renamed it.
>
> > I noticed another issue in generate_useful_gather_paths() -- *rowsp
> > would have a random value if override_rows is true and we use
> > incremental sort for gather merge.  I think we should fix this too.
>
> That seems to be the case. I've tried to find a query that could
> trigger this codepath without success. All grouping and distinct paths
> I've tried where fully sorted and didn't trigger an incremental sort.
> I will need a bit more time to check this.
>
> In the meantime, I've updated the patches with the review comments.

Otherwise I think the v3 patch looks good to me.

Attached is an updated version of this patch with cosmetic changes and
comment updates.  It also squishes the two patches together into one.
I'm planning to push it soon, barring any objections or comments.

Thanks
Richard

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Banck
Дата:
Сообщение: Re: why there is not VACUUM FULL CONCURRENTLY?
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: Windows default locale vs initdb