Re: Possible incorrect row estimation for Gather paths

Поиск

Список

Период

Сортировка

От	Richard Guo
Тема	Re: Possible incorrect row estimation for Gather paths
Дата	22 июля 12:55:47
Msg-id	CAMbWs49xxNQZ+j+XU7paZnB20tqj9W-V+R11n-_g7KU31xKbQw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Possible incorrect row estimation for Gather paths (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>)
Ответы	Re: Possible incorrect row estimation for Gather paths
Список	pgsql-hackers

Дерево обсуждения

On Mon, Jul 22, 2024 at 4:47 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
> On Wed, Jul 17, 2024 at 3:59 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > I can reproduce this problem with the query below.
> >
> > explain (costs on) select * from tenk1 order by twenty;
> >                                    QUERY PLAN
> > ---------------------------------------------------------------------------------
> >  Gather Merge  (cost=772.11..830.93 rows=5882 width=244)
> >    Workers Planned: 1
> >    ->  Sort  (cost=772.10..786.80 rows=5882 width=244)
> >          Sort Key: twenty
> >          ->  Parallel Seq Scan on tenk1  (cost=0.00..403.82 rows=5882 width=244)
> > (5 rows)
> I was looking for a test to add in the regress checks that wouldn't
> rely on explain cost since it is disabled. However, I've realised I
> could do something similar to what's done in stats_ext with the
> check_estimated_rows function. I've added the get_estimated_rows test
> function that extracts the estimated rows from the top node and uses
> it to check the gather nodes' estimates. get_estimated_rows uses a
> simple explain compared to check_estimated_rows which relies on an
> explain analyze.

Hmm, I'm hesitant about adding the tests that verify the number of
estimated rows in this patch.  The table 'tenk1' isn't created with
autovacuum_enabled = off, so we may have unstable results from
auto-analyze happening.  I think the plan change in join_hash.out is
sufficient to verify the changes in this patch.

> > I wonder if the changes in create_ordered_paths should also be reduced
> > to 'total_groups = gather_rows_estimate(path);'.
>
> It should already be the case with the patch v2. It does create
> rounding errors that are visible in the tests but they shouldn't
> exceed +1 or -1.
>
> > I think perhaps it's better to declare gather_rows_estimate() in
> > cost.h rather than optimizer.h.
> > (BTW, I wonder if compute_gather_rows() would be a better name?)
>
> Good point, I've moved the declaration and renamed it.
>
> > I noticed another issue in generate_useful_gather_paths() -- *rowsp
> > would have a random value if override_rows is true and we use
> > incremental sort for gather merge.  I think we should fix this too.
>
> That seems to be the case. I've tried to find a query that could
> trigger this codepath without success. All grouping and distinct paths
> I've tried where fully sorted and didn't trigger an incremental sort.
> I will need a bit more time to check this.
>
> In the meantime, I've updated the patches with the review comments.

Otherwise I think the v3 patch looks good to me.

Attached is an updated version of this patch with cosmetic changes and
comment updates.  It also squishes the two patches together into one.
I'm planning to push it soon, barring any objections or comments.

Thanks
Richard

Вложения

v4-0001-Fix-rowcount-estimate-for-gather-merge-paths.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Michael Banck
Дата: 22 июля, 12:53:58
Сообщение: Re: why there is not VACUUM FULL CONCURRENTLY?

Следующее

От: Thomas Munro
Дата: 22 июля, 13:12:06
Сообщение: Re: Windows default locale vs initdb

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Possible incorrect row estimation for Gather paths

Вложения

Предыдущее

Следующее