Обсуждение: Plan weirdness. A sort produces more rows than the node beneath it

Поиск

Список

Период

Сортировка

Plan weirdness. A sort produces more rows than the node beneath it

От

Dane Foster

Дата:

04 августа 2023 г., 14:59:19

Hello,

I'm trying to understand a bit of weirdness in a plan output. There is a sort node above a sequential scan node where the scan node produces 26,026 rows yet the sort node above it produces 42,995,408. How is it possible to sort more data than you received?

https://explain.dalibo.com/plan/1ee665h69f92chc5

The PostgreSQL version is 14.2 running on Amazon's RDS. Thanks.

Dane

Re: Plan weirdness. A sort produces more rows than the node beneath it

От

Jeff Janes

Дата:

04 августа 2023 г., 15:07:19

On Fri, Aug 4, 2023 at 11:00 AM Dane Foster <studdugie@gmail.com> wrote:

Hello,

I'm trying to understand a bit of weirdness in a plan output. There is a sort node above a sequential scan node where the scan node produces 26,026 rows yet the sort node above it produces 42,995,408. How is it possible to sort more data than you received?

This is normal for a merge join. For every tie in the first input, the qualifying part of the 2nd input must be rescanned, and the rows are tallied again (in the sort node) each time they are rescanned.

Cheers,

Jeff

Re: Plan weirdness. A sort produces more rows than the node beneath it

От

Dane Foster

Дата:

04 августа 2023 г., 15:09:09

Thanks for the explanation.

Dane

On Fri, Aug 4, 2023 at 11:07 AM Jeff Janes <jeff.janes@gmail.com> wrote:

On Fri, Aug 4, 2023 at 11:00 AM Dane Foster <studdugie@gmail.com> wrote:
Hello,

I'm trying to understand a bit of weirdness in a plan output. There is a sort node above a sequential scan node where the scan node produces 26,026 rows yet the sort node above it produces 42,995,408. How is it possible to sort more data than you received?

This is normal for a merge join. For every tie in the first input, the qualifying part of the 2nd input must be rescanned, and the rows are tallied again (in the sort node) each time they are rescanned.

Cheers,

Jeff

Re: Plan weirdness. A sort produces more rows than the node beneath it

От

Tom Lane

Дата:

04 августа 2023 г., 15:10:21

Dane Foster <studdugie@gmail.com> writes:
> I'm trying to understand a bit of weirdness in a plan output. There is a
> sort node above a sequential scan node where the scan node produces 26,026
> rows yet the sort node above it produces 42,995,408. How is it possible to
> sort more data than you received?

If the sort is the inner input to a merge join, this could reflect
mark-and-restore rescanning of the sort's output.  Are there a
whole lot of duplicate keys on the merge's other side?

            regards, tom lane

Re: Plan weirdness. A sort produces more rows than the node beneath it

От

Dane Foster

Дата:

04 августа 2023 г., 15:15:40

> If the sort is the inner input to a merge join, this could reflect
> mark-and-restore rescanning of the sort's output. Are there a
> whole lot of duplicate keys on the merge's other side?

Yes. The course_id column's values repeat a LOT on the merge side.

Dane

On Fri, Aug 4, 2023 at 11:10 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Dane Foster <studdugie@gmail.com> writes:
> I'm trying to understand a bit of weirdness in a plan output. There is a
> sort node above a sequential scan node where the scan node produces 26,026
> rows yet the sort node above it produces 42,995,408. How is it possible to
> sort more data than you received?

If the sort is the inner input to a merge join, this could reflect
mark-and-restore rescanning of the sort's output. Are there a
whole lot of duplicate keys on the merge's other side?

regards, tom lane

Re: Plan weirdness. A sort produces more rows than the node beneath it

От

Tom Lane

Дата:

04 августа 2023 г., 15:31:57

Dane Foster <studdugie@gmail.com> writes:
>> If the sort is the inner input to a merge join, this could reflect
>> mark-and-restore rescanning of the sort's output.  Are there a
>> whole lot of duplicate keys on the merge's other side?

> Yes. The course_id column's values repeat a LOT on the merge side.

Hmm.  The planner should avoid using a merge join if it knows that
to be true.  Maybe analyze'ing that table would prompt it to use
some other join method?

            regards, tom lane

Re: Plan weirdness. A sort produces more rows than the node beneath it

От

Dane Foster

Дата:

04 августа 2023 г., 16:41:28

Hmm. The planner should avoid using a merge join if it knows that
to be true. Maybe analyze'ing that table would prompt it to use
some other join method?

The planner has updated stats on the table and wants to use a nested loop:

https://explain.dalibo.com/plan/3814d5356cc82528

But the nested loop version is around 8 seconds slower so I forced the issue. But thanks to this conversation I now understand what's happening with the row count. This understanding helped make the nested loops' plan easier to understand. Unfortunately, there doesn't seem to be any hope for the merge join variant in terms of being easily understood. The uninitiated sees a scan node and its parent sort node and their brain defaults to thinking: the sort node will produce the same number of rows as the node feeding it.

Cheers,

Dane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Plan weirdness. A sort produces more rows than the node beneath it

Plan weirdness. A sort produces more rows than the node beneath it

Re: Plan weirdness. A sort produces more rows than the node beneath it

Re: Plan weirdness. A sort produces more rows than the node beneath it

Re: Plan weirdness. A sort produces more rows than the node beneath it

Re: Plan weirdness. A sort produces more rows than the node beneath it

Re: Plan weirdness. A sort produces more rows than the node beneath it

Re: Plan weirdness. A sort produces more rows than the node beneath it