Обсуждение: Why does the planner reduce the planned rows when filtering single values in an array

Поиск

Список

Период

Сортировка

Why does the planner reduce the planned rows when filtering single values in an array

От

Isaiah Langford

Дата:

21 апреля 2023 г., 02:41:13

When I run the following, on PostgreSQL 13.5:

SELECT single_value

INTO pg_temp.single_value_table

FROM generate_series(1, 1) AS single_value;

SELECT

series_value, ARRAY[1] AS single_value_array

INTO pg_temp.series_with_arrays

FROM generate_series(1, 100000) AS series_value;

ANALYZE pg_temp.single_value_table;

ANALYZE pg_temp.series_with_arrays;

EXPLAIN

SELECT

FROM pg_temp.single_value_table

JOIN pg_temp.series_with_arrays ON single_value_table.single_value = ANY(series_with_arrays.single_value_array)

I get the following output:

Nested Loop (cost=0.00..3987.01 rows=4889 width=33)

Join Filter: (single_value_table.single_value = ANY (series_with_arrays.single_value_array))

-> Seq Scan on single_value_table (cost=0.00..1.01 rows=1 width=4)

-> Seq Scan on series_with_arrays (cost=0.00..1736.00 rows=100000 width=29)

Why is the planner expecting the number of rows to be reduced by the join filter? Is there any way I can correct the planner here?

I've tried looking into pg_stats to figure this out, but I can't see why this is happening. I'm seeing the same issue on multiple databases so if anyone has time to help me understand this it would be greatly appreciated.

In case it helps, I've noticed that the ratio changes based on the number of values held in the single_value_table so if you bump the single_value_table to be formed from generate_series(1, 20) instead the planned rows is much closer to correctly estimating the number of rows returned.

Thanks for any help, Isaiah

Re: Why does the planner reduce the planned rows when filtering single values in an array

От

Tom Lane

Дата:

21 апреля 2023 г., 14:43:34

Isaiah Langford <ilangford@ushrauto.com> writes:
> Why is the planner expecting the number of rows to be reduced by the join filter? Is there any way I can correct the
plannerhere? 

I think you're running into a couple of issues here:

* ANALYZE fails to record any useful stats for a single-row table.
It can't form a histogram with only one entry, but it also fails to
put the value into the MCV list, because there's a heuristic that an
MCV must appear more than once.  Possibly we could think harder
about what to do with such cases, but at the moment the planner has
no knowledge that the only value in single_value_table is "1".

* Even if it did know that, the logic in scalararraysel() is quite
inadequate for the case of "variable = ANY(variable)".  It looks
like that's only been fleshed out for cases where one side or
the other is a constant.

Lot of unfinished work here :-(

            regards, tom lane

Re: Why does the planner reduce the planned rows when filtering single values in an array

От

Isaiah Langford

Дата:

21 апреля 2023 г., 23:02:55

Thank you Tom,

Every tool is a work in progress. Knowing it's just a fact of life for now is really helpful, I appreciate the response.

From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Saturday, April 22, 2023 12:43 AM
To: Isaiah Langford <ilangford@ushrauto.com>
Cc: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Why does the planner reduce the planned rows when filtering single values in an array

Isaiah Langford <ilangford@ushrauto.com> writes:
> Why is the planner expecting the number of rows to be reduced by the join filter? Is there any way I can correct the planner here?

I think you're running into a couple of issues here:

* ANALYZE fails to record any useful stats for a single-row table.
It can't form a histogram with only one entry, but it also fails to
put the value into the MCV list, because there's a heuristic that an
MCV must appear more than once. Possibly we could think harder
about what to do with such cases, but at the moment the planner has
no knowledge that the only value in single_value_table is "1".

* Even if it did know that, the logic in scalararraysel() is quite
inadequate for the case of "variable = ANY(variable)". It looks
like that's only been fleshed out for cases where one side or
the other is a constant.

Lot of unfinished work here :-(

regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Why does the planner reduce the planned rows when filtering single values in an array

Why does the planner reduce the planned rows when filtering single values in an array

Re: Why does the planner reduce the planned rows when filtering single values in an array

Re: Why does the planner reduce the planned rows when filtering single values in an array