Обсуждение: Checking join outer relation uniqueness to prevent unnecessary memoization

Поиск

Список

Период

Сортировка

Checking join outer relation uniqueness to prevent unnecessary memoization

От

Jacob Jackson

Дата:

01 января, 18:30:02

The current memoization cost calculation method often results in memoize nodes being added even when they are useless due to the outer side of the join being guaranteed unique due to constraints, increasing overhead unnecessarily. To try to prevent this, I am exploring methods to check whether the outer side of the join is guaranteed unique before adding memoize nodes. The simplest method I have found thus far is to use innerrel_is_unique, passing in the outer relation where the inner relation normally would be. This appears to work (although it is logically different from the other uses of innerrel_is_unique), and adds negligible overhead for simple joins because it can often reuse cached values from other potential join orders.

However, with more complex joins, innerrel_is_unique fails because it can't handle joined relations. My original thought on how to get around this is to calculate innerrel_is_unique for each of the join constituent relations (which to my understanding should already be done and cached because of different join orders), and only remove memoize if all joins making up the outer relation are unique, making the assumption that if there is even one non unique join, that joins join keys will create cascading non unique result sets. Does this sound feasible, and could there be a better way? I am not entirely sure it will work naturally with bottom up query planning. I am new to postgres development, so I don't have a great understanding of how everything fits together yet.

Thanks!

Jacob

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

От

David Rowley

Дата:

03 января, 04:44:59

On Fri, 2 Jan 2026 at 04:30, Jacob Jackson <jej.jackson.08@gmail.com> wrote:
> The current memoization cost calculation method often results in memoize nodes being added even when they are useless
dueto the outer side of the join being guaranteed unique due to constraints, increasing overhead unnecessarily. To try
toprevent this, I am exploring methods to check whether the outer side of the join is guaranteed unique before adding
memoizenodes. The simplest method I have found thus far is to use innerrel_is_unique, passing in the outer relation
wherethe inner relation normally would be. This appears to work (although it is logically different from the other uses
ofinnerrel_is_unique), and adds negligible overhead for simple joins because it can often reuse cached values from
otherpotential join orders. 

Do you have an example case of this happening? Ideally, the code that
should disfavour Memoize for this case is estimate_num_groups() as
called in cost_memoize_rescan() by returning that there's 1 group per
input row. I guess that's not happening for this case? Why?

David

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

От

Jacob Jackson

Дата:

04 января, 08:05:32

On Fri, Jan 2, 2026 at 8:45 PM David Rowley <dgrowleyml@gmail.com> wrote:
> Do you have an example case of this happening? Ideally, the code that
> should disfavour Memoize for this case is estimate_num_groups() as
> called in cost_memoize_rescan() by returning that there's 1 group per
> input row. I guess that's not happening for this case? Why?

I have seen the issue pop up a few times when the unique constraint is
across multiple columns and the join is only on one of those columns
(e.g. https://www.postgresql.org/message-id/CAAiQw3yBPrCw6ZLeTwVS4QhKDWgJkmmp9LnGPdodxeQmn=kqVg@mail.gmail.com),
and a constant filter is on the other column. I think what happens is
that this introduces potential for error into the sample because
Postgres can now come across more duplicates of the join key than
expected, reducing n_distinct, or could come across more rows with the
constant filtered value, thus increasing its predicted frequency (in
the case I linked, the constant's frequency was stored in the columns
MCV list), leading to a nonzero hit ratio as the cardinality
estimation/estcalls > ndistinct. However, I am not certain this is the
case (while there probably wouldn't need to be much of an asymmetry to
cause memorization given the high cost in the planner for extra index
scans, it still seems odd that stats could be off enough to enable
this because of sampling alone). Maybe there is a statistics bug at
play? I am not certain.

Jacob

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

От

David Rowley

Дата:

05 января, 02:08:56

On Sun, 4 Jan 2026 at 18:05, Jacob Jackson <jej.jackson.08@gmail.com> wrote:
> I have seen the issue pop up a few times when the unique constraint is
> across multiple columns and the join is only on one of those columns
> (e.g. https://www.postgresql.org/message-id/CAAiQw3yBPrCw6ZLeTwVS4QhKDWgJkmmp9LnGPdodxeQmn=kqVg@mail.gmail.com),
> and a constant filter is on the other column. I think what happens is
> that this introduces potential for error into the sample because
> Postgres can now come across more duplicates of the join key than
> expected, reducing n_distinct, or could come across more rows with the
> constant filtered value, thus increasing its predicted frequency (in
> the case I linked, the constant's frequency was stored in the columns
> MCV list), leading to a nonzero hit ratio as the cardinality
> estimation/estcalls > ndistinct. However, I am not certain this is the
> case (while there probably wouldn't need to be much of an asymmetry to
> cause memorization given the high cost in the planner for extra index
> scans, it still seems odd that stats could be off enough to enable
> this because of sampling alone). Maybe there is a statistics bug at
> play? I am not certain.

I've managed to reverse engineer some tables for this base on the
query you posted in the other thread. I didn't spend enough time to
figure out the exact row counts to insert to get the Memoize plan by
default, but I can get it from disabling merge and hash joins. This
doesn't mean it's not an issue as ideally the Memoize costing would
realise all lookups are unique and opt to not Memoize due to no repeat
lookups.

What seems to be happening is in estimate_num_groups(), we don't set
isunique == true because the unique index search in has_unique_index()
only considers very simplistic cases where there's a single column
unique index on the Var that's being looked up. Later in
estimate_num_groups(), because we have some other base quals doing
some filtering, the "if (reldistinct > 0 && rel->rows < rel->tuples)"
kicks in to attempt to estimate the number of distinct values
accounting for the other qual (user = 0, in this case). That's fairly
generic code that only does calculations based on the distinction
between rel->tuples and rel->rows, so it doesn't know about the
guarantees of uniqueness.

For a fix, I suppose has_unique_index() could be expanded to find a
unique index which mentions the particular Var and ensures that all
other columns that exist in the index have EquivalenceClasses with an
ec_has_const, but that would make that function much more expensive
than it is today and it still wouldn't cover cases where there is
another parameter in the Memoize lookup that matches to one of the
other unique index's columns. That makes me think that the UniqueKey
stuff might be the solution to these problems, as we'd easily be able
to determine which unique properties hold true at any level of the
join. That's a very complex patch, however.

David

Вложения

reverse_engineered_schema.sql

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

От

Jacob Jackson

Дата:

06 января, 07:07:58

On Sun, Jan 4, 2026 at 6:09 PM David Rowley <dgrowleyml@gmail.com> wrote:
> For a fix, I suppose has_unique_index() could be expanded to find a
> unique index which mentions the particular Var and ensures that all
> other columns that exist in the index have EquivalenceClasses with an
> ec_has_const, but that would make that function much more expensive
> than it is today and it still wouldn't cover cases where there is
> another parameter in the Memoize lookup that matches to one of the
> other unique index's columns. That makes me think that the UniqueKey
> stuff might be the solution to these problems, as we'd easily be able
> to determine which unique properties hold true at any level of the
> join. That's a very complex patch, however.

Where can I find the past work on UniqueKey? Nothing turned up in the
git history.

Thanks!
Jacob

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

От

David Rowley

Дата:

06 января, 14:44:02

On Tue, 6 Jan 2026 at 17:08, Jacob Jackson <jej.jackson.08@gmail.com> wrote:
> Where can I find the past work on UniqueKey? Nothing turned up in the
> git history.

Nothing was ever committed. There's a brief in [1].

David

[1] https://postgr.es/m/CAApHDvq7i0%3DO97r4Y1pv68%2BtprVczKsXRsV28rM9H-rVPOfeNQ%40mail.gmail.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Checking join outer relation uniqueness to prevent unnecessary memoization

Checking join outer relation uniqueness to prevent unnecessary memoization

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

Вложения

Re: Checking join outer relation uniqueness to prevent unnecessary memoization

Re: Checking join outer relation uniqueness to prevent unnecessary memoization