Обсуждение: Checking join outer relation uniqueness to prevent unnecessary memoization
The current memoization cost calculation method often results in memoize nodes being added even when they are useless due to the outer side of the join being guaranteed unique due to constraints, increasing overhead unnecessarily. To try to prevent this, I am exploring methods to check whether the outer side of the join is guaranteed unique before adding memoize nodes. The simplest method I have found thus far is to use innerrel_is_unique, passing in the outer relation where the inner relation normally would be. This appears to work (although it is logically different from the other uses of innerrel_is_unique), and adds negligible overhead for simple joins because it can often reuse cached values from other potential join orders.
However, with more complex joins, innerrel_is_unique fails because it can't handle joined relations. My original thought on how to get around this is to calculate innerrel_is_unique for each of the join constituent relations (which to my understanding should already be done and cached because of different join orders), and only remove memoize if all joins making up the outer relation are unique, making the assumption that if there is even one non unique join, that joins join keys will create cascading non unique result sets. Does this sound feasible, and could there be a better way? I am not entirely sure it will work naturally with bottom up query planning. I am new to postgres development, so I don't have a great understanding of how everything fits together yet.
Thanks!
Jacob
Re: Checking join outer relation uniqueness to prevent unnecessary memoization
От
David Rowley
Дата:
On Fri, 2 Jan 2026 at 04:30, Jacob Jackson <jej.jackson.08@gmail.com> wrote: > The current memoization cost calculation method often results in memoize nodes being added even when they are useless dueto the outer side of the join being guaranteed unique due to constraints, increasing overhead unnecessarily. To try toprevent this, I am exploring methods to check whether the outer side of the join is guaranteed unique before adding memoizenodes. The simplest method I have found thus far is to use innerrel_is_unique, passing in the outer relation wherethe inner relation normally would be. This appears to work (although it is logically different from the other uses ofinnerrel_is_unique), and adds negligible overhead for simple joins because it can often reuse cached values from otherpotential join orders. Do you have an example case of this happening? Ideally, the code that should disfavour Memoize for this case is estimate_num_groups() as called in cost_memoize_rescan() by returning that there's 1 group per input row. I guess that's not happening for this case? Why? David
Re: Checking join outer relation uniqueness to prevent unnecessary memoization
От
Jacob Jackson
Дата:
On Fri, Jan 2, 2026 at 8:45 PM David Rowley <dgrowleyml@gmail.com> wrote: > Do you have an example case of this happening? Ideally, the code that > should disfavour Memoize for this case is estimate_num_groups() as > called in cost_memoize_rescan() by returning that there's 1 group per > input row. I guess that's not happening for this case? Why? I have seen the issue pop up a few times when the unique constraint is across multiple columns and the join is only on one of those columns (e.g. https://www.postgresql.org/message-id/CAAiQw3yBPrCw6ZLeTwVS4QhKDWgJkmmp9LnGPdodxeQmn=kqVg@mail.gmail.com), and a constant filter is on the other column. I think what happens is that this introduces potential for error into the sample because Postgres can now come across more duplicates of the join key than expected, reducing n_distinct, or could come across more rows with the constant filtered value, thus increasing its predicted frequency (in the case I linked, the constant's frequency was stored in the columns MCV list), leading to a nonzero hit ratio as the cardinality estimation/estcalls > ndistinct. However, I am not certain this is the case (while there probably wouldn't need to be much of an asymmetry to cause memorization given the high cost in the planner for extra index scans, it still seems odd that stats could be off enough to enable this because of sampling alone). Maybe there is a statistics bug at play? I am not certain. Jacob
Re: Checking join outer relation uniqueness to prevent unnecessary memoization
От
David Rowley
Дата:
On Sun, 4 Jan 2026 at 18:05, Jacob Jackson <jej.jackson.08@gmail.com> wrote: > I have seen the issue pop up a few times when the unique constraint is > across multiple columns and the join is only on one of those columns > (e.g. https://www.postgresql.org/message-id/CAAiQw3yBPrCw6ZLeTwVS4QhKDWgJkmmp9LnGPdodxeQmn=kqVg@mail.gmail.com), > and a constant filter is on the other column. I think what happens is > that this introduces potential for error into the sample because > Postgres can now come across more duplicates of the join key than > expected, reducing n_distinct, or could come across more rows with the > constant filtered value, thus increasing its predicted frequency (in > the case I linked, the constant's frequency was stored in the columns > MCV list), leading to a nonzero hit ratio as the cardinality > estimation/estcalls > ndistinct. However, I am not certain this is the > case (while there probably wouldn't need to be much of an asymmetry to > cause memorization given the high cost in the planner for extra index > scans, it still seems odd that stats could be off enough to enable > this because of sampling alone). Maybe there is a statistics bug at > play? I am not certain. I've managed to reverse engineer some tables for this base on the query you posted in the other thread. I didn't spend enough time to figure out the exact row counts to insert to get the Memoize plan by default, but I can get it from disabling merge and hash joins. This doesn't mean it's not an issue as ideally the Memoize costing would realise all lookups are unique and opt to not Memoize due to no repeat lookups. What seems to be happening is in estimate_num_groups(), we don't set isunique == true because the unique index search in has_unique_index() only considers very simplistic cases where there's a single column unique index on the Var that's being looked up. Later in estimate_num_groups(), because we have some other base quals doing some filtering, the "if (reldistinct > 0 && rel->rows < rel->tuples)" kicks in to attempt to estimate the number of distinct values accounting for the other qual (user = 0, in this case). That's fairly generic code that only does calculations based on the distinction between rel->tuples and rel->rows, so it doesn't know about the guarantees of uniqueness. For a fix, I suppose has_unique_index() could be expanded to find a unique index which mentions the particular Var and ensures that all other columns that exist in the index have EquivalenceClasses with an ec_has_const, but that would make that function much more expensive than it is today and it still wouldn't cover cases where there is another parameter in the Memoize lookup that matches to one of the other unique index's columns. That makes me think that the UniqueKey stuff might be the solution to these problems, as we'd easily be able to determine which unique properties hold true at any level of the join. That's a very complex patch, however. David
Вложения
Re: Checking join outer relation uniqueness to prevent unnecessary memoization
От
Jacob Jackson
Дата:
On Sun, Jan 4, 2026 at 6:09 PM David Rowley <dgrowleyml@gmail.com> wrote: > For a fix, I suppose has_unique_index() could be expanded to find a > unique index which mentions the particular Var and ensures that all > other columns that exist in the index have EquivalenceClasses with an > ec_has_const, but that would make that function much more expensive > than it is today and it still wouldn't cover cases where there is > another parameter in the Memoize lookup that matches to one of the > other unique index's columns. That makes me think that the UniqueKey > stuff might be the solution to these problems, as we'd easily be able > to determine which unique properties hold true at any level of the > join. That's a very complex patch, however. Where can I find the past work on UniqueKey? Nothing turned up in the git history. Thanks! Jacob
Re: Checking join outer relation uniqueness to prevent unnecessary memoization
От
David Rowley
Дата:
On Tue, 6 Jan 2026 at 17:08, Jacob Jackson <jej.jackson.08@gmail.com> wrote: > Where can I find the past work on UniqueKey? Nothing turned up in the > git history. Nothing was ever committed. There's a brief in [1]. David [1] https://postgr.es/m/CAApHDvq7i0%3DO97r4Y1pv68%2BtprVczKsXRsV28rM9H-rVPOfeNQ%40mail.gmail.com