Обсуждение: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?
Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?
От
Jacob Jackson
Дата:
Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joins a table of questions with results for each question (using a table with a composite primary key of question id and a user id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primary keys, and yet Postgres still memoizes the inner loop results. Any ideas why? Is this just a failure of the query planner (I would be happy to explore creating a PR), did I not properly guarantee uniqueness, or is there another reason for memoization? The memoized version is consistently slightly slower in my testing, despite the calculated cost being lower. Here are the query plans:
enable_memoize=on:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM questions LEFT JOIN "QuestionUserStatus" ON questions.id = "QuestionUserStatus".question WHERE "QuestionUserStatus".user = 0;
Nested Loop (cost=0.71..514.09 rows=277 width=1381) (actual time=0.021..0.520 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.114 rows=231 loops=1)
│ ├ Index Cond: ("user" = '0'::bigint)
│ └ Buffers: shared hit=166
└ Memoize (cost=0.29..1.25 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Cache Key: "QuestionUserStatus".question
├ Cache Mode: logical
├ Hits: 0 Misses: 231 Evictions: 0 Overflows: 0 Memory Usage: 320kB
├ Buffers: shared hit=693
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.183 ms
Execution Time: 0.548 ms
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.114 rows=231 loops=1)
│ ├ Index Cond: ("user" = '0'::bigint)
│ └ Buffers: shared hit=166
└ Memoize (cost=0.29..1.25 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Cache Key: "QuestionUserStatus".question
├ Cache Mode: logical
├ Hits: 0 Misses: 231 Evictions: 0 Overflows: 0 Memory Usage: 320kB
├ Buffers: shared hit=693
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.183 ms
Execution Time: 0.548 ms
enable_memoize=off:
Nested Loop (cost=0.70..521.98 rows=277 width=1381) (actual time=0.018..0.421 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.099 rows=231 loops=1)
│ ├ Index Cond: ("user" = '3477145805513'::bigint)
│ └ Buffers: shared hit=166
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.197 ms
Execution Time: 0.444 ms
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.099 rows=231 loops=1)
│ ├ Index Cond: ("user" = '3477145805513'::bigint)
│ └ Buffers: shared hit=166
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.197 ms
Execution Time: 0.444 ms
Thanks for the help,
Jacob
What's the actual problem? Does enable_memoize=on return incorrect results?
Because a 45 microsecond (yes, 45 microseconds: 0.138 milliseconds = 138 microseconds; same for the others) slowdown isn't something I'd get too worked up about.
On Wed, Oct 29, 2025 at 2:29 PM Jacob Jackson <jej.jackson.08@gmail.com> wrote:
Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joins a table of questions with results for each question (using a table with a composite primary key of question id and a user id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primary keys, and yet Postgres still memoizes the inner loop results. Any ideas why? Is this just a failure of the query planner (I would be happy to explore creating a PR), did I not properly guarantee uniqueness, or is there another reason for memoization? The memoized version is consistently slightly slower in my testing, despite the calculated cost being lower. Here are the query plans:enable_memoize=on:EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM questions LEFT JOIN "QuestionUserStatus" ON questions.id = "QuestionUserStatus".question WHERE "QuestionUserStatus".user = 0;Nested Loop (cost=0.71..514.09 rows=277 width=1381) (actual time=0.021..0.520 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.114 rows=231 loops=1)
│ ├ Index Cond: ("user" = '0'::bigint)
│ └ Buffers: shared hit=166
└ Memoize (cost=0.29..1.25 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Cache Key: "QuestionUserStatus".question
├ Cache Mode: logical
├ Hits: 0 Misses: 231 Evictions: 0 Overflows: 0 Memory Usage: 320kB
├ Buffers: shared hit=693
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.183 ms
Execution Time: 0.548 msenable_memoize=off:Nested Loop (cost=0.70..521.98 rows=277 width=1381) (actual time=0.018..0.421 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.099 rows=231 loops=1)
│ ├ Index Cond: ("user" = '3477145805513'::bigint)
│ └ Buffers: shared hit=166
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.197 ms
Execution Time: 0.444 msThanks for the help,Jacob
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?
От
Jacob Jackson
Дата:
I was curious to see whether there was any reason I wasn't seeing for Postgres to decide the memoized version was lower cost and try to memoize these operations.
On Wed, Oct 29, 2025 at 3:20 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
What's the actual problem? Does enable_memoize=on return incorrect results?Because a 45 microsecond (yes, 45 microseconds: 0.138 milliseconds = 138 microseconds; same for the others) slowdown isn't something I'd get too worked up about.On Wed, Oct 29, 2025 at 2:29 PM Jacob Jackson <jej.jackson.08@gmail.com> wrote:Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joins a table of questions with results for each question (using a table with a composite primary key of question id and a user id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primary keys, and yet Postgres still memoizes the inner loop results. Any ideas why? Is this just a failure of the query planner (I would be happy to explore creating a PR), did I not properly guarantee uniqueness, or is there another reason for memoization? The memoized version is consistently slightly slower in my testing, despite the calculated cost being lower. Here are the query plans:enable_memoize=on:EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM questions LEFT JOIN "QuestionUserStatus" ON questions.id = "QuestionUserStatus".question WHERE "QuestionUserStatus".user = 0;Nested Loop (cost=0.71..514.09 rows=277 width=1381) (actual time=0.021..0.520 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.114 rows=231 loops=1)
│ ├ Index Cond: ("user" = '0'::bigint)
│ └ Buffers: shared hit=166
└ Memoize (cost=0.29..1.25 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Cache Key: "QuestionUserStatus".question
├ Cache Mode: logical
├ Hits: 0 Misses: 231 Evictions: 0 Overflows: 0 Memory Usage: 320kB
├ Buffers: shared hit=693
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.183 ms
Execution Time: 0.548 msenable_memoize=off:Nested Loop (cost=0.70..521.98 rows=277 width=1381) (actual time=0.018..0.421 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.099 rows=231 loops=1)
│ ├ Index Cond: ("user" = '3477145805513'::bigint)
│ └ Buffers: shared hit=166
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.197 ms
Execution Time: 0.444 msThanks for the help,Jacob--Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.<Redacted> lobster!
You're running slightly different queries:
enable_memoize=on: Index Cond: ("user" = '0'::bigint)
enable_memoize=off: Index Cond: ("user" = '3477145805513'::bigint)
All buffer counts look to be the same on both, and 514 is just 1.5% smaller than 522. That looks like statistical noise to me.
On Wed, Oct 29, 2025 at 5:34 PM Jacob Jackson <jej.jackson.08@gmail.com> wrote:
I was curious to see whether there was any reason I wasn't seeing for Postgres to decide the memoized version was lower cost and try to memoize these operations.On Wed, Oct 29, 2025 at 3:20 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:What's the actual problem? Does enable_memoize=on return incorrect results?Because a 45 microsecond (yes, 45 microseconds: 0.138 milliseconds = 138 microseconds; same for the others) slowdown isn't something I'd get too worked up about.On Wed, Oct 29, 2025 at 2:29 PM Jacob Jackson <jej.jackson.08@gmail.com> wrote:Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joins a table of questions with results for each question (using a table with a composite primary key of question id and a user id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primary keys, and yet Postgres still memoizes the inner loop results. Any ideas why? Is this just a failure of the query planner (I would be happy to explore creating a PR), did I not properly guarantee uniqueness, or is there another reason for memoization? The memoized version is consistently slightly slower in my testing, despite the calculated cost being lower. Here are the query plans:enable_memoize=on:EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM questions LEFT JOIN "QuestionUserStatus" ON questions.id = "QuestionUserStatus".question WHERE "QuestionUserStatus".user = 0;Nested Loop (cost=0.71..514.09 rows=277 width=1381) (actual time=0.021..0.520 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.114 rows=231 loops=1)
│ ├ Index Cond: ("user" = '0'::bigint)
│ └ Buffers: shared hit=166
└ Memoize (cost=0.29..1.25 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Cache Key: "QuestionUserStatus".question
├ Cache Mode: logical
├ Hits: 0 Misses: 231 Evictions: 0 Overflows: 0 Memory Usage: 320kB
├ Buffers: shared hit=693
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.183 ms
Execution Time: 0.548 msenable_memoize=off:Nested Loop (cost=0.70..521.98 rows=277 width=1381) (actual time=0.018..0.421 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.099 rows=231 loops=1)
│ ├ Index Cond: ("user" = '3477145805513'::bigint)
│ └ Buffers: shared hit=166
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.197 ms
Execution Time: 0.444 msThanks for the help,Jacob
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?
От
David Rowley
Дата:
On Thu, 30 Oct 2025 at 07:29, Jacob Jackson <jej.jackson.08@gmail.com> wrote: > Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joinsa table of questions with results for each question (using a table with a composite primary key of question id and auser id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primarykeys, and yet Postgres still memoizes the inner loop results. Any ideas why? I agree that when the outer side of the join has unique values that Memoize does not make any sense. The planner currenrly puts quite a bit of faith in the row estimates for this and if you're getting this plan, then the estimates came back indicating there'd be fewer unique values of "QuestionUserStatus".question in the input than there are input rows to the Memoize node. If you delve into cost_memoize_rescan(), you'll see the code for this (look for where "hit_ratio" is calculated). There are also a few prechecks in get_memoize_path() to try to avoid this sort of thing, but unfortunately, the information to avoid Memoize when the outer side of the join is unique isn't available. We do have an "inner_unique" in JoinPathExtraData, but what we'd need for this and don't have is "outer_unique". If we had that, we could just exit early in get_memoize_path() if that's set to true. Whether or not going to the trouble of calculating "outer_unique" is worth the trouble, I'm not sure. There was some work on UniqueKeys a few years ago, which could have helped in this scenario as we could have more easily identified uniqueness at different join levels. That's no longer being worked on, as I understand it. On the other hand, it may be better to somehow enhance estimate_num_groups() so it can be given more details about the context of the request, i.e the set of Relids that are joined already for the input_rows. That way the code could do more analysis into the RelOptInfo base quals for the relevant relations. Extended statistics for n_distinct could also be applied in some cases too by looking for baserestrictinfo with equality quals or EquivalenceClasses with ec_has_const = true and a member for other Vars/Exprs in the extended statistics. Unfortunately, neither of these is a trivial fix. David
Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?
От
Jacob Jackson
Дата:
The performance/cost difference persists with identical keys. Why would the planner for the memorized query use different statistics/calculations in the first place?
On Wed, Oct 29, 2025 at 5:48 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
You're running slightly different queries:enable_memoize=on: Index Cond: ("user" = '0'::bigint)enable_memoize=off: Index Cond: ("user" = '3477145805513'::bigint)All buffer counts look to be the same on both, and 514 is just 1.5% smaller than 522. That looks like statistical noise to me.--On Wed, Oct 29, 2025 at 5:34 PM Jacob Jackson <jej.jackson.08@gmail.com> wrote:I was curious to see whether there was any reason I wasn't seeing for Postgres to decide the memoized version was lower cost and try to memoize these operations.On Wed, Oct 29, 2025 at 3:20 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:What's the actual problem? Does enable_memoize=on return incorrect results?Because a 45 microsecond (yes, 45 microseconds: 0.138 milliseconds = 138 microseconds; same for the others) slowdown isn't something I'd get too worked up about.On Wed, Oct 29, 2025 at 2:29 PM Jacob Jackson <jej.jackson.08@gmail.com> wrote:Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joins a table of questions with results for each question (using a table with a composite primary key of question id and a user id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primary keys, and yet Postgres still memoizes the inner loop results. Any ideas why? Is this just a failure of the query planner (I would be happy to explore creating a PR), did I not properly guarantee uniqueness, or is there another reason for memoization? The memoized version is consistently slightly slower in my testing, despite the calculated cost being lower. Here are the query plans:enable_memoize=on:EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM questions LEFT JOIN "QuestionUserStatus" ON questions.id = "QuestionUserStatus".question WHERE "QuestionUserStatus".user = 0;Nested Loop (cost=0.71..514.09 rows=277 width=1381) (actual time=0.021..0.520 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.114 rows=231 loops=1)
│ ├ Index Cond: ("user" = '0'::bigint)
│ └ Buffers: shared hit=166
└ Memoize (cost=0.29..1.25 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Cache Key: "QuestionUserStatus".question
├ Cache Mode: logical
├ Hits: 0 Misses: 231 Evictions: 0 Overflows: 0 Memory Usage: 320kB
├ Buffers: shared hit=693
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.183 ms
Execution Time: 0.548 msenable_memoize=off:Nested Loop (cost=0.70..521.98 rows=277 width=1381) (actual time=0.018..0.421 rows=231 loops=1)
├ Buffers: shared hit=859
├ Index Scan using "QuestionUserStatus_user_question_pk" on "QuestionUserStatus" (cost=0.42..178.88 rows=277 width=18) (actual time=0.014..0.099 rows=231 loops=1)
│ ├ Index Cond: ("user" = '3477145805513'::bigint)
│ └ Buffers: shared hit=166
└ Index Scan using questions_pkey on questions (cost=0.28..1.24 rows=1 width=1363) (actual time=0.001..0.001 rows=1 loops=231)
├ Index Cond: (id = "QuestionUserStatus".question)
└ Buffers: shared hit=693
Planning:
└ Buffers: shared hit=6
Planning Time: 0.197 ms
Execution Time: 0.444 msThanks for the help,JacobDeath to <Redacted>, and butter sauce.Don't boil me, I'm still alive.<Redacted> lobster!
Re: Postgres memoizing inner loop of join when outer loop join key is guaranteed unique?
От
Jacob Jackson
Дата:
Thanks! This is very helpful.
On Wed, Oct 29, 2025 at 8:01 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 30 Oct 2025 at 07:29, Jacob Jackson <jej.jackson.08@gmail.com> wrote:
> Hello. I was looking at some query plans recently and noticed something that didn't make sense. I have a query that joins a table of questions with results for each question (using a table with a composite primary key of question id and a user id), filtered by user id. The question IDs and the combined question-userIds are guaranteed unique due to being primary keys, and yet Postgres still memoizes the inner loop results. Any ideas why?
I agree that when the outer side of the join has unique values that
Memoize does not make any sense. The planner currenrly puts quite a
bit of faith in the row estimates for this and if you're getting this
plan, then the estimates came back indicating there'd be fewer unique
values of "QuestionUserStatus".question in the input than there are
input rows to the Memoize node. If you delve into
cost_memoize_rescan(), you'll see the code for this (look for where
"hit_ratio" is calculated).
There are also a few prechecks in get_memoize_path() to try to avoid
this sort of thing, but unfortunately, the information to avoid
Memoize when the outer side of the join is unique isn't available. We
do have an "inner_unique" in JoinPathExtraData, but what we'd need for
this and don't have is "outer_unique". If we had that, we could just
exit early in get_memoize_path() if that's set to true. Whether or not
going to the trouble of calculating "outer_unique" is worth the
trouble, I'm not sure. There was some work on UniqueKeys a few years
ago, which could have helped in this scenario as we could have more
easily identified uniqueness at different join levels. That's no
longer being worked on, as I understand it.
On the other hand, it may be better to somehow enhance
estimate_num_groups() so it can be given more details about the
context of the request, i.e the set of Relids that are joined already
for the input_rows. That way the code could do more analysis into the
RelOptInfo base quals for the relevant relations. Extended statistics
for n_distinct could also be applied in some cases too by looking for
baserestrictinfo with equality quals or EquivalenceClasses with
ec_has_const = true and a member for other Vars/Exprs in the extended
statistics.
Unfortunately, neither of these is a trivial fix.
David