Обсуждение: MERGE INTO... WHEN NOT MATCHED BY SOURCE index usage
Hello! I've read through the new MERGE documentation with the new WHEN NOT MATCHED BY SOURCE functionality and tried it out. It seems to me like one very common use-case, and one I have bumped into often, is to update a n:m resolution table, like this for example: CREATE TABLE courses (course_id INTEGER PRIMARY KEY); CREATE TABLE students (student_id INTEGER PRIMARY KEY); CREATE TABLE students_courses (course_id INTEGER REFERENCES courses(course_id), student_id INTEGER REFERENCES students(student_id), PRIMARY KEY(student_id,course_id)); /* Insert example data... */ INSERT INTO students SELECT s FROM generate_series(1,100000) s; INSERT INTO courses SELECT c FROM generate_series(1,50) c; INSERT INTO students_courses (course_id, student_id) SELECT c, s FROM generate_series(1,50) c, generate_series(1,100000) s; /* Student nr 5 is only enrolled in courses 7,8,9 */ MERGE INTO students_courses USING (VALUES (7),(8),(9)) as s(source_course_id) ON students_courses.student_id=5 AND students_courses.course_id = source_course_id WHEN NOT MATCHED BY TARGET THEN INSERT VALUES (source_course_id, 5) WHEN NOT MATCHED BY SOURCE AND student_id=5 THEN DELETE; SELECT * FROM students_courses WHERE student_id=5; The example above works, but looking at the execution plan it does a full seq scan on students_course without any filter for student_id. Do I misunderstand the usage of MERGE WHEN NOT MATCHED BY SOURCE? Is this a non intended use-case? Is there another way of defining a filter for the MERGE command? Updating a many-to-many relationship seems like a very common use-case to me, and the new MERGE seems perfect for this, but in practice the fact it doesn't use an index seems to make it very ineffective performance wise for this.
On Mon, 23 Feb 2026 at 15:19, Lea Führer <lea@codecat.at> wrote:
> It seems to me like one very common use-case, and one I have bumped into
> often, is to update a n:m resolution table, like this for example:
Something similar was previously discussed in January 2025 as well:
https://www.postgresql.org/message-id/flat/CAK_s-G0NrC_KH7kn85arfqkdzvs80GOCCKvz9YbU2%3DE94qfdPA%40mail.gmail.com
To summarize:
Tom Lane wrote:
> I may not have fully wrapped my head around this example, but I think
> that the fact that "t.device_id = $1" appears in both the ON condition
> and the WHEN NOT MATCHED BY SOURCE condition means that only t rows
> meeting that condition are of interest, so that in principle we could
> optimize by pushing that down to the scan of t. But as you can see,
> we don't. Not sure if this pattern is common enough to be worth
> trying to implement such an optimization.
So at least there's 1 more who uses this pattern!
Kind regards
Feike
> It seems to me like one very common use-case, and one I have bumped into
> often, is to update a n:m resolution table, like this for example:
Something similar was previously discussed in January 2025 as well:
https://www.postgresql.org/message-id/flat/CAK_s-G0NrC_KH7kn85arfqkdzvs80GOCCKvz9YbU2%3DE94qfdPA%40mail.gmail.com
To summarize:
Tom Lane wrote:
> I may not have fully wrapped my head around this example, but I think
> that the fact that "t.device_id = $1" appears in both the ON condition
> and the WHEN NOT MATCHED BY SOURCE condition means that only t rows
> meeting that condition are of interest, so that in principle we could
> optimize by pushing that down to the scan of t. But as you can see,
> we don't. Not sure if this pattern is common enough to be worth
> trying to implement such an optimization.
So at least there's 1 more who uses this pattern!
Kind regards
Feike