Re: Use merge-based matching for MCVs in eqjoinsel
От | David Geier |
---|---|
Тема | Re: Use merge-based matching for MCVs in eqjoinsel |
Дата | |
Msg-id | 5c684d71-f618-4f4b-bd1b-608c0ae3f6d7@gmail.com обсуждение исходный текст |
Ответ на | Re: Use merge-based matching for MCVs in eqjoinsel (David Geier <geidav.pg@gmail.com>) |
Ответы |
Re: Use merge-based matching for MCVs in eqjoinsel
|
Список | pgsql-hackers |
Hi Ilia! On 05.09.2025 16:03, David Geier wrote: >>> I propose an optimization: when the column datatype supports >>> ordering(i.e., has < and >), we can sort both MCV lists and apply >>> mege-style algorithm to detect matches. This reduces runtime from >>> O(N^2) to O(NlogN), where N is the number of MCV entries. The patch >>> also applies the same optimization to semi-join clauses, which show >>> similar performance behavior. > > Why do you sort both lists and then merge instead of putting the smaller > list into a hash map and then doing hash lookups (if the type is hashable)? I've tested your and my code with the following script: CREATE TABLE foo(col TEXT); CREATE TABLE bar(col TEXT); SET default_statistics_target = 10000; -- Generate MCV values. PostgreSQL doesn't store MCVs if the table has -- only a single value or every value has exactly the same cardinality. DO $$ BEGIN FOR i IN 1..10000 LOOP FOR j IN 1..least(i, 50) LOOP INSERT INTO foo VALUES ('aaaaaaaaaaaaaaaaaaaa' || i::TEXT); INSERT INTO bar VALUES ('aaaaaaaaaaaaaaaaaaaa' || (i + 100000)::TEXT); END LOOP; END LOOP; END; $$; ANALYZE foo, bar; \timing on EXPLAIN SELECT * FROM foo f, bar b WHERE f.col = b.col; Results are: - master: 433 ms - Order+Merge: 11 ms - Hash map: 4 ms I've attached my draft patch. -- David Geier
Вложения
В списке pgsql-hackers по дате отправления: