Re: Use merge-based matching for MCVs in eqjoinsel
От | David Geier |
---|---|
Тема | Re: Use merge-based matching for MCVs in eqjoinsel |
Дата | |
Msg-id | 1f6e18c6-9149-4b00-9837-fb8ce5304cf4@gmail.com обсуждение исходный текст |
Ответ на | Re: Use merge-based matching for MCVs in eqjoinsel (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>) |
Ответы |
Re: Use merge-based matching for MCVs in eqjoinsel
|
Список | pgsql-hackers |
Hi! Thanks for the review. On 09.09.2025 09:29, Ilia Evdokimov wrote: > From an architectural perspective, I think the patch is already in good > shape. However, I have some suggestions regarding code style: > > 1. I would move McvHashEntry, McvHashContext, he new hash table > definition, hash_mcv and are_mcvs_equal to the top. Done. I've also moved up try_eqjoinsel_with_hashtable(). > 2. I’m not sure get_hash_func_oid() is needed at all – it seems we > could do without it. Removed. > 3. It would be better to name the parameters matchProductFrequencies -> > matchprodfreq, nMatches -> nmatches, hasMatch1 -> hasmatch1, > hasMatch2 -> hasmatch2 in eqjoinsel_inner_with_hashtable(). Done. > 4. As I wrote earlier, since we now have a dedicated function > eqjoinsel_inner_with_hashtable(), perhaps it could be used in both > eqjoinsel_inner() and eqjoinsel_semi(). And since the hash-based Done. The gains for SEMI join are even bigger because the function is called 3 times for e.g. EXPLAIN SELECT * FROM foo f WHERE EXISTS (SELECT 1 FROM bar b where f.col = b.col); For that query the planning time for me goes from ~1300 ms -> 12 ms. > search was factored out, maybe it would make sense to also factor > out the O(N^2) nested loop implementation? Generally agreed and while tempting, I've refrained from doing the refactoring. Let's better do this in a separate patch to keep things simple. > 5. I think it would be helpful to add a comment explaining that using a > hash table is not efficient when the MCV array length equals 1: > > if (Min(statsSlot1->nvalues, statsSlot2->nvalues) == 1) > return false; Done. >> Did you have a >> chance to check tables with just few MCVs or are there any queries in >> the JOB which regress with very small default_statistics_target? > > > Sure. I need some time to check this. > Could you please do that with the latest attached patch so that we test it once more? Could you also run once more the JOB benchmark to get some test coverage on the SEMI join code (assuming it also uses SEMI joins)? Once we've concluded on above and there are no objections, I will register this patch in the commit fest. -- David Geier
Вложения
В списке pgsql-hackers по дате отправления: