> This topic is interesting - we found very bad performance with hashing large > tables with high work_mem. MergeJoin with quicksort was significantly > faster.
I've seen this also.
> I didn't deeper research - there is a possibility of virtualization > overhead.
I took measurements and the effect was repeatable and happened for all sizes of work_mem, but nothing more to add.
I get similar results if I join on integers. But joining on text, the hash wins by a mile.
I use this as a simple test bed:
alter table pgbench_accounts drop CONSTRAINT pgbench_accounts_pkey;
update pgbench_accounts set filler = md5(aid::text);
set work_mem to whatever keeps the join off of disk for the given scale;
set enable_hashjoin to whatever;
select sum(a1.abalance*a2.abalance) from pgbench_accounts a1 join pgbench_accounts a2 using (aid);
select sum(a1.abalance*a2.abalance) from pgbench_accounts a1 join pgbench_accounts a2 using (filler);
hash integer: 1832.695 ms
merge integer: 1462.913 ms
hash text: 2353.115 ms
merge text: 11,218.628 ms
The cost estimates do not depend on the column used in the join despite a 6 fold difference in run time, so the planner is perhaps missing a trick there.