Dimitrios Apostolou <jimis@gmx.net> writes:
> Please correct me if I'm wrong, as I'm a newcomer to PostgreSQL, but here
> is how I understand things according to posts I've read, and classical
> algorithms:
> + The Hash Join is fastest when one side fits in work_mem. Then on one
> hand you have a hash table lookup (amortized O(1)) and on the other
> hand, if the table has M rows that do not fit in memory, you have
> sequential reads from the disk (given low fragmentation of the table or
> index files): for every row you read from the disk, you look up the key
> in the hash table.
> If the hash table does not fit in RAM then the cost becomes prohibitive.
> Every lookup is a random access possibly hitting the disk. The total
> cost should be random_page_cost * M.
That would be true of a simple hash join, but Postgres uses batched
hash joins: we split up the hash key space into subsets, where hopefully
each subset includes few enough inner-side rows to fit into work_mem.
While this can go wrong given pathological distribution of the inner-side
keys, it does mean that the join can perform well even when the inner
side is much larger than work_mem. So it's not the case that the planner
will simply disregard hash joins beyond work_mem. It will apply a cost
penalty for the predicted batching overhead; but that can still come out
cheaper than merge join, because the sorting needed for merge is generally
also far from cheap.
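To illustrate the idea, here is a toy Python sketch of a batched (Grace-style)
hash join -- not the actual executor code; the real thing sizes the number of
batches from work_mem, spills batches to temp files, and can add more batches
at run time if an estimate turns out to be wrong:

    import collections

    def batched_hash_join(inner_rows, outer_rows, inner_key, outer_key,
                          num_batches):
        # Phase 1: scatter both inputs into per-batch buckets keyed by a
        # hash of the join key (PostgreSQL writes these to temp files).
        inner_batches = collections.defaultdict(list)
        outer_batches = collections.defaultdict(list)
        for row in inner_rows:
            inner_batches[hash(inner_key(row)) % num_batches].append(row)
        for row in outer_rows:
            outer_batches[hash(outer_key(row)) % num_batches].append(row)

        # Phase 2: process one batch at a time; only the current batch's
        # inner-side hash table needs to fit in memory.
        for b in range(num_batches):
            table = collections.defaultdict(list)
            for row in inner_batches[b]:
                table[inner_key(row)].append(row)
            for row in outer_batches[b]:
                for match in table.get(outer_key(row), []):
                    yield match, row

    # Example: join two row lists on their first column.
    inner = [(i, "inner") for i in range(10)]
    outer = [(i % 5, "outer") for i in range(20)]
    pairs = list(batched_hash_join(inner, outer,
                                   lambda r: r[0], lambda r: r[0],
                                   num_batches=4))

This works because rows whose keys hash into different batches can never join
with each other, so each batch can be joined independently of the rest.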
> So I would expect an increased random_page_cost to benefit the Merge Join
> algorithm. And since my setup involves spinning disks, it does make sense
> to increase it.
What is probably really happening is that random_page_cost affects the
estimated cost of performing the sort using an index scan instead of
a bespoke sort step. AFAIR, cost_sort doesn't consider random_page_cost
at all, and neither does cost_hashjoin.
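To make the distinction concrete, here is a back-of-the-envelope sketch (toy
numbers and toy formulas, nothing like what costsize.c really computes) of
which paths even contain a random_page_cost term:

    import math

    seq_page_cost     = 1.0
    random_page_cost  = 4.0          # the GUC being tuned here
    cpu_operator_cost = 0.0025
    pages, rows       = 100_000, 10_000_000

    # Sorted input for a merge join via a full index scan: the heap
    # fetches are largely random, so the estimate scales with
    # random_page_cost.
    indexscan_path = pages * random_page_cost

    # Sorted input via seq scan plus an explicit sort: sequential I/O
    # plus an N log N comparison term; no random_page_cost anywhere.
    sort_path = pages * seq_page_cost + cpu_operator_cost * rows * math.log2(rows)

    # Batched hash join: sequential scans plus batching overhead; again
    # no random_page_cost term.
    hash_path = 2 * pages * seq_page_cost + cpu_operator_cost * rows

    print(f"indexscan={indexscan_path:.0f} sort={sort_path:.0f} hash={hash_path:.0f}")

In this caricature, raising random_page_cost inflates only the index-scan
path; the estimates for an explicit sort and for the hash join itself are
unaffected.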
regards, tom lane