> The hash join code is capable of splitting the table into multiple
> segments, but it won't do so unless the planner's estimate of the
> table size is larger than sort_mem. So the real problem IMHO is
> the bad rowcount estimate.
Ahh, I didn't know that. I assumed that cost estimates would only
affect the query plan itself, not the implementation details of
individual steps. That seems like a very plausible explanation of my
problems, especially since I *know* some of the estimates are way off.
In my query plan, there is a hash join that expects to hash about
395 rows in its outer 'subplan' (for lack of a better term), which
consists of several nested joins (i.e. it's not a simple relation).
That estimate is at least two or three orders of magnitude off. I'm
still ANALYZEing as we speak to see if I can get it any better, but
that's a pretty tough case for any estimator to assess correctly. It
seems somewhat dangerous to rely so heavily on cost estimates when the
consequence is not simply a bad query plan, but the potential for
outright query failure.
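For what it's worth, here is roughly how I'm chasing the estimates
(the table and column names below are invented for illustration):
EXPLAIN ANALYZE prints the planner's row estimate next to the actual
count for each plan node, and raising the per-column statistics target
before re-ANALYZEing should give the estimator more data to work with:

    -- compare estimated vs. actual row counts at each plan node
    EXPLAIN ANALYZE SELECT ... ;

    -- hypothetical names: gather more detailed stats on a join column,
    -- then refresh the statistics for that table
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
    ANALYZE orders;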
That still leaves a few questions, though:
- why did the amount of memory used for the hash join exceed sort_mem?
Is sort_mem merely a guideline for planning and not an enforced hard
limit? (my setting is shown in the snippet after these questions)
- why wasn't the memory cleared after the transaction finished
(failed)? The process was still 2GB in size. Does memory used for
sorts just stay around, possibly re-used for other sorts or hashes but
never de-allocated?
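In case it matters, this is how sort_mem is set here (the value is in
kilobytes and, as I understand it, applies per sort or hash step, so a
single query can consume several multiples of it):

    SHOW sort_mem;          -- current setting, in KB
    SET sort_mem = 65536;   -- e.g. 64MB per sort/hash, for this session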