Robert Haas <robertmhaas@gmail.com> writes:
> From my point of view, one interesting fact about database
> optimization is that the numbers 0 and 1 are phenomenally important
> special cases.
Yeah.
> It is often the case that a join will return at most 1
> row per outer row, or that an aggregate will generate exactly 1 group,
> or whatever. And the code is littered with special cases - including
> Nested Loop - that cater to making such cases fast. Those cases arise
> frequently because people engineer their data so that they occur
> frequently.
> If we could bias the planner against picking nested loops in cases
> where they will figure to win only a little but might conceivably lose
> a lot, that would probably be a good idea. But it's not obvious
> exactly how to figure that out.
There was discussion a while ago of trying to teach the planner to generate
rowcount estimates of 0 or 1 row only in cases where that was provably the
case, e.g., because the query selects on a unique key. In any other
situation, rowcount estimates would be clamped to a minimum of 2 rows.
This should by itself eliminate the worst abuses of nestloop plans, since
the planner would always assume that the outer scan contains at least 2
rows unless it's known not to. Still, there might be a lot of other less
pleasant side effects; it's hard to tell in advance of actually doing the
work.
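
To make the proposed rule concrete, here is a minimal sketch of the clamp.
It is illustrative only: the function name and the provably_at_most_one
flag are hypothetical, not existing planner code, and how the planner would
actually prove the at-most-one-row property is left out entirely.

```python
def clamp_row_estimate(raw_rows: float, provably_at_most_one: bool) -> float:
    """Toy version of the proposed rule (hypothetical, not PostgreSQL code).

    An estimate below 2 rows is allowed only when the caller can prove that
    the relation yields at most one row, e.g. an equality lookup on a unique
    key. Every other estimate is clamped to a minimum of 2 rows.
    """
    rows = max(raw_rows, 0.0)  # a row count can never be negative
    if provably_at_most_one:
        # Proven case: small estimates are trusted, but never exceed 1 row.
        return min(rows, 1.0)
    # Unproven case: refuse to believe the scan returns fewer than 2 rows.
    return max(rows, 2.0)


# Unproven selectivity guess of 0.3 rows is raised to 2.
print(clamp_row_estimate(0.3, provably_at_most_one=False))
# Unique-key lookup: the sub-2 estimate is kept as is.
print(clamp_row_estimate(0.3, provably_at_most_one=True))
# Larger estimates pass through unchanged.
print(clamp_row_estimate(150.0, provably_at_most_one=False))
```

The point of the floor of 2 is that a nestloop's cost scales with the
number of outer rows, so the planner would no longer cost the inner side
as though it were executed once unless that is actually known.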
regards, tom lane