This document provides detailed analyses of representative queries from the JOB workload that illustrate both failure modes and success cases of greedy join ordering (GOO). These case studies complement the summary discussion in the main email and are not required for understanding the overall results.

## GOO(result_size) bad case analysis

TL;DR:

* 12b: Estimation error misleads the greedy choice, causing a highly selective predicate to be applied too late; with accurate estimates, the regression can be avoided.
* 3b: An inherent weakness of result_size-based greediness: small estimated results can hide expensive scans or repeated execution.
* 11b: Row-count underestimation leads to a non-materialized Nested Loop Join (NLJ) inner that is executed repeatedly; accurate estimates, or forcing materialization / Hash Join, largely avoid the regression.

Detailed analysis:

* 12b: The regression occurs because a highly selective predicate is applied too late. Due to estimation error, the optimizer fails to recognize its selectivity and starts from a large table, forcing a full sequential scan. The selective predicate is applied only as a late join filter instead of an index condition. This highlights a risk of result_size-based greediness: it may favor small estimated outputs while ignoring the cost of scanning large inputs and the benefit of early predicate application.
* 3b: The planner joins a very large table using a highly selective but unindexed filter early. Although this yields a small intermediate result, it requires a full scan of the large table and leads to repeated rescans later. This is not primarily an estimation issue: even with correct estimates, result_size-based greediness systematically prefers small-but-expensive subplans because it ignores how the rows are produced (see the sketch after this list).
* 11b: The greedy strategy produces a very small intermediate result early, but at the cost of building an expensive subplan and delaying the (keyword -> movie_keyword) join. This causes that side to become a parameterless inner of a Nested Loop Join that is re-executed many times. Cardinality underestimation prevents materialization or a switch to Hash Join; forcing either one largely avoids the regression, indicating a structural issue rather than a pure estimation problem.
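To make the objective concrete, here is a minimal sketch of result_size-greedy join ordering in Python. Everything in it is an assumption for illustration: the `Subplan` and `Stats` types and the toy selectivity model stand in for real optimizer statistics and are not our actual implementation. The point is that the greedy key is the estimated output cardinality alone; nothing in the objective accounts for scan, build, or rescan cost.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Subplan:
    rels: frozenset   # base relations covered by this subplan
    card: float       # estimated output cardinality (after base filters)

class Stats:
    """Toy statistics: pairwise join selectivities keyed by relation name.
    A stand-in for real optimizer estimates; any error here feeds straight
    into the greedy choice (the 12b/11b failure mode)."""
    def __init__(self, selectivities):
        self.sel = selectivities  # {('a', 'b'): selectivity}, keys sorted

    def join_selectivity(self, rels_l, rels_r):
        s, found = 1.0, False
        for a in rels_l:
            for b in rels_r:
                key = tuple(sorted((a, b)))
                if key in self.sel:
                    s *= self.sel[key]
                    found = True
        return s if found else None  # None => no join predicate

def est_join_size(l, r, stats):
    s = stats.join_selectivity(l.rels, r.rels)
    return None if s is None else l.card * r.card * s

def goo_result_size(base_plans, stats):
    """Repeatedly merge the joinable pair with the smallest estimated
    result size. Assumes a connected join graph (no cross products)."""
    plans = list(base_plans)
    while len(plans) > 1:
        candidates = []
        for a, b in combinations(plans, 2):
            e = est_join_size(a, b, stats)
            if e is not None:  # skip pairs without a join predicate
                candidates.append((e, a, b))
        e, l, r = min(candidates, key=lambda c: c[0])
        plans.remove(l); plans.remove(r)
        plans.append(Subplan(l.rels | r.rels, e))
    return plans[0]

# Toy usage; cardinalities and selectivities are made up, not JOB stats.
k  = Subplan(frozenset({'keyword'}), 10)          # selective filter applied
mk = Subplan(frozenset({'movie_keyword'}), 4.5e6)
t  = Subplan(frozenset({'title'}), 2.5e6)
stats = Stats({('keyword', 'movie_keyword'): 1e-5,
               ('movie_keyword', 'title'): 4e-7})
print(goo_result_size([k, mk, t], stats).card)
```

The sketch makes the 3b weakness visible: `est_join_size` is the entire objective, so a pair that is cheap to emit but expensive to produce (a full scan of a big table, or a subplan destined to be rescanned) still ranks first.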
## GOO(combined) bad case analysis

GOO(combined) regressions mainly fall into two categories:

1. failures of the cost model itself (15a), and
2. cases where cost and result_size share the same structural weakness (17b).

TL;DR:

* 15a: The selector chooses the cost-greedy plan because of its lower estimated cost, even though the result_size plan is much faster at execution time. This reflects cost model unreliability; in such cases, plan diversity collapses back to a single fragile choice.
* 17b: Both greedy strategies converge to similarly bad plans (~3× slower than DP), indicating a structural limitation of greedy enumeration rather than a selector issue.

Detailed analysis:

* 15a: GOO(combined) selects the cost-greedy plan because it has a lower estimated total_cost, while the result_size plan actually executes fastest. The cost-based plan joins the many-to-many table movie_keyword too early, inflating intermediate results and amplifying downstream work. The root cause is cardinality underestimation (fanout and correlation) on this join, which makes the plan appear cheap to the cost model but expensive at execution time.
* 17b: Both greedy strategies perform poorly. The query forms a fan-out star around movie_id, where locally attractive joins duplicate keys early and force expensive downstream probes to be repeated many times. This is a systematic limitation of myopic greedy join ordering: even with accurate local estimates, a one-step objective cannot account for fan-out–induced amplification of later work. As a result, both cost and result_size fail for the same structural reason.

## Representative cases where GOO(result_size) outperforms DP

In addition to regressions, I include a small number of cases where greedy join ordering outperforms DP. These cases are not meant to suggest that greedy ordering makes intrinsically better decisions, but to illustrate that DP has its own failure mode: early fan-out amplification under severe underestimation (29c, 31a).

TL;DR:

* 29c: Under severe cardinality underestimation, DP introduces early fan-out, producing large intermediates and repeated inner execution. GOO(result_size) starts from a highly selective predicate and avoids the fan-out explosion.
* 31a: DP chooses an early multiplying join under severe underestimation, causing row counts to explode. GOO(result_size) delays the multiplying join until a more selective prefix is built.

Detailed analysis:

* 29c: DP assumes an early join prefix is nearly singleton, but in reality it produces a very large number of rows, leading to massive fan-out and repeated inner scans/probes. Once execution enters a “many loops × inner cost” regime, the error becomes catastrophic. GOO(result_size) anchors the plan on a highly selective base predicate and expands only after a small prefix is established, keeping intermediate results small even under severe misestimation.
* 31a: This case exhibits a similar fan-out failure mode, but for a different reason. DP treats an early join as nearly one-to-one when it is actually high-multiplicity, causing rows to multiply much earlier than expected. Unlike 29c, there is no single highly selective starting predicate; GOO(result_size) still performs better by postponing the multiplying join until after a more selective prefix is formed, so the multiplicative effect applies to fewer rows. A back-of-the-envelope illustration of this “many loops × inner cost” amplification follows.
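The sketch below puts rough numbers on the amplification discussed in 17b, 29c, and 31a. All quantities are made-up assumptions, not JOB statistics; the only point is the shape of the arithmetic: the cost of a multiplying join grows linearly with the number of rows already in flight when it is applied.

```python
# Illustrative arithmetic for fan-out amplification (cases 17b, 29c, 31a).
# Every number below is made up; none come from JOB statistics.

fanout = 30        # avg multiplicity of the many-to-many join (e.g. movie_keyword)
inner_cost = 5.0   # abstract cost of one downstream inner scan/probe

# Early multiplying join (DP under underestimation, 31a): the fan-out is
# applied before any selective prefix has reduced the row count.
rows_before_early = 200_000
early_total = rows_before_early * fanout * inner_cost

# Postponed multiplying join (GOO(result_size) in 31a): the same join is
# delayed until a selective prefix exists, so it multiplies far fewer rows.
rows_before_late = 50
late_total = rows_before_late * fanout * inner_cost

print(f"early: {early_total:,.0f}  late: {late_total:,.0f}  "
      f"ratio: {early_total / late_total:,.0f}x")
# -> early: 30,000,000  late: 7,500  ratio: 4,000x
# The amplification is invisible to a one-step objective: when the
# multiplying join is chosen, its *local* estimated result can look small
# (or be underestimated), while the repeated downstream work only shows
# up in later steps.
```

Under this framing, 29c and 31a are the mirror image of 17b: the same amplification that punishes a myopic greedy choice also punishes DP whenever its cardinality estimates make the multiplying join look harmless.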