This document provides detailed analyses of representative queries from the JOB workload that illustrate both failure modes and success cases of greedy join ordering (GOO). These case studies complement the summary discussion in the main email and are not required for understanding the overall results.

## GOO(result_size) bad case analysis

TL;DR:

* 12b: Estimation error misleads the greedy choice, causing a highly selective predicate to be applied too late; with accurate estimates, the regression can be avoided.
* 3b: An inherent weakness of result_size-based greediness: small estimated results can hide expensive scans or repeated execution.
* 11b: Row-count underestimation leads to a non-materialized Nested Loop Join (NLJ) inner that is executed repeatedly; accurate estimates, or forcing materialization / Hash Join, largely avoid the regression.

Detailed analysis:

* 12b: The regression occurs because a highly selective predicate is applied too late. Due to estimation error, the optimizer fails to recognize its selectivity and starts from a large table, forcing a full sequential scan. The selective predicate is applied only as a late join filter instead of an index condition. This highlights a risk of result_size-based greediness: it may favor small estimated outputs while ignoring the cost of scanning large inputs and the benefit of early predicate application.
* 3b: The planner joins a very large table using a highly selective but unindexed filter early. Although this yields a small intermediate result, it requires a full scan of the large table and leads to repeated rescans later. This is not primarily an estimation issue: even with correct estimates, result_size-based greediness systematically prefers small-but-expensive subplans because it ignores how the rows are produced (see the sketch after this list).
* 11b: The greedy strategy produces a very small intermediate result early, but at the cost of building an expensive subplan and delaying the (keyword -> movie_keyword) join. This causes that side to become a parameterless inner of a Nested Loop Join that is re-executed many times. Cardinality underestimation prevents materialization or a switch to Hash Join; forcing either one largely avoids the regression, indicating a structural issue rather than a pure estimation problem.
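To make the objective concrete, here is a minimal sketch of result_size-greedy join ordering in Python. Everything in it is an assumption for illustration: the `Subplan` and `Stats` types and the toy selectivity model stand in for real optimizer statistics and are not our actual implementation. The point is that the greedy key is the estimated output cardinality alone; nothing in the objective accounts for scan, build, or rescan cost.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Subplan:
    rels: frozenset   # base relations covered by this subplan
    card: float       # estimated output cardinality (after base filters)

class Stats:
    """Toy statistics: pairwise join selectivities keyed by relation name.
    A stand-in for real optimizer estimates; any error here feeds straight
    into the greedy choice (the 12b/11b failure mode)."""
    def __init__(self, selectivities):
        self.sel = selectivities  # {('a', 'b'): selectivity}, keys sorted

    def join_selectivity(self, rels_l, rels_r):
        s, found = 1.0, False
        for a in rels_l:
            for b in rels_r:
                key = tuple(sorted((a, b)))
                if key in self.sel:
                    s *= self.sel[key]
                    found = True
        return s if found else None  # None => no join predicate

def est_join_size(l, r, stats):
    s = stats.join_selectivity(l.rels, r.rels)
    return None if s is None else l.card * r.card * s

def goo_result_size(base_plans, stats):
    """Repeatedly merge the joinable pair with the smallest estimated
    result size. Assumes a connected join graph (no cross products)."""
    plans = list(base_plans)
    while len(plans) > 1:
        candidates = []
        for a, b in combinations(plans, 2):
            e = est_join_size(a, b, stats)
            if e is not None:  # skip pairs without a join predicate
                candidates.append((e, a, b))
        e, l, r = min(candidates, key=lambda c: c[0])
        plans.remove(l); plans.remove(r)
        plans.append(Subplan(l.rels | r.rels, e))
    return plans[0]

# Toy usage; cardinalities and selectivities are made up, not JOB stats.
k  = Subplan(frozenset({'keyword'}), 10)          # selective filter applied
mk = Subplan(frozenset({'movie_keyword'}), 4.5e6)
t  = Subplan(frozenset({'title'}), 2.5e6)
stats = Stats({('keyword', 'movie_keyword'): 1e-5,
               ('movie_keyword', 'title'): 4e-7})
print(goo_result_size([k, mk, t], stats).card)
```

The sketch makes the 3b weakness visible: `est_join_size` is the entire objective, so a pair that is cheap to emit but expensive to produce (a full scan of a big table, or a subplan destined to be rescanned) still ranks first.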
## GOO(combined) bad case analysis

GOO(combined) regressions mainly fall into two categories:

1. failures of the cost model itself (15a), and
2. cases where cost and result_size share the same structural weakness (17b).

TL;DR:

* 15a: The selector chooses the cost-greedy plan because of its lower estimated cost, even though the result_size plan is much faster at execution time. This reflects cost model unreliability; in such cases, plan diversity collapses back to a single fragile choice.
* 17b: Both greedy strategies converge to similarly bad plans (~3× slower than DP), indicating a structural limitation of greedy enumeration rather than a selector issue.

Detailed analysis:

* 15a: GOO(combined) selects the cost-greedy plan because it has a lower estimated total_cost, while the result_size plan actually executes fastest. The cost-based plan joins the many-to-many table movie_keyword too early, inflating intermediate results and amplifying downstream work. The root cause is cardinality underestimation (fanout and correlation) on this join, which makes the plan appear cheap to the cost model but expensive at execution time.
* 17b: Both greedy strategies perform poorly. The query forms a fan-out star around movie_id, where locally attractive joins duplicate keys early and force expensive downstream probes to be repeated many times. This is a systematic limitation of myopic greedy join ordering: even with accurate local estimates, a one-step objective cannot account for fan-out–induced amplification of later work. As a result, both cost and result_size fail for the same structural reason.

## Representative cases where GOO(result_size) outperforms DP

In addition to regressions, I include a small number of cases where greedy join ordering outperforms DP. These cases are not meant to suggest that greedy ordering makes intrinsically better decisions, but to illustrate that DP has its own failure mode: early fan-out amplification under severe underestimation (29c, 31a).

TL;DR:

* 29c: Under severe cardinality underestimation, DP introduces early fan-out, producing large intermediates and repeated inner execution. GOO(result_size) starts from a highly selective predicate and avoids the fan-out explosion.
* 31a: DP chooses an early multiplying join under severe underestimation, causing row counts to explode. GOO(result_size) delays the multiplying join until a more selective prefix is built.

Detailed analysis:

* 29c: DP assumes an early join prefix is nearly singleton, but in reality it produces a very large number of rows, leading to massive fan-out and repeated inner scans/probes. Once execution enters a “many loops × inner cost” regime, the error becomes catastrophic. GOO(result_size) anchors the plan on a highly selective base predicate and expands only after a small prefix is established, keeping intermediate results small even under severe misestimation.
* 31a: This case exhibits a similar fan-out failure mode, but for a different reason. DP treats an early join as nearly one-to-one when it is actually high-multiplicity, causing rows to multiply much earlier than expected. Unlike 29c, there is no single highly selective starting predicate; GOO(result_size) still performs better by postponing the multiplying join until after a more selective prefix is formed, so the multiplicative effect applies to fewer rows. A back-of-the-envelope illustration of this “many loops × inner cost” amplification follows.
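The sketch below puts rough numbers on the amplification discussed in 17b, 29c, and 31a. All quantities are made-up assumptions, not JOB statistics; the only point is the shape of the arithmetic: the cost of a multiplying join grows linearly with the number of rows already in flight when it is applied.

```python
# Illustrative arithmetic for fan-out amplification (cases 17b, 29c, 31a).
# Every number below is made up; none come from JOB statistics.

fanout = 30        # avg multiplicity of the many-to-many join (e.g. movie_keyword)
inner_cost = 5.0   # abstract cost of one downstream inner scan/probe

# Early multiplying join (DP under underestimation, 31a): the fan-out is
# applied before any selective prefix has reduced the row count.
rows_before_early = 200_000
early_total = rows_before_early * fanout * inner_cost

# Postponed multiplying join (GOO(result_size) in 31a): the same join is
# delayed until a selective prefix exists, so it multiplies far fewer rows.
rows_before_late = 50
late_total = rows_before_late * fanout * inner_cost

print(f"early: {early_total:,.0f}  late: {late_total:,.0f}  "
      f"ratio: {early_total / late_total:,.0f}x")
# -> early: 30,000,000  late: 7,500  ratio: 4,000x
# The amplification is invisible to a one-step objective: when the
# multiplying join is chosen, its *local* estimated result can look small
# (or be underestimated), while the repeated downstream work only shows
# up in later steps.
```

Under this framing, 29c and 31a are the mirror image of 17b: the same amplification that punishes a myopic greedy choice also punishes DP whenever its cardinality estimates make the multiplying join look harmless.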