Обсуждение: Add a greedy join search algorithm to handle large join problems

Поиск

Список

Период

Сортировка

Add a greedy join search algorithm to handle large join problems

От

Chengpeng Yan

Дата:

02 декабря, 06:48:26

Hi hackers,

This patch implements GOO (Greedy Operator Ordering), a greedy
join-order search method for large join problems, based on Fegaras (DEXA
’98) [1]. The algorithm repeatedly selects, among all legal joins, the
join pair with the lowest estimated total cost, merges them, and
continues until a single join remains. Patch attached.

To get an initial sense of performance, I reused the star join /
snowflake examples and the testing script from the thread in [2]. The
star-join GUC in that SQL workload was replaced with
`enable_goo_join_search`, so the same tests can run under DP (standard
dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
GOO. Other planner settings, including join_collapse_limit, remained at
their defaults.

On my local machine, a single-client pgbench run produces the following
throughput (tps):

                    |    DP    |   GEQO   |    GOO
--------------------+----------+----------+-----------
starjoin    (inner) | 1762.52 | 192.13 | 6168.89
starjoin    (outer) | 1683.92 | 173.90 | 5626.56
snowflake   (inner) | 1829.04 | 133.40 | 3929.57
snowflake   (outer) | 1397.93 |   99.65 | 3040.52

Open questions:
* The paper ranks joins by estimated result size, while this prototype
reuses the existing planner total_cost. This makes the implementation
straightforward, but it may be worth exploring row-based rule.
* The prototype allocates through multiple memory contexts, aiming to
reduce memory usage during candidate evaluation. Suggestions on
simplification are welcome.
* Planning performance vs GEQO: On the star/snowflake benchmarks above,
GOO reduces planning time significantly compared to GEQO. Broader
evaluation on more representative workloads (e.g., TPC-H / TPC-DS /
JOB) is TODO, covering both planning speed and plan quality.
* There is no tuning knob like `geqo_seed` for GEQO if GOO produces a
poor ordering.
* The long-term integration is open:
* fully replace GEQO,
* keep GEQO available and optionally try both,
* or use GOO as a starting point for GEQO.

Comments and review would be appreciated.

References:
[1] Leonidas Fegaras, “A New Heuristic for Optimizing Large Queries”,
DEXA ’98. ACM Digital Library:
https://dl.acm.org/doi/10.5555/648311.754892 A publicly accessible
PostScript version is available from the author's page:
https://lambda.uta.edu/order.ps
[2] Star/snowflake join thread and benchmarks:
https://www.postgresql.org/message-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51%40vondra.me

--
Best regards,
Chengpeng Yan

Вложения

v1-0001-Add-GOO-Greedy-Operator-Ordering-join-search-as-a.patch

Re: Add a greedy join search algorithm to handle large join problems

От

Dilip Kumar

Дата:

02 декабря, 08:36:50

On Tue, Dec 2, 2025 at 9:18 AM Chengpeng Yan <chengpeng_yan@outlook.com> wrote:
>
> Hi hackers,
>
> This patch implements GOO (Greedy Operator Ordering), a greedy
> join-order search method for large join problems, based on Fegaras (DEXA
> ’98) [1]. The algorithm repeatedly selects, among all legal joins, the
> join pair with the lowest estimated total cost, merges them, and
> continues until a single join remains. Patch attached.

Interesting.

> To get an initial sense of performance, I reused the star join /
> snowflake examples and the testing script from the thread in [2]. The
> star-join GUC in that SQL workload was replaced with
> `enable_goo_join_search`, so the same tests can run under DP (standard
> dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
> tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
> GOO. Other planner settings, including join_collapse_limit, remained at
> their defaults.
>
> On my local machine, a single-client pgbench run produces the following
> throughput (tps):
>
>                     |    DP    |   GEQO   |    GOO
> --------------------+----------+----------+-----------
> starjoin    (inner) |  1762.52 |  192.13  |  6168.89
> starjoin    (outer) |  1683.92 |  173.90  |  5626.56
> snowflake   (inner) |  1829.04 |  133.40  |  3929.57
> snowflake   (outer) |  1397.93 |   99.65  |  3040.52
>

Is pgbench the right workload to test this, I mean what are we trying
to compare here the planning time taken by DP vs GEQO vs GOO or the
quality of the plans generated by different join ordering algorithms
or both?  All pgbench queries are single table scans and there is no
involvement of the join search, so I am not sure how we can justify
these gains?


--
Regards,
Dilip Kumar
Google

Re: Add a greedy join search algorithm to handle large join problems

От

Chengpeng Yan

Дата:

02 декабря, 13:22:51

Hi,

Thanks for taking a look.

> On Dec 2, 2025, at 13:36, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> Is pgbench the right workload to test this, I mean what are we trying
> to compare here the planning time taken by DP vs GEQO vs GOO or the
> quality of the plans generated by different join ordering algorithms
> or both?  All pgbench queries are single table scans and there is no
> involvement of the join search, so I am not sure how we can justify
> these gains?

Just to clarify: as noted in the cover mail, the numbers are not from
default pgbench queries, but from the star-join / snowflake workloads in
thread [1], using the benchmark included in the v5-0001 patch. These
workloads contain multi-table joins and do trigger join search; you can
reproduce them by configuring the GUCs as described in the cover mail.

The benchmark tables contain no data, so execution time is negligible;
the results mainly reflect planning time of the different join-ordering
methods, which is intentional for this microbenchmark.

A broader evaluation on TPC-H / TPC-DS / JOB is TODO, covering both
planning time and plan quality. That should provide a more
representative picture of GOO, beyond this synthetic setup.

References:
[1] Star/snowflake join thread and benchmarks:
https://www.postgresql.org/message-id/a22ec6e0-92ae-43e7-85c1-587df2a65f51%40vondra.me

--
Best regards,
Chengpeng Yan

Re: Add a greedy join search algorithm to handle large join problems

От

Tomas Vondra

Дата:

02 декабря, 13:56:13

On 12/2/25 04:48, Chengpeng Yan wrote:
> Hi hackers,
> 
> This patch implements GOO (Greedy Operator Ordering), a greedy
> join-order search method for large join problems, based on Fegaras (DEXA
> ’98) [1]. The algorithm repeatedly selects, among all legal joins, the
> join pair with the lowest estimated total cost, merges them, and
> continues until a single join remains. Patch attached.
> 
> To get an initial sense of performance, I reused the star join /
> snowflake examples and the testing script from the thread in [2]. The
> star-join GUC in that SQL workload was replaced with
> `enable_goo_join_search`, so the same tests can run under DP (standard
> dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
> tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
> GOO. Other planner settings, including join_collapse_limit, remained at
> their defaults.
> 
> On my local machine, a single-client pgbench run produces the following
> throughput (tps):
> 
>                     |    DP    |   GEQO   |    GOO
> --------------------+----------+----------+-----------
> starjoin    (inner) |  1762.52 |  192.13  |  6168.89
> starjoin    (outer) |  1683.92 |  173.90  |  5626.56
> snowflake   (inner) |  1829.04 |  133.40  |  3929.57
> snowflake   (outer) |  1397.93 |   99.65  |  3040.52
> 

Seems interesting, and also much more ambitious than what I intended to
do in the starjoin thread (which is meant to be just a simplistic
heuristics on top of the regular join order planning).

I think a much broader evaluation will be needed, comparing not just the
planning time, but also the quality of the final plan. Which for the
starjoin tests does not really matter, as the plans are all equal in
this regard.


regards

-- 
Tomas Vondra

Re: Add a greedy join search algorithm to handle large join problems

От

Chengpeng Yan

Дата:

02 декабря, 16:04:11

Hi,

> On Dec 2, 2025, at 18:56, Tomas Vondra <tomas@vondra.me> wrote:
> 
> I think a much broader evaluation will be needed, comparing not just the
> planning time, but also the quality of the final plan. Which for the
> starjoin tests does not really matter, as the plans are all equal in
> this regard.

Many thanks for your feedback. 

You are absolutely right — plan quality is also very important. In my
initial email I only showed the improvements in planning time, but did
not provide results regarding plan quality. I will run tests on more
complex join scenarios, evaluating both planning time and plan quality.

Thanks again!
--
Best regards,
Chengpeng Yan

Re: Add a greedy join search algorithm to handle large join problems

От

Tomas Vondra

Дата:

09 декабря, 22:20:32

On 12/2/25 14:04, Chengpeng Yan wrote:
> Hi,
> 
> 
> 
>> On Dec 2, 2025, at 18:56, Tomas Vondra <tomas@vondra.me> wrote:
>>
>> I think a much broader evaluation will be needed, comparing not just the
>> planning time, but also the quality of the final plan. Which for the
>> starjoin tests does not really matter, as the plans are all equal in
>> this regard.
> 
> 
> Many thanks for your feedback. 
> 
> You are absolutely right — plan quality is also very important. In my
> initial email I only showed the improvements in planning time, but did
> not provide results regarding plan quality. I will run tests on more
> complex join scenarios, evaluating both planning time and plan quality.
> 

I was trying to do some simple experiments by comparing plans for TPC-DS
queries, but unfortunately I get a lot of crashes with the patch. All
the backtraces look very similar - see the attached example. The root
cause seems to be that sort_inner_and_outer() sees

    inner_path = NULL

I haven't investigated this very much, but I suppose the GOO code should
be calling set_cheapest() from somewhere.

regards

-- 
Tomas Vondra

Вложения

crash.txt

Re: Add a greedy join search algorithm to handle large join problems

От

Tomas Vondra

Дата:

10 декабря, 02:30:47


On 12/9/25 20:20, Tomas Vondra wrote:
> On 12/2/25 14:04, Chengpeng Yan wrote:
>> Hi,
>>
>>
>>
>>> On Dec 2, 2025, at 18:56, Tomas Vondra <tomas@vondra.me> wrote:
>>>
>>> I think a much broader evaluation will be needed, comparing not just the
>>> planning time, but also the quality of the final plan. Which for the
>>> starjoin tests does not really matter, as the plans are all equal in
>>> this regard.
>>
>>
>> Many thanks for your feedback. 
>>
>> You are absolutely right — plan quality is also very important. In my
>> initial email I only showed the improvements in planning time, but did
>> not provide results regarding plan quality. I will run tests on more
>> complex join scenarios, evaluating both planning time and plan quality.
>>
> 
> I was trying to do some simple experiments by comparing plans for TPC-DS
> queries, but unfortunately I get a lot of crashes with the patch. All
> the backtraces look very similar - see the attached example. The root
> cause seems to be that sort_inner_and_outer() sees
> 
>     inner_path = NULL
> 
> I haven't investigated this very much, but I suppose the GOO code should
> be calling set_cheapest() from somewhere.
> 

FWIW after looking at the failing queries for a bit, and a bit of
tweaking, it seems the issue is about aggregates in the select list. For
example this TPC-DS query fails (Q7):

 select  i_item_id,
        avg(ss_quantity) agg1,
        avg(ss_list_price) agg2,
        avg(ss_coupon_amt) agg3,
        avg(ss_sales_price) agg4
 from store_sales, customer_demographics, date_dim, item, promotion
 where ss_sold_date_sk = d_date_sk and
       ss_item_sk = i_item_sk and
       ss_cdemo_sk = cd_demo_sk and
       ss_promo_sk = p_promo_sk and
       cd_gender = 'F' and
       cd_marital_status = 'W' and
       cd_education_status = 'Primary' and
       (p_channel_email = 'N' or p_channel_event = 'N') and
       d_year = 1998
 group by i_item_id
 order by i_item_id
 LIMIT 100;

but if I remove the aggregates, it plans just fine:

 select i_item_id
 from store_sales, customer_demographics, date_dim, item, promotion
 where ss_sold_date_sk = d_date_sk and
       ss_item_sk = i_item_sk and
       ss_cdemo_sk = cd_demo_sk and
       ss_promo_sk = p_promo_sk and
       cd_gender = 'F' and
       cd_marital_status = 'W' and
       cd_education_status = 'Primary' and
       (p_channel_email = 'N' or p_channel_event = 'N') and
       d_year = 1998
 group by i_item_id
 order by i_item_id
 LIMIT 100;

The backtrace matches the one I already posted, I'm not going to post
that again.

I looked at a couple more failing queries, and removing the aggregates
fixes them too. Maybe there are other issues/crashes, of course.


regards

-- 
Tomas Vondra

Re: Add a greedy join search algorithm to handle large join problems

От

Chengpeng Yan

Дата:

10 декабря, 08:20:01

Hi,

> On Dec 10, 2025, at 07:30, Tomas Vondra <tomas@vondra.me> wrote:
>
> I looked at a couple more failing queries, and removing the aggregates
> fixes them too. Maybe there are other issues/crashes, of course.

Thanks a lot for pointing this out. I also noticed the same issue when
testing TPC-H Q5. The root cause was that in the goo algorithm I forgot
to handle the case of eager aggregation. This has been fixed in the v2
patch (after the fix, the v2 version works correctly for all TPC-H
queries). I will continue testing it on TPC-DS as well.

Sorry that I didn’t push the related changes earlier. I was running more
experiments on the greedy strategy, and I still needed some time to
organize and analyze the results. During this process, I found that the
current greedy strategy may lead to suboptimal plan quality in some
cases. Because of that, I plan to first evaluate a few basic greedy
heuristics on TPC-H to understand their behavior and limitations. If
there are meaningful results or conclusions, I will share them for
discussion.

Based on some preliminary testing, I’m leaning toward keeping the greedy
strategy simple and explainable, and focusing on the following
indicators together with the planner’s cost estimates:
* join cardinality (number of output rows)
* selectivity (join_size / (left_size * right_size))
* estimated result size in bytes(joinrel->reltarget->width * joinrel->rows)
* cheapest total path cost (cheapest_total_path->total_cost)

At the moment, I’m inclined to prioritize join cardinality with input
size as a secondary factor, but I’d like to validate this step by step:
first by testing these simple heuristics on TPC-H (as a relatively
simple workload) and summarizing some initial conclusions there. After
that, I plan to run more comprehensive experiments on more complex
benchmarks such as JOB and TPC-DS and report the results.

Do you have any thoughts or suggestions on this direction?

Thanks again for your feedback and help.

--
Best regards,
Chengpeng Yan

Вложения

v2-0001-Add-GOO-Greedy-Operator-Ordering-join-search-as-a.patch

Re: Add a greedy join search algorithm to handle large join problems

От

Tomas Vondra

Дата:

10 декабря, 13:20:14

On 12/10/25 06:20, Chengpeng Yan wrote:
> Hi,
> 
>> On Dec 10, 2025, at 07:30, Tomas Vondra <tomas@vondra.me> wrote:
>> 
>> I looked at a couple more failing queries, and removing the aggregates
>> fixes them too. Maybe there are other issues/crashes, of course.
> 
> Thanks a lot for pointing this out. I also noticed the same issue when
> testing TPC-H Q5. The root cause was that in the goo algorithm I forgot
> to handle the case of eager aggregation. This has been fixed in the v2
> patch (after the fix, the v2 version works correctly for all TPC-H
> queries). I will continue testing it on TPC-DS as well.
> 

I can confirm v2 makes it work for planning all 99 TPC-DS queries, i.e.
there are no more crashes during EXPLAIN.

Comparing the plans from master/geqo and master/goo, I see about 80% of
them changed. I haven't done any evaluation of how good the plans are,
really. I'll see if I can get some numbers next, but it'll take a while.

It's also tricky as plan choices depend on GUCs like random_page_cost,
and if those values are not good enough, the optimizer may still end up
with a bad plan. I'm not sure what's the best approach.

I did however notice an interesting thing - running EXPLAIN on the 99
queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:

master:       8s
master/geqo: 20s
master/goo:   5s

Where master/geqo means

  geqo_threshold=2

and master/goo means

  geqo_threshold=2
  enable_goo_join_search = on

It's nice that "goo" seems to be faster than "geqo" - assuming the plans
are comparable or better. But it surprised me switching to geqo makes it
slower than master. That goes against my intuition that geqo is meant to
be cheaper/faster join order planning. But maybe I'm missing something.

> Sorry that I didn’t push the related changes earlier. I was running more
> experiments on the greedy strategy, and I still needed some time to
> organize and analyze the results. During this process, I found that the
> current greedy strategy may lead to suboptimal plan quality in some
> cases. Because of that, I plan to first evaluate a few basic greedy
> heuristics on TPC-H to understand their behavior and limitations. If
> there are meaningful results or conclusions, I will share them for
> discussion.
> 
> Based on some preliminary testing, I’m leaning toward keeping the greedy
> strategy simple and explainable, and focusing on the following
> indicators together with the planner’s cost estimates:
> * join cardinality (number of output rows)
> * selectivity (join_size / (left_size * right_size))
> * estimated result size in bytes(joinrel->reltarget->width * joinrel->rows)
> * cheapest total path cost (cheapest_total_path->total_cost)
> 
> At the moment, I’m inclined to prioritize join cardinality with input
> size as a secondary factor, but I’d like to validate this step by step:
> first by testing these simple heuristics on TPC-H (as a relatively
> simple workload) and summarizing some initial conclusions there. After
> that, I plan to run more comprehensive experiments on more complex
> benchmarks such as JOB and TPC-DS and report the results.
> 
> Do you have any thoughts or suggestions on this direction?
> 
> Thanks again for your feedback and help.
> 

No opinion.

IIUC it's a simplified heuristics, replacing the "full" join planning
algorithm. So inevitably there will be cases when it produces plans that
are "worse" than the actual join order planning. I don't have a great
intuition what's the right trade off yet. Or am I missing something?

regards

-- 
Tomas Vondra

Re: Add a greedy join search algorithm to handle large join problems

От

John Naylor

Дата:

11 декабря, 05:53:09

On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me> wrote:
> I did however notice an interesting thing - running EXPLAIN on the 99
> queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:
>
> master:       8s
> master/geqo: 20s
> master/goo:   5s

> It's nice that "goo" seems to be faster than "geqo" - assuming the plans
> are comparable or better. But it surprised me switching to geqo makes it
> slower than master. That goes against my intuition that geqo is meant to
> be cheaper/faster join order planning. But maybe I'm missing something.

Yeah, that was surprising. It seems that geqo has a large overhead, so
it takes a larger join problem for the asymptotic behavior to win over
exhaustive search.

--
John Naylor
Amazon Web Services

Re: Add a greedy join search algorithm to handle large join problems

От

Pavel Stehule

Дата:

11 декабря, 09:12:29

čt 11. 12. 2025 v 3:53 odesílatel John Naylor <johncnaylorls@gmail.com> napsal:

On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me> wrote:
> I did however notice an interesting thing - running EXPLAIN on the 99
> queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:
>
> master: 8s
> master/geqo: 20s
> master/goo: 5s

> It's nice that "goo" seems to be faster than "geqo" - assuming the plans
> are comparable or better. But it surprised me switching to geqo makes it
> slower than master. That goes against my intuition that geqo is meant to
> be cheaper/faster join order planning. But maybe I'm missing something.

Yeah, that was surprising. It seems that geqo has a large overhead, so
it takes a larger join problem for the asymptotic behavior to win over
exhaustive search.

If I understand correctly to design - geqo should be slower for any queries with smaller complexity. The question is how many queries in the tested model are really complex.

--
John Naylor
Amazon Web Services

Re: Add a greedy join search algorithm to handle large join problems

От

Tomas Vondra

Дата:

11 декабря, 20:07:54

On 12/11/25 07:12, Pavel Stehule wrote:
> 
> 
> čt 11. 12. 2025 v 3:53 odesílatel John Naylor <johncnaylorls@gmail.com
> <mailto:johncnaylorls@gmail.com>> napsal:
> 
>     On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me
>     <mailto:tomas@vondra.me>> wrote:
>     > I did however notice an interesting thing - running EXPLAIN on the 99
>     > queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:
>     >
>     > master:       8s
>     > master/geqo: 20s
>     > master/goo:   5s
> 
>     > It's nice that "goo" seems to be faster than "geqo" - assuming the
>     plans
>     > are comparable or better. But it surprised me switching to geqo
>     makes it
>     > slower than master. That goes against my intuition that geqo is
>     meant to
>     > be cheaper/faster join order planning. But maybe I'm missing
>     something.
> 
>     Yeah, that was surprising. It seems that geqo has a large overhead, so
>     it takes a larger join problem for the asymptotic behavior to win over
>     exhaustive search.
> 
> 
> If I understand correctly to design - geqo should be slower for any
> queries with smaller complexity. The question is how many queries in the
> tested model are really complex.
> 

Depends on what you mean by "really complex". TPC-DS queries are not
trivial, but the complexity may not be in the number of joins.

Of course, setting geqo_threshold to 2 may be too aggressive. Not sure.


regards

-- 
Tomas Vondra

Re: Add a greedy join search algorithm to handle large join problems

От

Tomas Vondra

Дата:

11 декабря, 20:30:46

Hi,

Here's a more complete set of results from a TPC-DS run. See the
run-queries-2.sh script for more details. There are also .sql files with
DDL to create the database, etc. It does not include the parts to
generate the data etc. (you'll need to the generator from TPC site).

The attached CSV has results for scales 1 and 10, with 0 and 4 parallel
workers. It runs three configurations:

- master (geqo=off, threshold=12)
- master-geqo (goo=off, threshold=2)
- master-goo (goo=on, threshold=2)

There's a couple more fields, e.g. whether it's cold/cached run, etc.

A very simple summary of the results is the total duration of the run,
for all 99 queries combined:

  scale  workers     caching     master     master-geqo    master-goo
  ===================================================================
      1        0        cold        816             399          1124
                        warm        784             369          1097
               4        cold        797             384          1085
                        warm        774             366          1069
  -------------------------------------------------------------------
     10        0        cold       2760            2653          2340
                        warm       2580            2470          2177
               4        cold       2563            2423          1969
                        warm       2439            2325          1859

This is interesting, and also a bit funny.

The funny part is that geqo seems to do better than master - on scale 1
it's pretty clear, on scale 10 the difference is much smaller.

The interesting part is that "goo" is doing worse than master (or geqo)
on scale 1, and better on scale 10. I wonder how would it do on larger
scales, but I don't have such results.

There's a PDF with per-query results too.

This may be a little bit misleading because the statement timeout was
set to 300s, and there's a couple queries that did not complete before
this timeout. Maybe it'd be better to not include these queries. I
haven't tried, though.

It might be interesting to look at some of the queries that got worse,
and check why. Maybe that'd help you with picking the heuristics?

FWIW I still think no heuristics can be perfect, so getting slower plans
for some queries should not be treated as "hard failure". The other
thing is the quality of plans depends on GUCs like random_page_cost, and
I kept them at default values.

Anyway, I hope this is helpful input.


regards

-- 
Tomas Vondra

Вложения

Re: Add a greedy join search algorithm to handle large join problems

От

Pavel Stehule

Дата:

11 декабря, 20:33:52

čt 11. 12. 2025 v 18:07 odesílatel Tomas Vondra <tomas@vondra.me> napsal:

On 12/11/25 07:12, Pavel Stehule wrote:
>
>
> čt 11. 12. 2025 v 3:53 odesílatel John Naylor <johncnaylorls@gmail.com
> <mailto:johncnaylorls@gmail.com>> napsal:
>
> On Wed, Dec 10, 2025 at 5:20 PM Tomas Vondra <tomas@vondra.me
> <mailto:tomas@vondra.me>> wrote:
> > I did however notice an interesting thing - running EXPLAIN on the 99
> > queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:
> >
> > master: 8s
> > master/geqo: 20s
> > master/goo: 5s
>
> > It's nice that "goo" seems to be faster than "geqo" - assuming the
> plans
> > are comparable or better. But it surprised me switching to geqo
> makes it
> > slower than master. That goes against my intuition that geqo is
> meant to
> > be cheaper/faster join order planning. But maybe I'm missing
> something.
>
> Yeah, that was surprising. It seems that geqo has a large overhead, so
> it takes a larger join problem for the asymptotic behavior to win over
> exhaustive search.
>
>
> If I understand correctly to design - geqo should be slower for any
> queries with smaller complexity. The question is how many queries in the
> tested model are really complex.
>

Depends on what you mean by "really complex". TPC-DS queries are not
trivial, but the complexity may not be in the number of joins.

Of course, setting geqo_threshold to 2 may be too aggressive. Not sure.

I checked the TPC-H queries and almost all queries are simple - 5 x JOIN -- 2x nested subselect

regards

--
Tomas Vondra

Re: Add a greedy join search algorithm to handle large join problems

От

Chengpeng Yan

Дата:

12 декабря, 12:50:38

> On Dec 10, 2025, at 18:20, Tomas Vondra <tomas@vondra.me> wrote:
> 
> I can confirm v2 makes it work for planning all 99 TPC-DS queries, i.e.
> there are no more crashes during EXPLAIN.

Thanks a lot for testing this — much appreciated.

> It's also tricky as plan choices depend on GUCs like random_page_cost,
> and if those values are not good enough, the optimizer may still end up
> with a bad plan. I'm not sure what's the best approach.

I agree that many other cost-related parameters can influence the join
order. For now, I’m still running my tests with the default settings,
and I’m not entirely sure how large the impact of those cost parameters
will be. This is something I’ll probably consider at a later stage.

> I did however notice an interesting thing - running EXPLAIN on the 99
> queries (for 3 scales and 0/4 workers, so 6x 99) took this much time:
> 
> master:       8s
> master/geqo: 20s
> master/goo:   5s
> 
> It's nice that "goo" seems to be faster than "geqo" - assuming the plans
> are comparable or better.

One additional advantage of goo is that its memory usage should be
better than DP/GEQO (though I haven’t done any measurements yet).
But I don’t think this is the main concern at the moment — I’m just
mentioning it briefly here. What matters more right now is the plan
quality itself.

> On Dec 12, 2025, at 01:30, Tomas Vondra <tomas@vondra.me> wrote:
> 
> A very simple summary of the results is the total duration of the run,
> for all 99 queries combined:
> 
>  scale  workers     caching     master     master-geqo    master-goo
>  ===================================================================
>      1        0        cold        816             399          1124
>                        warm        784             369          1097
>               4        cold        797             384          1085
>                        warm        774             366          1069
>  -------------------------------------------------------------------
>     10        0        cold       2760            2653          2340
>                        warm       2580            2470          2177
>               4        cold       2563            2423          1969
>                        warm       2439            2325          1859
> 
> This is interesting, and also a bit funny.
> 
> The funny part is that geqo seems to do better than master - on scale 1
> it's pretty clear, on scale 10 the difference is much smaller.
> 
> The interesting part is that "goo" is doing worse than master (or geqo)
> on scale 1, and better on scale 10. I wonder how would it do on larger
> scales, but I don't have such results.

It’s interesting to see that goo performs better at scale 10. I’ll try
to dig into the results and understand the reasons behind this in
follow-up work.

> It might be interesting to look at some of the queries that got worse,
> and check why. Maybe that'd help you with picking the heuristics?
> 
> FWIW I still think no heuristics can be perfect, so getting slower plans
> for some queries should not be treated as "hard failure".

I agree that no set of heuristics can be perfect. The best we can do is
to look at existing workloads and understand why certain heuristics
don’t work there, and then evolve them toward strategies that are more
general and more aligned with the nature of joins themselves. There
doesn’t seem to be a silver bullet here — it really comes down to
analyzing specific SQL queries case by case.

Finally, thank you very much for the testing you did and for all the
insights you shared — it helped a lot.

--
Best regards,
Chengpeng Yan

Re: Add a greedy join search algorithm to handle large join problems

От

Chengpeng Yan

Дата:

16 декабря, 17:38:03

Recently, I have been testing the TPC-H SF=1 dataset using four simple
greedy join-ordering strategies: join cardinality (estimated output
rows), selectivity, estimated result size in bytes, and cheapest total
path cost. These can be roughly seen as either output-oriented
heuristics (rows / selectivity / result size), which try to optimize the
shape of intermediate results, or a cost-oriented heuristic, which
prefers the locally cheapest join step.

The main goal of these experiments is to check whether the current
greedy rules show obvious structural weaknesses, and to use the observed
behavior as input for thinking about how a greedy rule might evolve.
While there is unlikely to be a perfect greedy strategy, I am hoping to
identify approaches that behave reasonably well across many cases and
avoid clear pathological behavior.

In the attached files, v3-0001 is identical to the previously submitted
v2-0001 patch and contains the core implementation of the GOO algorithm.
The v3-0002 patch adds testing-only code to evaluate different greedy
rules, including a GUC (goo_greedy_strategy) used only for switching
strategies during experiments.

All tests were performed on the TPC-H SF=1 dataset. After loading the
data, I ran the following commands before executing the benchmarks:

```
VACUUM FREEZE ANALYZE;
CHECKPOINT;

ALTER SYSTEM SET join_collapse_limit = 100;
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
ALTER SYSTEM SET statement_timeout = 600000;
ALTER SYSTEM SET shared_buffers = ‘4GB’;
ALTER SYSTEM SET effective_cache_size = ‘8GB’;
ALTER SYSTEM SET work_mem = ‘1GB’;
SELECT pg_reload_conf();
```

The detailed benchmark results are summarized in tpch.pdf. Execution
times are reported as ratios, using the DP-based optimizer’s execution
time as the baseline (1.0).

The compressed archive tpch_tests_result.zip contains summary.csv, which
is the raw data used to generate tpch.pdf and was produced by the
run_job.sh script. It also includes files (xxx_plan.txt), which were
generated by the run_analysis.sh script and record the EXPLAIN ANALYZE
outputs for the same query under different join-ordering algorithms, to
make plan differences easier to compare.

Based on the TPC-H results, my high-level observations are:
* The threeoutput-oriented greedy rules (rows, selectivity, result size)
show noticeable regressions compared to DP overall, with a relatively
large number of outliers.
* Using total path cost as the greedy key produces results that are
generally closer to DP, but still shows some clear outliers.

To understand why these regressions occur, I mainly looked at Q20 and
Q7, which show particularly large and consistent regressions and expose
different failure modes.

In Q20, there is a join between partsupp and an aggregated lineitem
subquery. For this join, the planner’s rowcount estimate is wrong by
orders of magnitude (tens of rows estimated versus hundreds of thousands
actually produced). As a result, output-oriented greedy rules strongly
prefer this join very early, because it appears to be extremely
shrinking. In reality, it processes large inputs and produces a large
intermediate, and this early misordering can significantly amplify
downstream join costs. This makes Q20 a clear outlier for output-based
greedy rules when estimates are severely wrong.

Q7 exposes a different issue. The cost-based greedy rule tends to choose
a locally cheap join early, but that join creates an intermediate which
later joins become much more expensive to process. In this case, an
early commitment under relatively weak constraints leads to a
many-to-many intermediate that is only filtered after fact-table joins
are applied. This illustrates how a purely cost-driven greedy rule can
make locally reasonable decisions that turn out to be globally harmful.

Taken together, these outliers suggest that all four single-metric
greedy rules tested so far have structural limitations. Output-oriented
rules appear fragile when join rowcount estimates are badly wrong, while
cost-oriented greedy decisions can still lead to locally reasonable but
globally poor plans.

One question this naturally raises is whether making irreversible greedy
choices based only on a local ranking signal is sufficient, or whether
some mechanism is needed to make the approach more robust and to limit
the impact of such outliers.

As a next step, based on the current results, I plan to ignore
selectivity (which performs poorly in many cases), treat rows as largely
redundant with result_size, and move on to testing on the JOB benchmark.
I also plan to compare the behavior of DP, GEQO, and GOO on JOB, and to
use those results to better understand which signals are most useful for
guiding greedy decisions.

I would be very interested in hearing people’s thoughts on these
observations and on possible directions to explore next.

--
Best regards,
Chengpeng Yan

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Add a greedy join search algorithm to handle large join problems

Вложения

Вложения

Вложения

Вложения

Вложения