Обсуждение: Re: should we have a fast-path planning for OLTP starjoins?

Поиск
Список
Период
Сортировка

Re: should we have a fast-path planning for OLTP starjoins?

От
Nico Williams
Дата:
On Tue, Feb 04, 2025 at 03:00:49PM +0100, Tomas Vondra wrote:

I'm late to this party.  I have apps that do start queries like this a
lot.  These apps fall into this type:

> Or maybe the fact table is "users" and the dimensions have all kinds of
> info about the user (address, primary e-mail address, balance, ...).

In my case the schema is an EAV schema, but essentially it stores users,
groups, and many other such things -- the sorts you expect to find in a
directory service.

Some unsolicited advice:

 - the table source with the longest index key prefix (or full key)
   determined by WHERE clause terms is the table that should lead the
   query plan

   This is really important for the case where the query is a VIEW and
   the WHERE clause terms can get pushed into it, then:

   SELECT ...
   FROM t0
   JOIN t1 ON ...
   JOIN t2 ON ...
   ..
   JOIN tN ON ...
   -- tX's PK is (a, b, c), or maybe (a, b, c, d), but (a, b, c) is a
   -- very selective prefix of that PK index, so the query plan should
   -- start with tX
   WHERE tX.a = ... AND tX.b = ... AND tX.c = ...

 - if there is no such table source then we're asking for "all the
   data", and we might as well start with a full table scan of some
   table source, but, which?  my answer: the lexically first one in the
   query!  (why?  because it gives the query author the power to pick
   which table goes first in this case)

   SELECT ...
   FROM t0
   JOIN ...
   ...; -- no WHERE clause or no terms that provide values for prefixes
        -- of indices that we could use to find a good choice of
        -- starting table, so start with t0

> Anyway, this pattern is quite common, yet it performs quite poorly.

Yes.

> But for starjoins, a lot of this is not really needed. In the simplest
> case (no conditions on dimensions etc) it does not really matter in what
> order we join those, and filters on dimensions make it only a little bit

But here you can just use the order that the SQL uses.  It gives the
author some power.

> more complicated (join the most selective first).

Yes.

> So I've been wondering how difficult would it be to have a special
> fast-path mode for starjoins, completely skipping most of this. I
> cobbled together a WIP/PoC patch (attached) on the way from FOSDEM, and
> it seems to help quite a bit.

Yay!  _Thank you_!

> So that about triples the throughput. If you bump join_collapse_limit to
> 12, it gets even clearer
> 
>      build              1            16
>   --------------------------------------
>     master            200          2000
>    patched           4500         48000
> 
> That's a 20x improvement - not bad. Sure, this is on a tiny dataset, and
> with larger data sets it might need to do I/O, diminishing the benefits.
> It's just an example to demonstrate the benefits.

Fantastic!  I can't wait to use this in prod.

> But the bigger question is whether it makes sense to have such fast-path
> modes for certain query shapes. The patch "hard-codes" the planning for
> starjoin queries, but we clearly can't do that for every possible join
> shape (because then why have dynamic join search at all?).

IMO: Yes for starjoins.  The reason is:

> I do think starjoins might be sufficiently unique / special to justify
> this, but maybe it would be possible to instead improve the regular join
> order search to handle this case better? I don't have a very clear idea
> what would that look like, though :-(

If you're not pessimizing other cases and you're getting a 4x to 20x
improvement then the uniqueness of the starjoin case and the frequent
use of starjoin queries makes this fast-path worthwhile.

Moreover, I think in general if you can cheaply determine a "kind" of
query and then apply query plan searches / optimizations that are most
relevant to the query's "kind" then I think that's a good way to unlock
more useful optimizations.

> I did check what do some other databases do, and they often have some
> sort of "hint" to nudge the let the optimizer know this is a starjoin.

I don't think a hint is needed here.

BTW, I like hints, but only out-of-band, not embedded in the SQL.
Unfortunately out-of-band hinting is not really viable because to
introduce it one would need new APIs (and possibly protocol work).

Nico
-- 



Re: should we have a fast-path planning for OLTP starjoins?

От
Tom Lane
Дата:
Nico Williams <nico@cryptonector.com> writes:
> Some unsolicited advice:
> ...
> But here you can just use the order that the SQL uses.  It gives the
> author some power.

If that's the approach you want, it's been possible for decades:
"set join_collapse_limit = 1" and away you go.  I don't feel a
need to invent a different version of that for star schemas.

I do not think this patch should have ambitions beyond the stated
one of avoiding useless join-order search effort.  If you try to
load more than that onto the plate you'll probably end in failure.

            regards, tom lane



Re: should we have a fast-path planning for OLTP starjoins?

От
Tomas Vondra
Дата:
On 11/15/25 16:57, Tom Lane wrote:
> Nico Williams <nico@cryptonector.com> writes:
>> Some unsolicited advice:
>> ...
>> But here you can just use the order that the SQL uses.  It gives the
>> author some power.
> 
> If that's the approach you want, it's been possible for decades:
> "set join_collapse_limit = 1" and away you go.  I don't feel a
> need to invent a different version of that for star schemas.
> 
> I do not think this patch should have ambitions beyond the stated
> one of avoiding useless join-order search effort.  If you try to
> load more than that onto the plate you'll probably end in failure.
> 

FWIW I certainly agree with this. The motivation is to speed up planning
with starjoin-like queries, and that's still the primary goal. If it
could handle more complex cases (snowflake, inverse starjoin), great.
But I have no ambition to make it somehow generic and much more complex.

The main thing I'm really unsure about is what to do about joins that
increase the cardinality. AFAICS that's the only possible regression if
we simply move joins with dimensions at the end. Not sure what to do
about that before the actual join search ...

regards

-- 
Tomas Vondra




Re: should we have a fast-path planning for OLTP starjoins?

От
Tom Lane
Дата:
I spent a little time staring at the v5 patches.  Obviously there
are a bunch of minor details to be verified, which you've carefully
provided XXX comments about, and I didn't really go through those
yet.  There are two big-picture questions that are bothering me:

1. I do not think I believe the premise that the dimension tables
typically won't have restriction clauses.  ISTM that a typical
query might be like

      select sum(o.total_price) from
    orders o
    join customers c on c.id = o.c_id
    join products p on p.id = o.p_id
    where c.customer_name = 'Wile E Coyote'
    and p.product_name = 'Rocket Skates';

The only reason to join a dimension table that lacks a restriction
clause is if you need some of its fields in the output, which you
might but I'm not sure that's such a common case.  (Have you got
evidence to the contrary?)  So I feel like we're not going to be
getting all that much win if we are not willing to treat such tables
as dimension tables.  We could do something simplistic like order
those dimensions by the selectivity of their baserestrict clauses,
joining the most-restricted ones first and any restriction-free ones
last.

2. I'm pretty un-excited about the 0002 patch altogether.  I'm having
a hard time visualizing cases where it helps, other than left joins
to dimension tables which I don't really think are common either.
I did a bit of poking around on the net and found that it seems to
be common to restrict star-join optimizations to equijoins (e.g.
SAP says explicitly that they only handle that case).  I think we'd
be better off to focus on the allow-baserestrict-clauses extension
than the allow-join-order-restrictions extension.

            regards, tom lane



Re: should we have a fast-path planning for OLTP starjoins?

От
Bruce Momjian
Дата:
On Fri, Nov 21, 2025 at 03:14:15PM -0500, Tom Lane wrote:
> I spent a little time staring at the v5 patches.  Obviously there
> are a bunch of minor details to be verified, which you've carefully
> provided XXX comments about, and I didn't really go through those
> yet.  There are two big-picture questions that are bothering me:
> 
> 1. I do not think I believe the premise that the dimension tables
> typically won't have restriction clauses.  ISTM that a typical
> query might be like
> 
>       select sum(o.total_price) from
>     orders o
>     join customers c on c.id = o.c_id
>     join products p on p.id = o.p_id
>     where c.customer_name = 'Wile E Coyote'
>     and p.product_name = 'Rocket Skates';

Yes, I am sure it is typical because I have seen cartoons use exactly
those products.  ;-)

> The only reason to join a dimension table that lacks a restriction
> clause is if you need some of its fields in the output, which you
> might but I'm not sure that's such a common case.  (Have you got
> evidence to the contrary?)  So I feel like we're not going to be
> getting all that much win if we are not willing to treat such tables
> as dimension tables.  We could do something simplistic like order
> those dimensions by the selectivity of their baserestrict clauses,
> joining the most-restricted ones first and any restriction-free ones
> last.

Oh, I thought the patch already did this, e.g., the patch was going to
make groups, e.g., foreign keys with restrictions, foreign keys without
restrictions, and no foreign key (might add rows).  The first group was
going to be sorted by their selectivity, and the last group was going to
be sorted by how much they add rows, if that is possible.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.



Re: should we have a fast-path planning for OLTP starjoins?

От
Tomas Vondra
Дата:

On 11/21/25 21:14, Tom Lane wrote:
> I spent a little time staring at the v5 patches.  Obviously there
> are a bunch of minor details to be verified, which you've carefully
> provided XXX comments about, and I didn't really go through those
> yet.  There are two big-picture questions that are bothering me:
> 
> 1. I do not think I believe the premise that the dimension tables
> typically won't have restriction clauses.  ISTM that a typical
> query might be like
> 
>       select sum(o.total_price) from
>     orders o
>     join customers c on c.id = o.c_id
>     join products p on p.id = o.p_id
>     where c.customer_name = 'Wile E Coyote'
>     and p.product_name = 'Rocket Skates';
> 
> The only reason to join a dimension table that lacks a restriction
> clause is if you need some of its fields in the output, which you
> might but I'm not sure that's such a common case.  (Have you got
> evidence to the contrary?)  So I feel like we're not going to be
> getting all that much win if we are not willing to treat such tables
> as dimension tables.  We could do something simplistic like order
> those dimensions by the selectivity of their baserestrict clauses,
> joining the most-restricted ones first and any restriction-free ones
> last.
> 

Good question. I don't have a great evidence such joins to dimensions
(without additional restrictions) are a common case. It's partially a
guess and partially based on my past experience.

I have seen a lot of such joins in analytical workloads, where the join
is followed by an aggregation, with GROUP BY referencing attributes from
the dimensions. Of course, that may be an argument against worrying
about the planning too much, because with enough data the timing is
going to be dominated by the join/aggregation execution. However, it's
surprising how little data many analytical workloads actually access, so
it's not that clear.

The other use case I've been thinking about is poorly written queries,
joining more tables than needed. A traditional example is an ORM loading
more data than needed, to load the whole "object". I don't know how
prevalent this is today - it used to be a pretty common issue, and I
doubt it improved. I think it's not that different from the self-join
removal (the tradeoffs may be different, of course). I realize we try
not to add complexity for such cases, especially if it might hurt well
written queries.

Actually, I initially investigated at the opposite example, i.e. all
dimensions joining to the fact.id, see create-2/select-2 scripts. And
then I realized starjoins have mostly the same issue. But it's true the
v5 patch does not actually help this original query.


> 2. I'm pretty un-excited about the 0002 patch altogether.  I'm having
> a hard time visualizing cases where it helps, other than left joins
> to dimension tables which I don't really think are common either.
> I did a bit of poking around on the net and found that it seems to
> be common to restrict star-join optimizations to equijoins (e.g.
> SAP says explicitly that they only handle that case).  I think we'd
> be better off to focus on the allow-baserestrict-clauses extension
> than the allow-join-order-restrictions extension.
> 

I recall seen such queries (with LEFT joins) in analytics workloads, but
it's definitely less common than inner starjoins. So I agree focusing on
allowing baserestrict clauses is probably more useful/important.


FWIW I tried searching for more info too, but all the SAP pages
suggested by google return 404 to me :-(


regards

-- 
Tomas Vondra




Re: should we have a fast-path planning for OLTP starjoins?

От
Tomas Vondra
Дата:
On 11/21/25 21:47, Bruce Momjian wrote:
> On Fri, Nov 21, 2025 at 03:14:15PM -0500, Tom Lane wrote:
>> I spent a little time staring at the v5 patches.  Obviously there
>> are a bunch of minor details to be verified, which you've carefully
>> provided XXX comments about, and I didn't really go through those
>> yet.  There are two big-picture questions that are bothering me:
>>
>> 1. I do not think I believe the premise that the dimension tables
>> typically won't have restriction clauses.  ISTM that a typical
>> query might be like
>>
>>       select sum(o.total_price) from
>>     orders o
>>     join customers c on c.id = o.c_id
>>     join products p on p.id = o.p_id
>>     where c.customer_name = 'Wile E Coyote'
>>     and p.product_name = 'Rocket Skates';
> 
> Yes, I am sure it is typical because I have seen cartoons use exactly
> those products.  ;-)
> 

;-)

>> The only reason to join a dimension table that lacks a restriction
>> clause is if you need some of its fields in the output, which you
>> might but I'm not sure that's such a common case.  (Have you got
>> evidence to the contrary?)  So I feel like we're not going to be
>> getting all that much win if we are not willing to treat such tables
>> as dimension tables.  We could do something simplistic like order
>> those dimensions by the selectivity of their baserestrict clauses,
>> joining the most-restricted ones first and any restriction-free ones
>> last.
> 
> Oh, I thought the patch already did this, e.g., the patch was going to
> make groups, e.g., foreign keys with restrictions, foreign keys without
> restrictions, and no foreign key (might add rows).  The first group was
> going to be sorted by their selectivity, and the last group was going to
> be sorted by how much they add rows, if that is possible.
> 

No, the patch never did that. The various XXX comments discuss that as a
future optimization. Aren't the comments clear enough?

I think it'd work about the way you described, except that joins without
foreign keys can both increase and decrease the cardinality, and those
that reduce cardinality would need to be moved to the first group.

Another question is what to do about snowflake joins, as a common
extension/generalization of starjoins. I think we'd need to identify the
groups of dimensions (for the branches), and treat those as a single
logical dimension.


regards

-- 
Tomas Vondra




Re: should we have a fast-path planning for OLTP starjoins?

От
Bruce Momjian
Дата:
On Sun, Nov 23, 2025 at 03:53:17PM +0100, Tomas Vondra wrote:
> >> The only reason to join a dimension table that lacks a restriction
> >> clause is if you need some of its fields in the output, which you
> >> might but I'm not sure that's such a common case.  (Have you got
> >> evidence to the contrary?)  So I feel like we're not going to be
> >> getting all that much win if we are not willing to treat such tables
> >> as dimension tables.  We could do something simplistic like order
> >> those dimensions by the selectivity of their baserestrict clauses,
> >> joining the most-restricted ones first and any restriction-free ones
> >> last.
> > 
> > Oh, I thought the patch already did this, e.g., the patch was going to
> > make groups, e.g., foreign keys with restrictions, foreign keys without
> > restrictions, and no foreign key (might add rows).  The first group was
> > going to be sorted by their selectivity, and the last group was going to
> > be sorted by how much they add rows, if that is possible.
> > 
> 
> No, the patch never did that. The various XXX comments discuss that as a
> future optimization. Aren't the comments clear enough?

I think my brain got lost in the patch --- I was happy I got as far as I
did in understanding it.  :-)

> I think it'd work about the way you described, except that joins without
> foreign keys can both increase and decrease the cardinality, and those
> that reduce cardinality would need to be moved to the first group.

I see, makes sense.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.



Re: should we have a fast-path planning for OLTP starjoins?

От
Robert Haas
Дата:
On Sun, Nov 23, 2025 at 9:39 AM Tomas Vondra <tomas@vondra.me> wrote:
> > 1. I do not think I believe the premise that the dimension tables
> > typically won't have restriction clauses.  ISTM that a typical
> > query might be like
> >
> >       select sum(o.total_price) from
> >       orders o
> >       join customers c on c.id = o.c_id
> >       join products p on p.id = o.p_id
> >       where c.customer_name = 'Wile E Coyote'
> >       and p.product_name = 'Rocket Skates';
> >
>
> Good question. I don't have a great evidence such joins to dimensions
> (without additional restrictions) are a common case. It's partially a
> guess and partially based on my past experience.

In my experience, restriction clauses on dimension tables are very common.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: should we have a fast-path planning for OLTP starjoins?

От
Tomas Vondra
Дата:
On 11/24/25 21:55, Robert Haas wrote:
> On Sun, Nov 23, 2025 at 9:39 AM Tomas Vondra <tomas@vondra.me> wrote:
>>> 1. I do not think I believe the premise that the dimension tables
>>> typically won't have restriction clauses.  ISTM that a typical
>>> query might be like
>>>
>>>       select sum(o.total_price) from
>>>       orders o
>>>       join customers c on c.id = o.c_id
>>>       join products p on p.id = o.p_id
>>>       where c.customer_name = 'Wile E Coyote'
>>>       and p.product_name = 'Rocket Skates';
>>>
>>
>> Good question. I don't have a great evidence such joins to dimensions
>> (without additional restrictions) are a common case. It's partially a
>> guess and partially based on my past experience.
> 
> In my experience, restriction clauses on dimension tables are very common.
> 

Sure, but does that imply the inverse case (dimensions without non-join
restrictions) are not? I'm not sure.


regards

-- 
Tomas Vondra




Re: should we have a fast-path planning for OLTP starjoins?

От
Robert Haas
Дата:
On Wed, Nov 26, 2025 at 1:30 PM Tomas Vondra <tomas@vondra.me> wrote:
> > In my experience, restriction clauses on dimension tables are very common.
>
> Sure, but does that imply the inverse case (dimensions without non-join
> restrictions) are not? I'm not sure.

Obviously that depends on a lot of things, and I don't completely
understand what the patch does and doesn't do. But, I think it would
be sad to implement an optimization that falls over catastrophically
when such restriction clauses are present. For example, a long time
ago, I used to build web applications. Twenty, even thirty table joins
were common. There certainly wouldn't be a restriction clause on every
dimension table, but it would be an unusual situation if there were NO
restriction clauses on ANY dimension table. It's maybe also worth
mentioning that in those applications, it wasn't always a pure star
join: one central fact table would join to a bunch of codes tables,
but also very often to some other fact tables that had their own codes
tables. Point being that optimizations like this can be shown to have
a LOT of value in individual test cases even if the circumstances in
which they can be applied are very restricted, but lifting some of
those restrictions can enormously expand the number of real-world
cases to which they apply. My intuition is that a smaller gain on a
larger class of queries will win us more praise than the reverse.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: should we have a fast-path planning for OLTP starjoins?

От
Tomas Vondra
Дата:
On 11/28/25 19:57, Robert Haas wrote:
> On Wed, Nov 26, 2025 at 1:30 PM Tomas Vondra <tomas@vondra.me> wrote:
>>> In my experience, restriction clauses on dimension tables are very common.
>>
>> Sure, but does that imply the inverse case (dimensions without non-join
>> restrictions) are not? I'm not sure.
> 
> Obviously that depends on a lot of things, and I don't completely
> understand what the patch does and doesn't do. But, I think it would
> be sad to implement an optimization that falls over catastrophically
> when such restriction clauses are present. For example, a long time
> ago, I used to build web applications. Twenty, even thirty table joins
> were common. There certainly wouldn't be a restriction clause on every
> dimension table, but it would be an unusual situation if there were NO
> restriction clauses on ANY dimension table.

I think it depends on what you mean by "falls over catastrophically".

The patch identifies dimensions without restrictions, moves them aside,
and does regular join search on the rest of the relations (some of which
may be dimensions with restrictions).

So the presence of dimensions with restrictions does not disable the
optimization entirely, it just means the dimensions with restrictions
won't benefit from it. So in the worst case, it'll perform just like
before (assuming the optimization is kept cheap enough).

I'd love if the optimization worked for all dimensions, even for those
with restrictions. But I don't know about such optimization.

> It's maybe also worth
> mentioning that in those applications, it wasn't always a pure star
> join: one central fact table would join to a bunch of codes tables,
> but also very often to some other fact tables that had their own codes
> tables. 
> 

I certainly don't claim this optimization works for all queries, but
it's also not restricted to "pure" starjoins. It simply finds which
relations can be considered dimensions (i.e. joined through a FK). It
does not match the whole plan shape, or anything like that.

Yes, that may reduce the possible benefit of this optimization, because
the patch works within the "groups" generated by join_collapse_limit (so
within 8 relations by default). If those groups have few dimensions, the
patch may not help all that much.

> Point being that optimizations like this can be shown to have
> a LOT of value in individual test cases even if the circumstances in
> which they can be applied are very restricted, but lifting some of
> those restrictions can enormously expand the number of real-world
> cases to which they apply. My intuition is that a smaller gain on a
> larger class of queries will win us more praise than the reverse.

I don't disagree, but isn't this mostly what we're discussing now? I'm
trying to figure out if enough queries would benefit from this
optimization, which only applies to dimensions without restrictions.


regards

-- 
Tomas Vondra




Re: should we have a fast-path planning for OLTP starjoins?

От
Robert Haas
Дата:
On Fri, Nov 28, 2025 at 2:21 PM Tomas Vondra <tomas@vondra.me> wrote:=
> The patch identifies dimensions without restrictions, moves them aside,
> and does regular join search on the rest of the relations (some of which
> may be dimensions with restrictions).
>
> So the presence of dimensions with restrictions does not disable the
> optimization entirely, it just means the dimensions with restrictions
> won't benefit from it. So in the worst case, it'll perform just like
> before (assuming the optimization is kept cheap enough).

I'm guessing that the reason you're limiting this to relations without
restrictions is so that you don't have to reason about the join
causing cardinality changes. But I don't quite understand how you
avoid having that problem anyway. For example, suppose that we have a
fact table F which is joined to relations S1, S2, D1, D2, I1, and I2.
The joins to S1 and S2 are joins on FKs to unconstrained dimension
tables; i.e. cardinality stays the same. The joins to D1 and D2 are
joins on FKs to constrained dimension tables, i.e. cardinality
decreases. The joins to I1 and I2 on average match more than once,
i.e. cardinality increases. The optimal join order is presumably
something like F-D1-D2-S1-S2-I1-I2, so that the number of rows flowing
through each level of the join tree is kept as small as possible, but
how do you achieve this if the joins to S1 and S2 are set aside? In
the presence of row-count-inflating joins, it's wrong to suppose that
S1 and S2 should be joined at the end.

What seems more likely to be true is that we should perform the join
to S1 and the join to S2 consecutively. It's very common in my
experience for complex reporting queries to have a bunch of joins the
results of which matter very little: each one changes the cardinality
very little or not at all, and so any ordering of those joins produces
essentially the same result, and probably it's best to do them all at
whatever point in the join sequence the cardinality of the other input
is at lowest. However, I'm not sure that even this is guaranteed. For
instance, suppose that in the above example there's no index on the
column of F that joins to S2. Could it be that the only reasonable way
to join to S2 is a merge join? If so, it might be best to start by
join F to S2 using an index scan on each, and then continue by joining
to D1, D2, S1, I1, I2 in that order.

Every time I start to think about this kind of optimization, I find
myself getting hung up on corner cases like these which are, maybe,
not all that probable, but which I think people almost certainly do
rely on the planner to get right. I don't know what to do about that.
I certainly agree with the idea that we waste a lot of energy
searching through functionally identical join orders and that we
should find a way to avoid that, but I'm a little suspicious that
avoiding regressions, or even finding out whether we've introduced
any, will prove difficult.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: should we have a fast-path planning for OLTP starjoins?

От
Tomas Vondra
Дата:
On 11/28/25 23:50, Robert Haas wrote:
> On Fri, Nov 28, 2025 at 2:21 PM Tomas Vondra <tomas@vondra.me> wrote:=
>> The patch identifies dimensions without restrictions, moves them aside,
>> and does regular join search on the rest of the relations (some of which
>> may be dimensions with restrictions).
>>
>> So the presence of dimensions with restrictions does not disable the
>> optimization entirely, it just means the dimensions with restrictions
>> won't benefit from it. So in the worst case, it'll perform just like
>> before (assuming the optimization is kept cheap enough).
> 
> I'm guessing that the reason you're limiting this to relations without
> restrictions is so that you don't have to reason about the join
> causing cardinality changes. But I don't quite understand how you
> avoid having that problem anyway. For example, suppose that we have a
> fact table F which is joined to relations S1, S2, D1, D2, I1, and I2.
> The joins to S1 and S2 are joins on FKs to unconstrained dimension
> tables; i.e. cardinality stays the same. The joins to D1 and D2 are
> joins on FKs to constrained dimension tables, i.e. cardinality
> decreases. The joins to I1 and I2 on average match more than once,
> i.e. cardinality increases. The optimal join order is presumably
> something like F-D1-D2-S1-S2-I1-I2, so that the number of rows flowing
> through each level of the join tree is kept as small as possible, but
> how do you achieve this if the joins to S1 and S2 are set aside? In
> the presence of row-count-inflating joins, it's wrong to suppose that
> S1 and S2 should be joined at the end.
> 

Yes, that's certainly possible. It was discussed at the very beginning
of this thread, and at that point there was a suggestion to keep it
simple and just push the joins to the end:

https://www.postgresql.org/message-id/1751009.1738974197@sss.pgh.pa.us

I'm not claiming that's good enough, though. Maybe we should reconsider
and it needs a better solution.

> What seems more likely to be true is that we should perform the join
> to S1 and the join to S2 consecutively. It's very common in my
> experience for complex reporting queries to have a bunch of joins the
> results of which matter very little: each one changes the cardinality
> very little or not at all, and so any ordering of those joins produces
> essentially the same result, and probably it's best to do them all at
> whatever point in the join sequence the cardinality of the other input
> is at lowest. However, I'm not sure that even this is guaranteed. For
> instance, suppose that in the above example there's no index on the
> column of F that joins to S2. Could it be that the only reasonable way
> to join to S2 is a merge join? If so, it might be best to start by
> join F to S2 using an index scan on each, and then continue by joining
> to D1, D2, S1, I1, I2 in that order.
> 

Good points.

The first part reminds me the approach I mentioned a couple messages
back, We might preprocess the join lists, but instead of "removing the
dimensions" from the search, we'd group them together, and treat each
group as a single item in join_search_one_level. Then, whenever
join_search_one_level hits the group, it'd expand it into a sequence of
joins. I have not tried that yet, but it seems doable ...

However, I'm not sure about your second point - what if the join order
matters after all, e.g. because some join order can leverage ordering
(pathkeys) of the inputs, or produces ordering better for the later joins?

> Every time I start to think about this kind of optimization, I find
> myself getting hung up on corner cases like these which are, maybe,
> not all that probable, but which I think people almost certainly do
> rely on the planner to get right. I don't know what to do about that.
> I certainly agree with the idea that we waste a lot of energy
> searching through functionally identical join orders and that we
> should find a way to avoid that, but I'm a little suspicious that
> avoiding regressions, or even finding out whether we've introduced
> any, will prove difficult.
> 

True. I started working on this with two assumptions: (a) we can detect
cases when this is guaranteed to be the optimal join order, and (b) it
would be cheap to do so. Maybe one of those assumptions is not correct.


thanks

-- 
Tomas Vondra




Re: should we have a fast-path planning for OLTP starjoins?

От
Robert Haas
Дата:
On Fri, Nov 28, 2025 at 6:34 PM Tomas Vondra <tomas@vondra.me> wrote:
> True. I started working on this with two assumptions: (a) we can detect
> cases when this is guaranteed to be the optimal join order, and (b) it
> would be cheap to do so. Maybe one of those assumptions is not correct.

Probably needs testing, at least.

I wonder if there's any way that we could rule out the presence of
row-count-inflating joins, or at least identify the joins that are
definitely not row-count-inflating. I have a general feeling that we
postpone some of the estimation work that the planner does for too
long, and that moving some of it earlier would allow better
decision-making. It's very common for a query to involve a driving
table and then a bunch of joins to side tables that are all
essentially independent of each other. That is, each of those joins
will multiply the row count from the other side of the join by a
constant that does not depend very much on the join order. I don't
really know exactly what we could do that would enable us to notice
that pattern and do something about it, but it feels like we're
duplicating a lot of work

For instance, consider a 4-way join between A, B, C, and D, where A is
connected to the other three tables by join clauses, and those are the
only join clauses. Let's say that the join to B is going to multiply
the row count by 2, the join to C is going to multiply it by 1 (i.e.
no change), and the join to D is going to multiply it by 0.1 (i.e.
eliminate 90% of rows). Well, we're going to consider the A-B join and
discover that it has 2 times the cardinality of A, and then later
we'll consider the (A-C)-B join and discover that it has 2 times the
cardinality of A-C, and then later we'll consider the (A-D)-B join and
discover that it has two times the cardinality of A-D, and then later
we'll consider the (A-B-C)-D join and discover that it has two times
the cardinality of A-B-C. As the number of tables grows, the number of
times we rediscover the effect of joining to a certain table grows, I
believe, exponentially.

Of course, in theory, there's no reason why the idea that any
particular join multiplies the row count by a constant that is
independent of the join order has to be correct. For instance, if the
A-B join multiplies the same rows that the A-D join eliminates, then
you can't set a fixed multiplier for either one. But in practice, even
when that's the case, we're not aware of it: when evaluating a join
column from, say, the output of the A-B join, we use the statistics
for the underlying column of A or B, without any modification
reflecting what the A-B join did to the distribution. So I think that
our estimates will behave like join-order-independent multipliers even
when the reality is otherwise. I'm not at all sure that I'm totally
right about this, and I sort of expect you or Tom to point out why I'm
totally weak in the head for thinking that it's the case, but I feel
like there might be the kernel of an idea here even if the details are
wrong.

Because, like, I think the general complexity of determining the
optimal join order is known to be some horrifically non-polynomial
time thing, but there are lots of real-world problems where I think a
human being would try to do it in linear time, by initially choosing
the driving table, and then what to join to first, second, third, etc.
And I think that's the kind of case that you're trying to pursue with
this optimization, so it feels intuitively logical that something
should be possible. That said, I don't know whether from a theoretical
perspective it really is possible to do something like this in a
reliable way, making it a pure optimization, or whether you would be
doomed to always get some cases wrong, making it more like an optional
planner mode or something. Either way, I can't shake the feeling that
determining essentially every important fact about every join only
when we perform the main join search makes it a lot harder to imagine
ever avoiding the main join search.

--
Robert Haas
EDB: http://www.enterprisedb.com