Re: should we have a fast-path planning for OLTP starjoins?
| От | Tomas Vondra |
|---|---|
| Тема | Re: should we have a fast-path planning for OLTP starjoins? |
| Дата | |
| Msg-id | a5a44ba8-4812-4614-9f69-5965c2666b81@vondra.me обсуждение исходный текст |
| Ответ на | Re: should we have a fast-path planning for OLTP starjoins? (Tom Lane <tgl@sss.pgh.pa.us>) |
| Ответы |
Re: should we have a fast-path planning for OLTP starjoins?
|
| Список | pgsql-hackers |
On 11/21/25 21:14, Tom Lane wrote: > I spent a little time staring at the v5 patches. Obviously there > are a bunch of minor details to be verified, which you've carefully > provided XXX comments about, and I didn't really go through those > yet. There are two big-picture questions that are bothering me: > > 1. I do not think I believe the premise that the dimension tables > typically won't have restriction clauses. ISTM that a typical > query might be like > > select sum(o.total_price) from > orders o > join customers c on c.id = o.c_id > join products p on p.id = o.p_id > where c.customer_name = 'Wile E Coyote' > and p.product_name = 'Rocket Skates'; > > The only reason to join a dimension table that lacks a restriction > clause is if you need some of its fields in the output, which you > might but I'm not sure that's such a common case. (Have you got > evidence to the contrary?) So I feel like we're not going to be > getting all that much win if we are not willing to treat such tables > as dimension tables. We could do something simplistic like order > those dimensions by the selectivity of their baserestrict clauses, > joining the most-restricted ones first and any restriction-free ones > last. > Good question. I don't have a great evidence such joins to dimensions (without additional restrictions) are a common case. It's partially a guess and partially based on my past experience. I have seen a lot of such joins in analytical workloads, where the join is followed by an aggregation, with GROUP BY referencing attributes from the dimensions. Of course, that may be an argument against worrying about the planning too much, because with enough data the timing is going to be dominated by the join/aggregation execution. However, it's surprising how little data many analytical workloads actually access, so it's not that clear. The other use case I've been thinking about is poorly written queries, joining more tables than needed. A traditional example is an ORM loading more data than needed, to load the whole "object". I don't know how prevalent this is today - it used to be a pretty common issue, and I doubt it improved. I think it's not that different from the self-join removal (the tradeoffs may be different, of course). I realize we try not to add complexity for such cases, especially if it might hurt well written queries. Actually, I initially investigated at the opposite example, i.e. all dimensions joining to the fact.id, see create-2/select-2 scripts. And then I realized starjoins have mostly the same issue. But it's true the v5 patch does not actually help this original query. > 2. I'm pretty un-excited about the 0002 patch altogether. I'm having > a hard time visualizing cases where it helps, other than left joins > to dimension tables which I don't really think are common either. > I did a bit of poking around on the net and found that it seems to > be common to restrict star-join optimizations to equijoins (e.g. > SAP says explicitly that they only handle that case). I think we'd > be better off to focus on the allow-baserestrict-clauses extension > than the allow-join-order-restrictions extension. > I recall seen such queries (with LEFT joins) in analytics workloads, but it's definitely less common than inner starjoins. So I agree focusing on allowing baserestrict clauses is probably more useful/important. FWIW I tried searching for more info too, but all the SAP pages suggested by google return 404 to me :-( regards -- Tomas Vondra
В списке pgsql-hackers по дате отправления: