Re: should we have a fast-path planning for OLTP starjoins?

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: should we have a fast-path planning for OLTP starjoins?
Дата
Msg-id a22ec6e0-92ae-43e7-85c1-587df2a65f51@vondra.me
обсуждение исходный текст
Ответ на Re: should we have a fast-path planning for OLTP starjoins?  (Tomas Vondra <tomas@vondra.me>)
Ответы Re: should we have a fast-path planning for OLTP starjoins?
Список pgsql-hackers
Hi,

Here's a v5, addressing some (but not all) of the things discussed
earlier in this thread.

This does nothing about the "opposite" type of join, with many small
tables linking to a single "central" table. I'm not convinced it makes
sense to handle that together with starjoins. But I haven't tried yet.


The main improvements/changes in v5 are:

1) comment cleanup / clarification
----------------------------------
A lot of comments was stale, or open questions answered since the
comment was written. So clean that up. Rewording and clarification of
various comments - a lot of places talking about the same thing, so I
de-duplicated that (I hope).


2) starjoin_match_to_foreign_key: improved matching FK to join clauses
----------------------------------------------------------------------
The check is split into starjoin_foreign_key_matched_by_clauses and
starjoin_clauses_matched_by_foreign_key, each doing the check in one
direction. It might be possible to do both in a single function, but I
don't think it'd be more efficient and this seemed simpler.

I was a bit surprised it doesn't need to inspect the EC members, it's
enough to check if the FK matches the EC. At least I don't think it's
needed, and it seems to be working without it. Maybe there's some more
complex case where we actually need to look at the members?

The one remaining XXX is about ensuring the referencing table has the FK
columns marked as NOT NULL, to guarantee the join does not reduce
cardinality. But it's related to the question of OUTER joins, which is
discussed later.


3) has_join_restriction(), but allowing some join order restrictions
--------------------------------------------------------------------
starjoin_is_dimension() now calls has_join_restriction(), and for inner
joins this works fine. But as soon as there's an outer join (say, left
join), it disabled the simplified planning. I find that unfortunate, but
I think we can actually do better - IIUC it's OK to treat a relation
with join restrictions as a dimension, as long as we don't reorder
relations with restrictions (the sublist).

Consider a join list with 8 baserels, and an empty list of dimensions.

  [F, E1, D2, E3, D4, E5, D6, E7]  []

The "F" is the fact, "D" are dimensions, "E" are non-dimension tables.
We can simply move the "D" rels to the dimension list:

  [F, E1, E3, E5, E7] [D2, D4, D6]

The v4 patch would have stopped here, but I think we can do better - we
can move the "E" rels to the dimension list, as long as the only reason
preventing that was "has_join_restriction". We move the rels to the
beginning of the dimensions list, to keep the syntactic join order. And
we stop once we find the first rel that can't be moved (due to not
having a matching FK or has WHERE filter).

starjoin_adjust_joins does that by walking the filter repeatedly, and
stopping when it finds no dimension. I now realize it won't actually
need more than two loops (and one to find nothing) but that's a detail.

This is related to the NOT NULL vs. outer join check mentioned in (2),
because the restrictions are usually due to outer joins, and outer joins
don't need the check (and probably even can't have that). But I'm not
sure what's the best way to check if the order restriction is due to
some kind of outer join, or something else. Not sure.

I kept this separated in 0002, for easier review.


snowflake joins
---------------
There's another possible improvement that I'd like to address in the
next version of the patch - handling snowflake schemas. Right now the
leaf dimensions will be handled fine, but the "inner" dimensions won't
because they reference other tables (the leafs). Which gets rejected by
starjoin_clauses_matched_by_foreign_key, because those join clauses are
not matched by the incoming FK.

I plan to modify starjoin_is_dimension() to allow join clauses pointing
to "known" dimensions, so that the next loop can add the "inner" ones.


some experiments
----------------
To verify if this is effective, I ran the two starjoin and snowflake
selects (select-1 and select-3) with inner/outer joins, on master and
with 0001 and 0002. There's a "run.sh" script in "patch/" directory.

The results look like this:

                    | master  | 0001/off  0001/on | 0002/off  0002/on
  ------------------|---------|-------------------|------------------
  starjoin    inner |   2299  |     2295    15325 |     2299    15131
  starjoin    outer |   2270  |     2272     2257 |     2249    14312
  snowflake   inner |   2718  |     2667     7223 |     2654     7106
  snowflake   outer |   2282  |     2264     2254 |     2273     6210

This is throughput (tps) from a single pgbench run with a single client.
It's quite stable, but there's a bit of noise.

The master and "off" results are virtually the same (and it gives you a
good idea of how much noise is there). 0001 helps with inner joins, but
not the outer joins (because of the restrictions). 0002 fixes that.

The improvement for snowflake joins is smaller because of the "inner"
dimensions. The current patches identify only the leaf ones.


regards

-- 
Tomas Vondra

Вложения

В списке pgsql-hackers по дате отправления: