Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?

Поиск
Список
Период
Сортировка
От Jay Levitt
Тема Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?
Дата
Msg-id CANSg5mqt+7j1SU2Gu2tZ_c_vdzCJPkymxEqaQ-OjRXfz=Pfimg@mail.gmail.com
обсуждение исходный текст
Ответы Re: Guide to PG's capabilities for inlining, predicate hoisting, flattening, etc?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
As I've come up to speed on SQL and PostgreSQL with some
medium-complexity queries, I've asked a few questions about what the
optimizer will do in various situations.  I'm not talking about the
seq-scan-vs-index type of optimizing; I mean "transforming within the
relational calculus (algebra?) to an equivalent but more performant
query".  The same topics come up:

- Flattening.  I think that means "Merge the intent of the subquery
into the various clauses of the parent query".

- Inlining. That's "Don't run this function/subquery/view as an atomic
unit; instead, push it up into the parent query so the optimizer can
see it all at once."  Maybe that's the same as flattening.

- Predicate pushdown. That's "This subquery produces a lot of rows,
but the parent query has a WHERE clause that will eliminate half of
them, so don't produce the unnecessary rows."

Am I right so far?  Now, the big question, which I haven't seen
documented anywhere: Under what circumstances can the optimizer do
each of these things?

For instance, I have a complex query that calculates the similarity of
one user to every other user.  The output is two columns, one row per
user:

   select * from similarity(my_user_id);

   other_user | similarity%
   -----------|-------------
      123     |  99

Being a novice at SQL, I first wrote it in PL/pgSQL, so I could stay
in my imperative, iterative head.  The query performed decently well
when scanning the whole table, but when I only wanted to compare
myself to a single user, I said:

   select * from similarity(my_user_id) as s where s.other_user = 321;

And, of course, similarity() produced the whole table anyway, because
predicates don't get pushed down into PL/pgSQL functions.

So I went and rewrote similarity as a SQL function, but I still didn't
want one big hairy SQL query. Ah ha! CTEs let you write modular
subqueries, and you also avoid problems with lack of LATERAL. I'll use
those.

.. But of course predicates don't get pushed into CTEs, either.  (Or
maybe it was that they would, but only if they were inline with the
predicate.. I forget now.)

So you can see where I'm going.  I know if I break everything into
elegant, composable functions, it'll continue to perform poorly.  If I
write one big hairy, it'll perform great but it will be difficult to
maintain, and it will be inelegant and a kitten will die.  My tools
are CTEs, subqueries, aliases, SQL functions, PL/pgSQL functions, and
views (and other tools?)  What optimizations do each of those prevent?

We're on 9.0 now but will happily upgrade to 9.1 if that matters.

Jay Levitt

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: Intel 710 pgbench write latencies
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: Intel 710 pgbench write latencies