Re: Function is called multiple times in subselect

Поиск
Список
Период
Сортировка
От Chris Campbell
Тема Re: Function is called multiple times in subselect
Дата
Msg-id 4051DEE1.60109@bignerdranch.com
обсуждение исходный текст
Ответ на Re: Function is called multiple times in subselect  ("Alex J. Avriette" <alex@posixnap.net>)
Ответы Re: Function is called multiple times in subselect  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
Alex J. Avriette wrote:

> And if you are concerned about the query's cost vs its actual output
> (you weren't clear in your original message), there is the 'iscachable'
> pragma for functions.

I was concerned that my function was being called 4 times for each row
of the result. The cost of the function call is quite high, so the cost
of the query was 4 times what I expected.

So my concern was the query's cost.

     SELECT query.i,
            query.squared AS test1,
            query.squared + 1 AS test2,
            query.squared + 2 AS test3,
            query.squared + 3 AS test4
     FROM (
         SELECT i,
                square_it(i) AS squared
         FROM foo
         OFFSET 0
     ) query;

As Tom explained to me, the optimizer flattens the subselect, so it then
becomes:

     SELECT i,
            square_it(i) AS test1,
            square_it(i) + 1 AS test2,
            square_it(i) + 2 AS test3,
            square_it(i) + 3 AS test4
     FROM   foo;

Thus, each reference to query.squared in the outer query is replaced
with a call to the function.

In this trivial example, that's not really a performance issue. But my
real-world function calculates about 8 values and returns them as a
record, which I then pick apart in the outer query. When the optimizer
flattens the subquery and each reference to the function result is
replaced with a call to the function, it makes for a very expensive query.

     CREATE TYPE patient_balances_type AS (patient_id INTEGER,
                                           account_id INTEGER,
                                           date DATE,
                                           due_now INTEGER,
                                           future_due INTEGER,
                                           copay_balance INTEGER,
                                           expected_insurance INTEGER,
                                           total_balance INTEGER,
                                           contract_amount INTEGER,
                                           real_due_now INTEGER,
                                           real_future_due INTEGER);

     CREATE OR REPLACE FUNCTION patient_balances(INTEGER, INTEGER, DATE)
     RETURNS patient_balances_type AS '...' LANGUAGE 'plpgsql';

  My actual query looked something like:

         SELECT  query.appointment_id AS appointment_id,
                 query.date,
                 query.start_time,
                 query.duration,
                 query.patient_id,
                 (query.bal).total_balance,
                 (query.bal).expected_insurance,
                 (query.bal).future_due,
                 (query.bal).due_now
         FROM
         (
             SELECT appt.appointment_id,
                    appt.start_time,
                    appt.duration,
                    appt.date,
                    p.patient_id,
                    p.account_id,
                    patient_balances(p.patient_id,
                                     p.account_id,
                                     d.system_date) AS bal
             FROM
                    patients AS p
                    JOIN appointments appt ON
                        (p.patient_id = appt.patient_id)
                    JOIN system_date d ON
                        (appt.date = d.system_date)
         ) query;

When that subquery was flattened, each reference to query.bal was
replaced with a call to patient_balances(). Which was pretty expensive.

Tom's suggestion off using OFFSET 0 to cause the optimizer not to
flatten the subquery is exactly what I was looking for.

Is there a better way to be picking apart the result of my function than
putting it in a subquery, since the subquery will be flattened by
default? There are a bunch of places I've done this that I need to go
back to now and add an OFFSET 0 because I didn't realize the optimizer
would negate my cleverly crafted record-dissecting subselect. :)

Or is there a way to advise the optimizer of the cost of my function, so
that it will choose to not flatten the subquery (since the total cost of
doing that will be higher)? Or should the optimizer be assuming that
function calls are fairly expensive by default, and not flattening
subqueries that have function calls? So it's not a bug, but maybe a
feature request? :)

Thanks!

- Chris

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Function is called multiple times in subselect
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Function is called multiple times in subselect