Обсуждение: Query plans for plpgsql triggers

Поиск
Список
Период
Сортировка

Query plans for plpgsql triggers

От
"Eric B. Ridge"
Дата:
I've found a few performance issues with an internal database
application and I'm sure it's related to my misunderstanding of how
and when queries are planned when used in a plpgsql function.  This
is against Postgres 7.4.

For example, suppose this function is defined as a per-statement
update trigger on "some_table":

CREATE OR REPLACE FUNCTION foo() RETURNS TRIGGER LANGUAGE 'plpsql' AS
'DECLARE
    my_id int8;
BEGIN
    my_id := 12;  -- some arbitrary value
    UPDATE some_other_table SET field = 'value' WHERE id = my_id::int8;
    RETURN NULL;
END;'

The above function is (obviously) a stripped down version of
something real, but it accurately represents the basics.

When is the UPDATE statement inside foo() planned?  When the trigger
is first created, or when it's first used per backend, or every time
it's used per backend?  It's gotta be one of the former, because it
sure ain't the latter.

I dunno what plan is being generated, but it's gotta be using a
sequential scan.  "some_other_table" contains roughly 2 million rows
and the "id" column is uniquely indexed, yet the UPDATE takes 35-40
seconds when run via the trigger, but only milliseconds if the
equivalent UPDATE statement is played into psql.  The database is
freshly vacuumed and analyzed.  And of course an EXPLAIN via psql
shows an index scan.

If I turn on statement logging I can plainly see that the embedded
UPDATE statement is the query taking a long time.  Additionally, a
gdb stacktrace of the backend shows Postgres is somewhere inside the
pl_pgsql call handlers.

Changing the UPDATE to be:
    EXECUTE ''UPDATE some_other_table SET field = ''''value'''' WHERE id
= '' || my_id || ''::int8'';

seems to "solve" the bad planning problem, but this sure is ugly.

Maybe I missed it in the 7.4 docs, but I can't find any information
on query planning for plpgsql functions.  Any insight into how this
works would be greatly appreciated.  Also, any mention of how PG
8.1.3 differs in this regard would also be handy.

thanks!

eric

Re: Query plans for plpgsql triggers

От
Tom Lane
Дата:
"Eric B. Ridge" <ebr@tcdi.com> writes:
> When is the UPDATE statement inside foo() planned?  When the trigger
> is first created, or when it's first used per backend, or every time
> it's used per backend?

First use per backend, ignoring corner cases such as replacing the
function definition.

> I dunno what plan is being generated, but it's gotta be using a
> sequential scan.

The issue is probably that the planner is seeing a parameterized
query.  Try this:

prepare foo(int8) as update some_other_table SET field = 'value' WHERE id = $1;
explain execute foo(42);

and see what plan you get.  If the id field has sufficiently
discouraging statistics then the planner may think that a seqscan
is the safest plan.  In a "normal" query where you're comparing id
to a constant, the planner can see whether the constant matches any
of the most common values for the column --- if it doesn't then an
indexscan is a good plan.

If you really want a replan every time, you can get it by using
EXECUTE.

            regards, tom lane

Re: Query plans for plpgsql triggers

От
"Eric B. Ridge"
Дата:
On Mar 24, 2006, at 11:39 PM, Tom Lane wrote:

> The issue is probably that the planner is seeing a parameterized
> query.  Try this:
>
> prepare foo(int8) as update some_other_table SET field = 'value'
> WHERE id = $1;
> explain execute foo(42);

I should have mentioned that while the UPDATE statement in the
trigger function really is as simple as the above, "some_other_table"
is actually a view with the requisite ON UPDATE DO INSTEAD rule:

CREATE OR REPLACE RULE some_other_table_update AS ON UPDATE TO
some_other_real_table DO INSTEAD
(
    UPDATE some_other_real_table_1 SET field = NEW.field WHERE id =
OLD.id::int8;
    UPDATE some_other_real_table_2 SET field = NEW.field WHERE id =
OLD.id::int8;
    ...
    UPDATE some_other_real_table_39 SET field = NEW.field WHERE id =
OLD.id::int8;
    UPDATE some_other_real_table SET field = NEW.field WHERE id =
OLD.id::int8;
);

The explain for your "prepare foo(42)" suggestion shows the correct
index scans for each of the 40 actual tables being updated by the RULE.

> and see what plan you get.  If the id field has sufficiently
> discouraging statistics then the planner may think that a seqscan
> is the safest plan. In a "normal" query where you're comparing id
> to a constant, the planner can see whether the constant matches any
> of the most common values for the column --- if it doesn't then an
> indexscan is a good plan.

the "id" column, for *each* of the tables referenced in the RULE is
defined as
    id int8 NOT NULL PRIMARY KEY

No value should be any more common than the other.

Could the fact that "some_other_table" is a view influence the
planner in some way?

> If you really want a replan every time, you can get it by using
> EXECUTE.

Indeed.  If big-ugly-updateable-views can't influence the planner,
what positive impact would changing the statistics threshold have on
a primary key column?

As an aside, has there ever been any discussion/thought into some
ability to force all plpgsql queries to by dynamically planned w/o
the need to explicitly wrap them inside EXPLAIN?  Maybe something like:

    CREATE OR REPLACE FUNCTION foo() LANGUAGE 'plpgsql' OPTIONS
'dynamic_plans=on' AS '....';

or maybe a plpgsql, named 'plpgsql_dont_preplan_my_queries'?

Something like the above would at least make for "prettier" function
sources.

thanks for your time.

eric

Re: Query plans for plpgsql triggers

От
Tom Lane
Дата:
"Eric B. Ridge" <ebr@tcdi.com> writes:
> I should have mentioned that while the UPDATE statement in the
> trigger function really is as simple as the above, "some_other_table"
> is actually a view with the requisite ON UPDATE DO INSTEAD rule:

This is the sort of detail that you really should not omit.

If you were using something newer than 7.4 then I'd ask for a complete
test case so I could look into improving the behavior --- but as it is,
I'd first suggest upgrading and seeing if the problem is already fixed.

            regards, tom lane

Re: Query plans for plpgsql triggers

От
"Eric B. Ridge"
Дата:
On Mar 25, 2006, at 12:24 AM, Tom Lane wrote:

> This is the sort of detail that you really should not omit.

Yeah, it didn't even occur to me until I ran the "explain execute foo
(42)" thing you suggested.  We've been using these update rules for
so long that I just think of the views as regular tables (rules are
great, btw).

> If you were using something newer than 7.4 then I'd ask for a complete
> test case so I could look into improving the behavior --- but as it
> is,
> I'd first suggest upgrading and seeing if the problem is already
> fixed.

We're working towards an upgrade to 8.1.3, and a new schema.  Both of
which will likely provide all sorts of new "behaviors."

I'm now curious if complex rules can influence the planner in
negative ways.  I don't see how they could -- I've never seen
unexpected EXPLAIN output via psql.  However, I can try to work up a
test case against 7.4.12 if you think it'll be beneficial.  It'll
take a few days and if you wanted 2 million-ish sample rows, be very
large.

eric

Re: Query plans for plpgsql triggers

От
Tom Lane
Дата:
"Eric B. Ridge" <ebr@tcdi.com> writes:
> On Mar 25, 2006, at 12:24 AM, Tom Lane wrote:
>> This is the sort of detail that you really should not omit.

> I'm now curious if complex rules can influence the planner in
> negative ways.

It's possible.  I'm not certain that that is really what you are
seeing, but it could be.

> ... However, I can try to work up a
> test case against 7.4.12 if you think it'll be beneficial.

I doubt we'd consider patching such a problem in 7.4.  At this point
the only issues that will get patched in 7.4 are security and data-loss
risks, not performance problems.

Again: demonstrating the issue in 8.1 or CVS HEAD would be a good step
towards getting people motivated to fix it.

            regards, tom lane