Обсуждение: Query plans for plpgsql triggers
I've found a few performance issues with an internal database application and I'm sure it's related to my misunderstanding of how and when queries are planned when used in a plpgsql function. This is against Postgres 7.4. For example, suppose this function is defined as a per-statement update trigger on "some_table": CREATE OR REPLACE FUNCTION foo() RETURNS TRIGGER LANGUAGE 'plpsql' AS 'DECLARE my_id int8; BEGIN my_id := 12; -- some arbitrary value UPDATE some_other_table SET field = 'value' WHERE id = my_id::int8; RETURN NULL; END;' The above function is (obviously) a stripped down version of something real, but it accurately represents the basics. When is the UPDATE statement inside foo() planned? When the trigger is first created, or when it's first used per backend, or every time it's used per backend? It's gotta be one of the former, because it sure ain't the latter. I dunno what plan is being generated, but it's gotta be using a sequential scan. "some_other_table" contains roughly 2 million rows and the "id" column is uniquely indexed, yet the UPDATE takes 35-40 seconds when run via the trigger, but only milliseconds if the equivalent UPDATE statement is played into psql. The database is freshly vacuumed and analyzed. And of course an EXPLAIN via psql shows an index scan. If I turn on statement logging I can plainly see that the embedded UPDATE statement is the query taking a long time. Additionally, a gdb stacktrace of the backend shows Postgres is somewhere inside the pl_pgsql call handlers. Changing the UPDATE to be: EXECUTE ''UPDATE some_other_table SET field = ''''value'''' WHERE id = '' || my_id || ''::int8''; seems to "solve" the bad planning problem, but this sure is ugly. Maybe I missed it in the 7.4 docs, but I can't find any information on query planning for plpgsql functions. Any insight into how this works would be greatly appreciated. Also, any mention of how PG 8.1.3 differs in this regard would also be handy. thanks! eric
"Eric B. Ridge" <ebr@tcdi.com> writes: > When is the UPDATE statement inside foo() planned? When the trigger > is first created, or when it's first used per backend, or every time > it's used per backend? First use per backend, ignoring corner cases such as replacing the function definition. > I dunno what plan is being generated, but it's gotta be using a > sequential scan. The issue is probably that the planner is seeing a parameterized query. Try this: prepare foo(int8) as update some_other_table SET field = 'value' WHERE id = $1; explain execute foo(42); and see what plan you get. If the id field has sufficiently discouraging statistics then the planner may think that a seqscan is the safest plan. In a "normal" query where you're comparing id to a constant, the planner can see whether the constant matches any of the most common values for the column --- if it doesn't then an indexscan is a good plan. If you really want a replan every time, you can get it by using EXECUTE. regards, tom lane
On Mar 24, 2006, at 11:39 PM, Tom Lane wrote: > The issue is probably that the planner is seeing a parameterized > query. Try this: > > prepare foo(int8) as update some_other_table SET field = 'value' > WHERE id = $1; > explain execute foo(42); I should have mentioned that while the UPDATE statement in the trigger function really is as simple as the above, "some_other_table" is actually a view with the requisite ON UPDATE DO INSTEAD rule: CREATE OR REPLACE RULE some_other_table_update AS ON UPDATE TO some_other_real_table DO INSTEAD ( UPDATE some_other_real_table_1 SET field = NEW.field WHERE id = OLD.id::int8; UPDATE some_other_real_table_2 SET field = NEW.field WHERE id = OLD.id::int8; ... UPDATE some_other_real_table_39 SET field = NEW.field WHERE id = OLD.id::int8; UPDATE some_other_real_table SET field = NEW.field WHERE id = OLD.id::int8; ); The explain for your "prepare foo(42)" suggestion shows the correct index scans for each of the 40 actual tables being updated by the RULE. > and see what plan you get. If the id field has sufficiently > discouraging statistics then the planner may think that a seqscan > is the safest plan. In a "normal" query where you're comparing id > to a constant, the planner can see whether the constant matches any > of the most common values for the column --- if it doesn't then an > indexscan is a good plan. the "id" column, for *each* of the tables referenced in the RULE is defined as id int8 NOT NULL PRIMARY KEY No value should be any more common than the other. Could the fact that "some_other_table" is a view influence the planner in some way? > If you really want a replan every time, you can get it by using > EXECUTE. Indeed. If big-ugly-updateable-views can't influence the planner, what positive impact would changing the statistics threshold have on a primary key column? As an aside, has there ever been any discussion/thought into some ability to force all plpgsql queries to by dynamically planned w/o the need to explicitly wrap them inside EXPLAIN? Maybe something like: CREATE OR REPLACE FUNCTION foo() LANGUAGE 'plpgsql' OPTIONS 'dynamic_plans=on' AS '....'; or maybe a plpgsql, named 'plpgsql_dont_preplan_my_queries'? Something like the above would at least make for "prettier" function sources. thanks for your time. eric
"Eric B. Ridge" <ebr@tcdi.com> writes: > I should have mentioned that while the UPDATE statement in the > trigger function really is as simple as the above, "some_other_table" > is actually a view with the requisite ON UPDATE DO INSTEAD rule: This is the sort of detail that you really should not omit. If you were using something newer than 7.4 then I'd ask for a complete test case so I could look into improving the behavior --- but as it is, I'd first suggest upgrading and seeing if the problem is already fixed. regards, tom lane
On Mar 25, 2006, at 12:24 AM, Tom Lane wrote: > This is the sort of detail that you really should not omit. Yeah, it didn't even occur to me until I ran the "explain execute foo (42)" thing you suggested. We've been using these update rules for so long that I just think of the views as regular tables (rules are great, btw). > If you were using something newer than 7.4 then I'd ask for a complete > test case so I could look into improving the behavior --- but as it > is, > I'd first suggest upgrading and seeing if the problem is already > fixed. We're working towards an upgrade to 8.1.3, and a new schema. Both of which will likely provide all sorts of new "behaviors." I'm now curious if complex rules can influence the planner in negative ways. I don't see how they could -- I've never seen unexpected EXPLAIN output via psql. However, I can try to work up a test case against 7.4.12 if you think it'll be beneficial. It'll take a few days and if you wanted 2 million-ish sample rows, be very large. eric
"Eric B. Ridge" <ebr@tcdi.com> writes: > On Mar 25, 2006, at 12:24 AM, Tom Lane wrote: >> This is the sort of detail that you really should not omit. > I'm now curious if complex rules can influence the planner in > negative ways. It's possible. I'm not certain that that is really what you are seeing, but it could be. > ... However, I can try to work up a > test case against 7.4.12 if you think it'll be beneficial. I doubt we'd consider patching such a problem in 7.4. At this point the only issues that will get patched in 7.4 are security and data-loss risks, not performance problems. Again: demonstrating the issue in 8.1 or CVS HEAD would be a good step towards getting people motivated to fix it. regards, tom lane