Обсуждение: Writing Trigger Functions in C
Hello guys, I've been finding performance issues when using a trigger to modify inserts on a partitioned table. If using the trigger the total time goes from 1 Hour to 4 hours. The trigger is pretty simple: CREATE OR REPLACE FUNCTION quotes_insert_trigger() RETURNS trigger AS $$ BEGIN EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ; RETURN NULL; END; $$ LANGUAGE plpgsql; I've seen that some of you guys have worked on writing triggers in C. Does anyone have had an experience writing a trigger for partitioning in C ? If you have some code to paste so I can start from I will really appreciate. Thanks
On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote: > Hello guys, > > I've been finding performance issues when using a trigger to modify inserts on a partitioned table. > If using the trigger the total time goes from 1 Hour to 4 hours. > > The trigger is pretty simple: > > CREATE OR REPLACE FUNCTION quotes_insert_trigger() > RETURNS trigger AS $$ > BEGIN > EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ; > RETURN NULL; > END; > $$ > LANGUAGE plpgsql; > > I've seen that some of you guys have worked on writing triggers in C. > > Does anyone have had an experience writing a trigger for partitioning in C ? > > If you have some code to paste so I can start from I will really appreciate. Honestly I'd leave the trigger alone and modify the client code in performance sensitive places to insert directly to the correct partition table. merlin
Hello you can find lot of examples in PostgreSQL source code - see postgresql/contrib/spi directory and documentation http://www.postgresql.org/docs/9.0/static/trigger-example.html Regards Pavel Stehule 2012/12/21 Charles Gomes <charlesrg@outlook.com>: > Hello guys, > > I've been finding performance issues when using a trigger to modify inserts on a partitioned table. > If using the trigger the total time goes from 1 Hour to 4 hours. > > The trigger is pretty simple: > > CREATE OR REPLACE FUNCTION quotes_insert_trigger() > RETURNS trigger AS $$ > BEGIN > EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ; > RETURN NULL; > END; > $$ > LANGUAGE plpgsql; > > I've seen that some of you guys have worked on writing triggers in C. > > Does anyone have had an experience writing a trigger for partitioning in C ? > > If you have some code to paste so I can start from I will really appreciate. > > Thanks > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On 12/21/2012 08:39 AM, Merlin Moncure wrote: > On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote: >> Hello guys, >> >> I've been finding performance issues when using a trigger to modify inserts on a partitioned table. >> If using the trigger the total time goes from 1 Hour to 4 hours. >> >> The trigger is pretty simple: >> >> CREATE OR REPLACE FUNCTION quotes_insert_trigger() >> RETURNS trigger AS $$ >> BEGIN >> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ; >> RETURN NULL; >> END; >> $$ >> LANGUAGE plpgsql; >> >> I've seen that some of you guys have worked on writing triggers in C. >> >> Does anyone have had an experience writing a trigger for partitioning in C ? >> >> If you have some code to paste so I can start from I will really appreciate. > > Honestly I'd leave the trigger alone and modify the client code in > performance sensitive places to insert directly to the correct > partition table. I second that recommendation -- your performance will be much, much, better. Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes <charlesrg@outlook.com> wrote:
>
> Hello guys,
>
> I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
> If using the trigger the total time goes from 1 Hour to 4 hours.
>
> The trigger is pretty simple:
>
> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
> RETURNS trigger AS $
> BEGIN
> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
> RETURN NULL;
> END;
> $
> LANGUAGE plpgsql;
>
> I've seen that some of you guys have worked on writing triggers in C.
>
> Does anyone have had an experience writing a trigger for partitioning in C ?
I'd want to be very careful about assuming that implementing the trigger function in C >
> Hello guys,
>
> I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
> If using the trigger the total time goes from 1 Hour to 4 hours.
>
> The trigger is pretty simple:
>
> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
> RETURNS trigger AS $
> BEGIN
> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
> RETURN NULL;
> END;
> $
> LANGUAGE plpgsql;
>
> I've seen that some of you guys have worked on writing triggers in C.
>
> Does anyone have had an experience writing a trigger for partitioning in C ?
would necessarily improve performance. It's pretty likely that it wouldn't help much,
call, marshalling arguments, and calling the function, none of which would
A *major* cost that your existing implementation has is that it's re-planning
the queries for every single invocation. This is an old, old problem from the
Lisp days, "EVAL considered evil" <http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil>
the queries for every single invocation. This is an old, old problem from the
Lisp days, "EVAL considered evil" <http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil>
The EXECUTE winds up replanning queries every time the trigger fires.
If you can instead enumerate the partitions explicitly, putting them into (say) a
CASE clause, the planner could generate the plan once, rather than a million
times, which would be a HUGE savings, vastly greater than you could expect from
recoding into C.
CASE clause, the planner could generate the plan once, rather than a million
times, which would be a HUGE savings, vastly greater than you could expect from
recoding into C.
The function might look more like:
create or replace function quotes_insert_trigger () returns trigger as $$
declare
declare
c_rt text;
begin
c_rt := to_char(new.received_time, 'YYYY_MM_DD');
case c_rt
when '2012_03_01' then
insert into 2012_03_01 values (NEW.*) using new;
when '2012_03_02' then
insert into 2012_03_02 values (NEW.*) using new;
else raise exception 'Need a new partition function for %', c_rt;
end case;
end $$ language plpgsql;
You'd periodically need to change the function to reflect the existing set of
partitions, but that's cheaper than creating a new partition.
partitions, but that's cheaper than creating a new partition.
The case statement gets more expensive (in effect O(n) on the number of
partitions, n) as the number of partitions increases. You could split
the date into pieces (e.g. - years, months, days) to diminish that cost.
partitions, n) as the number of partitions increases. You could split
the date into pieces (e.g. - years, months, days) to diminish that cost.
But at any rate, this should be *way* faster than what you're running now,
and not at any heinous change in development costs (as would likely
be the case reimplementing using SPI).
and not at any heinous change in development costs (as would likely
be the case reimplementing using SPI).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"
2012/12/21 Joe Conway <mail@joeconway.com>: > On 12/21/2012 08:39 AM, Merlin Moncure wrote: >> On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote: >>> Hello guys, >>> >>> I've been finding performance issues when using a trigger to modify inserts on a partitioned table. >>> If using the trigger the total time goes from 1 Hour to 4 hours. >>> >>> The trigger is pretty simple: >>> >>> CREATE OR REPLACE FUNCTION quotes_insert_trigger() >>> RETURNS trigger AS $$ >>> BEGIN >>> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ; >>> RETURN NULL; >>> END; >>> $$ >>> LANGUAGE plpgsql; >>> >>> I've seen that some of you guys have worked on writing triggers in C. >>> >>> Does anyone have had an experience writing a trigger for partitioning in C ? >>> >>> If you have some code to paste so I can start from I will really appreciate. >> >> Honestly I'd leave the trigger alone and modify the client code in >> performance sensitive places to insert directly to the correct >> partition table. > > I second that recommendation -- your performance will be much, much, better. sure Pavel > > Joe > > -- > Joe Conway > credativ LLC: http://www.credativ.us > Linux, PostgreSQL, and general Open Source > Training, Service, Consulting, & 24x7 Support > > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
________________________________ > Date: Fri, 21 Dec 2012 11:56:25 -0500 > Subject: Re: [HACKERS] Writing Trigger Functions in C > From: cbbrowne@gmail.com > To: charlesrg@outlook.com > CC: pgsql-hackers@postgresql.org > > On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes > <charlesrg@outlook.com<mailto:charlesrg@outlook.com>> wrote: > > > > Hello guys, > > > > I've been finding performance issues when using a trigger to modify > inserts on a partitioned table. > > If using the trigger the total time goes from 1 Hour to 4 hours. > > > > The trigger is pretty simple: > > > > CREATE OR REPLACE FUNCTION quotes_insert_trigger() > > RETURNS trigger AS $ > > BEGIN > > EXECUTE 'INSERT INTO quotes_'|| > to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW > ; > > RETURN NULL; > > END; > > $ > > LANGUAGE plpgsql; > > > > I've seen that some of you guys have worked on writing triggers in C. > > > > Does anyone have had an experience writing a trigger for partitioning > in C ? > > I'd want to be very careful about assuming that implementing the > trigger function in C > would necessarily improve performance. It's pretty likely that it > wouldn't help much, > as a fair bit of the cost of firing a trigger have to do with figuring > out which function to > call, marshalling arguments, and calling the function, none of which would > magically disappear by virtue of implementing in C. > > A *major* cost that your existing implementation has is that it's re-planning > the queries for every single invocation. This is an old, old problem > from the > Lisp days, "EVAL considered evil" > <http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil> > > The EXECUTE winds up replanning queries every time the trigger fires. > > If you can instead enumerate the partitions explicitly, putting them > into (say) a > CASE clause, the planner could generate the plan once, rather than a million > times, which would be a HUGE savings, vastly greater than you could > expect from > recoding into C. > > The function might look more like: > > create or replace function quotes_insert_trigger () returns trigger as $$ > declare > c_rt text; > begin > c_rt := to_char(new.received_time, 'YYYY_MM_DD'); > case c_rt > when '2012_03_01' then > insert into 2012_03_01 values (NEW.*) using new; > when '2012_03_02' then > insert into 2012_03_02 values (NEW.*) using new; > else > raise exception 'Need a new partition function for %', c_rt; > end case; > end $$ language plpgsql; > > You'd periodically need to change the function to reflect the existing set of > partitions, but that's cheaper than creating a new partition. > > The case statement gets more expensive (in effect O(n) on the number of > partitions, n) as the number of partitions increases. You could split > the date into pieces (e.g. - years, months, days) to diminish that cost. > > But at any rate, this should be *way* faster than what you're running now, > and not at any heinous change in development costs (as would likely > be the case reimplementing using SPI). > -- > When confronted by a difficult problem, solve it by reducing it to the > question, "How would the Lone Ranger handle this?" I will change and implement it this way, I was not aware of such optimization. Will post back after my benchmark runs.
---------------------------------------- > From: charlesrg@outlook.com > To: cbbrowne@gmail.com > CC: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Writing Trigger Functions in C > Date: Fri, 21 Dec 2012 12:27:26 -0500 > > ________________________________ > > Date: Fri, 21 Dec 2012 11:56:25 -0500 > > Subject: Re: [HACKERS] Writing Trigger Functions in C > > From: cbbrowne@gmail.com > > To: charlesrg@outlook.com > > CC: pgsql-hackers@postgresql.org > > > > On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes > > <charlesrg@outlook.com<mailto:charlesrg@outlook.com>> wrote: > > > > > > Hello guys, > > > > > > I've been finding performance issues when using a trigger to modify > > inserts on a partitioned table. > > > If using the trigger the total time goes from 1 Hour to 4 hours. > > > > > > The trigger is pretty simple: > > > > > > CREATE OR REPLACE FUNCTION quotes_insert_trigger() > > > RETURNS trigger AS $ > > > BEGIN > > > EXECUTE 'INSERT INTO quotes_'|| > > to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW > > ; > > > RETURN NULL; > > > END; > > > $ > > > LANGUAGE plpgsql; > > > > > > I've seen that some of you guys have worked on writing triggers in C. > > > > > > Does anyone have had an experience writing a trigger for partitioning > > in C ? > > > > I'd want to be very careful about assuming that implementing the > > trigger function in C > > would necessarily improve performance. It's pretty likely that it > > wouldn't help much, > > as a fair bit of the cost of firing a trigger have to do with figuring > > out which function to > > call, marshalling arguments, and calling the function, none of which would > > magically disappear by virtue of implementing in C. > > > > A *major* cost that your existing implementation has is that it's re-planning > > the queries for every single invocation. This is an old, old problem > > from the > > Lisp days, "EVAL considered evil" > > <http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil> > > > > The EXECUTE winds up replanning queries every time the trigger fires. > > > > If you can instead enumerate the partitions explicitly, putting them > > into (say) a > > CASE clause, the planner could generate the plan once, rather than a million > > times, which would be a HUGE savings, vastly greater than you could > > expect from > > recoding into C. > > > > The function might look more like: > > > > create or replace function quotes_insert_trigger () returns trigger as $$ > > declare > > c_rt text; > > begin > > c_rt := to_char(new.received_time, 'YYYY_MM_DD'); > > case c_rt > > when '2012_03_01' then > > insert into 2012_03_01 values (NEW.*) using new; > > when '2012_03_02' then > > insert into 2012_03_02 values (NEW.*) using new; > > else > > raise exception 'Need a new partition function for %', c_rt; > > end case; > > end $$ language plpgsql; > > > > You'd periodically need to change the function to reflect the existing set of > > partitions, but that's cheaper than creating a new partition. > > > > The case statement gets more expensive (in effect O(n) on the number of > > partitions, n) as the number of partitions increases. You could split > > the date into pieces (e.g. - years, months, days) to diminish that cost. > > > > But at any rate, this should be *way* faster than what you're running now, > > and not at any heinous change in development costs (as would likely > > be the case reimplementing using SPI). > > -- > > When confronted by a difficult problem, solve it by reducing it to the > > question, "How would the Lone Ranger handle this?" > > > I will change and implement it this way, I was not aware of such optimization. > Will post back after my benchmark runs. > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers I've to have add 2 weeks of data at a time, therefore I had to keep two weeks of case statements Replaced the short trigger function to: CREATE OR REPLACE FUNCTION quotes_insert_trigger() RETURNS trigger AS $$ DECLARE r_date text; BEGIN r_date = to_char(new.received_time, 'YYYY_MM_DD'); case r_date when '2012_09_10' then insert into quotes_2012_09_10 values (NEW.*) using new; return; when '2012_09_11' then insert into quotes_2012_09_11 values (NEW.*) using new; return; when '2012_09_12' then insert into quotes_2012_09_12 values (NEW.*) using new; return; when '2012_09_13' then insert into quotes_2012_09_13 values (NEW.*) using new; return; when '2012_09_14' then insert into quotes_2012_09_14 values (NEW.*) using new; return; when '2012_09_15' then insert into quotes_2012_09_15 values (NEW.*) using new; return; when '2012_09_16' then insert into quotes_2012_09_16 values (NEW.*) using new; return; when '2012_09_17' then insert into quotes_2012_09_17 values (NEW.*) using new; return; when '2012_09_18' then insert into quotes_2012_09_18 values (NEW.*) using new; return; when '2012_09_19' then insert into quotes_2012_09_19 values (NEW.*) using new; return; when '2012_09_20' then insert into quotes_2012_09_20 values (NEW.*) using new; return; when '2012_09_21' then insert into quotes_2012_09_21 values (NEW.*) using new; return; when '2012_09_22' then insert into quotes_2012_09_22 values (NEW.*) using new; return; when '2012_09_23' then insert into quotes_2012_09_23 values (NEW.*) using new; return; when '2012_09_24' then insert into quotes_2012_09_24 values (NEW.*) using new; return; end case RETURN NULL; END; $$ LANGUAGE plpgsql; And I had no performance improvements at all. Took the same time as with the previous EXECUTE statement; I don't see what am I doing wrong.
On Mon, Dec 24, 2012 at 10:43 AM, Charles Gomes <charlesrg@outlook.com> wrote: > And I had no performance improvements at all. > Took the same time as with the previous EXECUTE statement; > > I don't see what am I doing wrong. You might not be doing anything wrong. Triggers ARE slow. If you have "perf" on your system, you could use "perf top" or "perf record -a" to find out where the CPU time is going while you're doing stuff that fires this trigger. That might provide some clues about how to optimize. But it may be that you'll get a completely flat profile, or something that otherwise boils down to ... triggers are slow. In answer to your original question, there is a C language trigger in contrib/tcn. But, without some proof that the use of PL/pgsql is the problem, I don't know how far down that road it's worth going. It might be worth writing a C trigger that does nothing but return the original tuple, or even a PL/pgsql one. This obviously wouldn't accomplish anything as far as partitioning goes, but it would let you measure the overhead of calling a no-op trigger, which could be a useful thing to know. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
---------------------------------------- > Date: Sat, 29 Dec 2012 23:45:06 -0500 > Subject: Re: [HACKERS] Writing Trigger Functions in C > From: robertmhaas@gmail.com > To: charlesrg@outlook.com > CC: cbbrowne@gmail.com; pgsql-hackers@postgresql.org > > On Mon, Dec 24, 2012 at 10:43 AM, Charles Gomes <charlesrg@outlook.com> wrote: > > And I had no performance improvements at all. > > Took the same time as with the previous EXECUTE statement; > > > > I don't see what am I doing wrong. > > You might not be doing anything wrong. Triggers ARE slow. > > If you have "perf" on your system, you could use "perf top" or "perf > record -a" to find out where the CPU time is going while you're doing > stuff that fires this trigger. That might provide some clues about > how to optimize. But it may be that you'll get a completely flat > profile, or something that otherwise boils down to ... triggers are > slow. > > In answer to your original question, there is a C language trigger in > contrib/tcn. But, without some proof that the use of PL/pgsql is the > problem, I don't know how far down that road it's worth going. It > might be worth writing a C trigger that does nothing but return the > original tuple, or even a PL/pgsql one. This obviously wouldn't > accomplish anything as far as partitioning goes, but it would let you > measure the overhead of calling a no-op trigger, which could be a > useful thing to know. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company I've translated the trigger to C and performance had not increased, just like you guys said. I've created an article withthe trigger and the metrics in case anyone becomes interested in the future http://www.charlesrg.com/linux/71-postgresql-partitioning-the-database-the-fastest-way