Обсуждение: Writing Trigger Functions in C

Поиск
Список
Период
Сортировка

Writing Trigger Functions in C

От
Charles Gomes
Дата:
Hello guys,

I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
If using the trigger the total time goes from 1 Hour to 4 hours.

The trigger is pretty simple:

CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
BEGIN
EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

I've seen that some of you guys have worked on writing triggers in C.

Does anyone have had an experience writing a trigger for partitioning in C ?

If you have some code to paste so I can start from I will really appreciate.

Thanks


Re: Writing Trigger Functions in C

От
Merlin Moncure
Дата:
On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote:
> Hello guys,
>
> I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
> If using the trigger the total time goes from 1 Hour to 4 hours.
>
> The trigger is pretty simple:
>
> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
> RETURNS trigger AS $$
> BEGIN
> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
> RETURN NULL;
> END;
> $$
> LANGUAGE plpgsql;
>
> I've seen that some of you guys have worked on writing triggers in C.
>
> Does anyone have had an experience writing a trigger for partitioning in C ?
>
> If you have some code to paste so I can start from I will really appreciate.

Honestly I'd leave the trigger alone and modify the client code in
performance sensitive places to insert directly to the correct
partition table.

merlin



Re: Writing Trigger Functions in C

От
Pavel Stehule
Дата:
Hello

you can find lot of examples in PostgreSQL source code - see
postgresql/contrib/spi directory

and documentation http://www.postgresql.org/docs/9.0/static/trigger-example.html

Regards

Pavel Stehule



2012/12/21 Charles Gomes <charlesrg@outlook.com>:
> Hello guys,
>
> I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
> If using the trigger the total time goes from 1 Hour to 4 hours.
>
> The trigger is pretty simple:
>
> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
> RETURNS trigger AS $$
> BEGIN
> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
> RETURN NULL;
> END;
> $$
> LANGUAGE plpgsql;
>
> I've seen that some of you guys have worked on writing triggers in C.
>
> Does anyone have had an experience writing a trigger for partitioning in C ?
>
> If you have some code to paste so I can start from I will really appreciate.
>
> Thanks
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



Re: Writing Trigger Functions in C

От
Joe Conway
Дата:
On 12/21/2012 08:39 AM, Merlin Moncure wrote:
> On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote:
>> Hello guys,
>>
>> I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
>> If using the trigger the total time goes from 1 Hour to 4 hours.
>>
>> The trigger is pretty simple:
>>
>> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
>> RETURNS trigger AS $$
>> BEGIN
>> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
>> RETURN NULL;
>> END;
>> $$
>> LANGUAGE plpgsql;
>>
>> I've seen that some of you guys have worked on writing triggers in C.
>>
>> Does anyone have had an experience writing a trigger for partitioning in C ?
>>
>> If you have some code to paste so I can start from I will really appreciate.
> 
> Honestly I'd leave the trigger alone and modify the client code in
> performance sensitive places to insert directly to the correct
> partition table.

I second that recommendation -- your performance will be much, much, better.

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support





Re: Writing Trigger Functions in C

От
Christopher Browne
Дата:
On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes <charlesrg@outlook.com> wrote:
>
> Hello guys,
>
> I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
> If using the trigger the total time goes from 1 Hour to 4 hours.
>
> The trigger is pretty simple:
>
> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
> RETURNS trigger AS $
> BEGIN
> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
> RETURN NULL;
> END;
> $
> LANGUAGE plpgsql;
>
> I've seen that some of you guys have worked on writing triggers in C.
>
> Does anyone have had an experience writing a trigger for partitioning in C ?

I'd want to be very careful about assuming that implementing the trigger function in C
would necessarily improve performance.  It's pretty likely that it wouldn't help much,
as a fair bit of the cost of firing a trigger have to do with figuring out which function to
call, marshalling arguments, and calling the function, none of which would
magically disappear by virtue of implementing in C.

A *major* cost that your existing implementation has is that it's re-planning
the queries for every single invocation.  This is an old, old problem from the
Lisp days, "EVAL considered evil"  <http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil>

The EXECUTE winds up replanning queries every time the trigger fires.

If you can instead enumerate the partitions explicitly, putting them into (say) a
CASE clause, the planner could generate the plan once, rather than a million
times, which would be a HUGE savings, vastly greater than you could expect from
recoding into C.

The function might look more like:

create or replace function quotes_insert_trigger () returns trigger as $$
declare
    c_rt text;
begin
   c_rt := to_char(new.received_time, 'YYYY_MM_DD');
   case c_rt
     when '2012_03_01' then
       insert into 2012_03_01 values (NEW.*) using new;
     when '2012_03_02' then
       insert into 2012_03_02 values (NEW.*) using new;
     else
       raise exception 'Need a new partition function for %', c_rt;
     end case;
end $$ language plpgsql;

You'd periodically need to change the function to reflect the existing set of
partitions, but that's cheaper than creating a new partition.

The case statement gets more expensive (in effect O(n) on the number of
partitions, n) as the number of partitions increases.  You could split
the date into pieces (e.g. - years, months, days) to diminish that cost.

But at any rate, this should be *way* faster than what you're running now,
and not at any heinous change in development costs (as would likely
be the case reimplementing using SPI).
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: Writing Trigger Functions in C

От
Pavel Stehule
Дата:
2012/12/21 Joe Conway <mail@joeconway.com>:
> On 12/21/2012 08:39 AM, Merlin Moncure wrote:
>> On Fri, Dec 21, 2012 at 10:25 AM, Charles Gomes <charlesrg@outlook.com> wrote:
>>> Hello guys,
>>>
>>> I've been finding performance issues when using a trigger to modify inserts on a partitioned table.
>>> If using the trigger the total time goes from 1 Hour to 4 hours.
>>>
>>> The trigger is pretty simple:
>>>
>>> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
>>> RETURNS trigger AS $$
>>> BEGIN
>>> EXECUTE 'INSERT INTO quotes_'|| to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW ;
>>> RETURN NULL;
>>> END;
>>> $$
>>> LANGUAGE plpgsql;
>>>
>>> I've seen that some of you guys have worked on writing triggers in C.
>>>
>>> Does anyone have had an experience writing a trigger for partitioning in C ?
>>>
>>> If you have some code to paste so I can start from I will really appreciate.
>>
>> Honestly I'd leave the trigger alone and modify the client code in
>> performance sensitive places to insert directly to the correct
>> partition table.
>
> I second that recommendation -- your performance will be much, much, better.

sure

Pavel

>
> Joe
>
> --
> Joe Conway
> credativ LLC: http://www.credativ.us
> Linux, PostgreSQL, and general Open Source
> Training, Service, Consulting, & 24x7 Support
>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



Re: Writing Trigger Functions in C

От
Charles Gomes
Дата:
________________________________
> Date: Fri, 21 Dec 2012 11:56:25 -0500
> Subject: Re: [HACKERS] Writing Trigger Functions in C
> From: cbbrowne@gmail.com
> To: charlesrg@outlook.com
> CC: pgsql-hackers@postgresql.org
>
> On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes
> <charlesrg@outlook.com<mailto:charlesrg@outlook.com>> wrote:
>  >
>  > Hello guys,
>  >
>  > I've been finding performance issues when using a trigger to modify
> inserts on a partitioned table.
>  > If using the trigger the total time goes from 1 Hour to 4 hours.
>  >
>  > The trigger is pretty simple:
>  >
>  > CREATE OR REPLACE FUNCTION quotes_insert_trigger()
>  > RETURNS trigger AS $
>  > BEGIN
>  > EXECUTE 'INSERT INTO quotes_'||
> to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW
> ;
>  > RETURN NULL;
>  > END;
>  > $
>  > LANGUAGE plpgsql;
>  >
>  > I've seen that some of you guys have worked on writing triggers in C.
>  >
>  > Does anyone have had an experience writing a trigger for partitioning
> in C ?
>
> I'd want to be very careful about assuming that implementing the
> trigger function in C
> would necessarily improve performance.  It's pretty likely that it
> wouldn't help much,
> as a fair bit of the cost of firing a trigger have to do with figuring
> out which function to
> call, marshalling arguments, and calling the function, none of which would
> magically disappear by virtue of implementing in C.
>
> A *major* cost that your existing implementation has is that it's re-planning
> the queries for every single invocation.  This is an old, old problem
> from the
> Lisp days, "EVAL considered evil"
> <http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil>
>
> The EXECUTE winds up replanning queries every time the trigger fires.
>
> If you can instead enumerate the partitions explicitly, putting them
> into (say) a
> CASE clause, the planner could generate the plan once, rather than a million
> times, which would be a HUGE savings, vastly greater than you could
> expect from
> recoding into C.
>
> The function might look more like:
>
> create or replace function quotes_insert_trigger () returns trigger as $$
> declare
>      c_rt text;
> begin
>     c_rt := to_char(new.received_time, 'YYYY_MM_DD');
>     case c_rt
>       when '2012_03_01' then
>         insert into 2012_03_01 values (NEW.*) using new;
>       when '2012_03_02' then
>         insert into 2012_03_02 values (NEW.*) using new;
>       else
>         raise exception 'Need a new partition function for %', c_rt;
>       end case;
> end $$ language plpgsql;
>
> You'd periodically need to change the function to reflect the existing set of
> partitions, but that's cheaper than creating a new partition.
>
> The case statement gets more expensive (in effect O(n) on the number of
> partitions, n) as the number of partitions increases.  You could split
> the date into pieces (e.g. - years, months, days) to diminish that cost.
>
> But at any rate, this should be *way* faster than what you're running now,
> and not at any heinous change in development costs (as would likely
> be the case reimplementing using SPI).
> --
> When confronted by a difficult problem, solve it by reducing it to the
> question, "How would the Lone Ranger handle this?"


I will change and implement it this way, I was not aware of such optimization.
Will post back after my benchmark runs.


Re: Writing Trigger Functions in C

От
Charles Gomes
Дата:
----------------------------------------
> From: charlesrg@outlook.com
> To: cbbrowne@gmail.com
> CC: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Writing Trigger Functions in C
> Date: Fri, 21 Dec 2012 12:27:26 -0500
>
> ________________________________
> > Date: Fri, 21 Dec 2012 11:56:25 -0500
> > Subject: Re: [HACKERS] Writing Trigger Functions in C
> > From: cbbrowne@gmail.com
> > To: charlesrg@outlook.com
> > CC: pgsql-hackers@postgresql.org
> >
> > On Fri, Dec 21, 2012 at 11:25 AM, Charles Gomes
> > <charlesrg@outlook.com<mailto:charlesrg@outlook.com>> wrote:
> > >
> > > Hello guys,
> > >
> > > I've been finding performance issues when using a trigger to modify
> > inserts on a partitioned table.
> > > If using the trigger the total time goes from 1 Hour to 4 hours.
> > >
> > > The trigger is pretty simple:
> > >
> > > CREATE OR REPLACE FUNCTION quotes_insert_trigger()
> > > RETURNS trigger AS $
> > > BEGIN
> > > EXECUTE 'INSERT INTO quotes_'||
> > to_char(new.received_time,'YYYY_MM_DD') ||' VALUES (($1).*)' USING NEW
> > ;
> > > RETURN NULL;
> > > END;
> > > $
> > > LANGUAGE plpgsql;
> > >
> > > I've seen that some of you guys have worked on writing triggers in C.
> > >
> > > Does anyone have had an experience writing a trigger for partitioning
> > in C ?
> >
> > I'd want to be very careful about assuming that implementing the
> > trigger function in C
> > would necessarily improve performance. It's pretty likely that it
> > wouldn't help much,
> > as a fair bit of the cost of firing a trigger have to do with figuring
> > out which function to
> > call, marshalling arguments, and calling the function, none of which would
> > magically disappear by virtue of implementing in C.
> >
> > A *major* cost that your existing implementation has is that it's re-planning
> > the queries for every single invocation. This is an old, old problem
> > from the
> > Lisp days, "EVAL considered evil"
> > <http://stackoverflow.com/questions/2571401/why-exactly-is-eval-evil>
> >
> > The EXECUTE winds up replanning queries every time the trigger fires.
> >
> > If you can instead enumerate the partitions explicitly, putting them
> > into (say) a
> > CASE clause, the planner could generate the plan once, rather than a million
> > times, which would be a HUGE savings, vastly greater than you could
> > expect from
> > recoding into C.
> >
> > The function might look more like:
> >
> > create or replace function quotes_insert_trigger () returns trigger as $$
> > declare
> > c_rt text;
> > begin
> > c_rt := to_char(new.received_time, 'YYYY_MM_DD');
> > case c_rt
> > when '2012_03_01' then
> > insert into 2012_03_01 values (NEW.*) using new;
> > when '2012_03_02' then
> > insert into 2012_03_02 values (NEW.*) using new;
> > else
> > raise exception 'Need a new partition function for %', c_rt;
> > end case;
> > end $$ language plpgsql;
> >
> > You'd periodically need to change the function to reflect the existing set of
> > partitions, but that's cheaper than creating a new partition.
> >
> > The case statement gets more expensive (in effect O(n) on the number of
> > partitions, n) as the number of partitions increases. You could split
> > the date into pieces (e.g. - years, months, days) to diminish that cost.
> >
> > But at any rate, this should be *way* faster than what you're running now,
> > and not at any heinous change in development costs (as would likely
> > be the case reimplementing using SPI).
> > --
> > When confronted by a difficult problem, solve it by reducing it to the
> > question, "How would the Lone Ranger handle this?"
>
>
> I will change and implement it this way, I was not aware of such optimization.
> Will post back after my benchmark runs.
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

I've to have add 2 weeks of data at a time, therefore I had to keep two weeks of case statements
Replaced the short trigger function to:


CREATE OR REPLACE FUNCTION quotes_insert_trigger()
RETURNS trigger AS $$
DECLARE
r_date text;
BEGIN
r_date = to_char(new.received_time, 'YYYY_MM_DD');
case r_date
    when '2012_09_10' then
        insert into quotes_2012_09_10 values (NEW.*) using new;
        return;
    when '2012_09_11' then
        insert into quotes_2012_09_11 values (NEW.*) using new;
        return;
    when '2012_09_12' then
        insert into quotes_2012_09_12 values (NEW.*) using new;
        return;
    when '2012_09_13' then
        insert into quotes_2012_09_13 values (NEW.*) using new;
        return;
    when '2012_09_14' then
        insert into quotes_2012_09_14 values (NEW.*) using new;
        return;
    when '2012_09_15' then
        insert into quotes_2012_09_15 values (NEW.*) using new;
        return;
    when '2012_09_16' then
        insert into quotes_2012_09_16 values (NEW.*) using new;
        return;
    when '2012_09_17' then
        insert into quotes_2012_09_17 values (NEW.*) using new;
        return;
    when '2012_09_18' then
        insert into quotes_2012_09_18 values (NEW.*) using new;
        return;
    when '2012_09_19' then
        insert into quotes_2012_09_19 values (NEW.*) using new;
        return;
    when '2012_09_20' then
        insert into quotes_2012_09_20 values (NEW.*) using new;
        return;
    when '2012_09_21' then
        insert into quotes_2012_09_21 values (NEW.*) using new;
        return;
    when '2012_09_22' then
        insert into quotes_2012_09_22 values (NEW.*) using new;
        return;
    when '2012_09_23' then
        insert into quotes_2012_09_23 values (NEW.*) using new;
        return;
    when '2012_09_24' then
        insert into quotes_2012_09_24 values (NEW.*) using new;
        return;
end case
RETURN NULL;
END;
$$
LANGUAGE plpgsql;


And I had no performance improvements at all.
Took the same time as with the previous EXECUTE statement;

I don't see what am I doing wrong.


Re: Writing Trigger Functions in C

От
Robert Haas
Дата:
On Mon, Dec 24, 2012 at 10:43 AM, Charles Gomes <charlesrg@outlook.com> wrote:
> And I had no performance improvements at all.
> Took the same time as with the previous EXECUTE statement;
>
> I don't see what am I doing wrong.

You might not be doing anything wrong.  Triggers ARE slow.

If you have "perf" on your system, you could use "perf top" or "perf
record -a" to find out where the CPU time is going while you're doing
stuff that fires this trigger.  That might provide some clues about
how to optimize.  But it may be that you'll get a completely flat
profile, or something that otherwise boils down to ... triggers are
slow.

In answer to your original question, there is a C language trigger in
contrib/tcn.  But, without some proof that the use of PL/pgsql is the
problem, I don't know how far down that road it's worth going.  It
might be worth writing a C trigger that does nothing but return the
original tuple, or even a PL/pgsql one.  This obviously wouldn't
accomplish anything as far as partitioning goes, but it would let you
measure the overhead of calling a no-op trigger, which could be a
useful thing to know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Writing Trigger Functions in C

От
Charles Gomes
Дата:
----------------------------------------
> Date: Sat, 29 Dec 2012 23:45:06 -0500
> Subject: Re: [HACKERS] Writing Trigger Functions in C
> From: robertmhaas@gmail.com
> To: charlesrg@outlook.com
> CC: cbbrowne@gmail.com; pgsql-hackers@postgresql.org
>
> On Mon, Dec 24, 2012 at 10:43 AM, Charles Gomes <charlesrg@outlook.com> wrote:
> > And I had no performance improvements at all.
> > Took the same time as with the previous EXECUTE statement;
> >
> > I don't see what am I doing wrong.
>
> You might not be doing anything wrong. Triggers ARE slow.
>
> If you have "perf" on your system, you could use "perf top" or "perf
> record -a" to find out where the CPU time is going while you're doing
> stuff that fires this trigger. That might provide some clues about
> how to optimize. But it may be that you'll get a completely flat
> profile, or something that otherwise boils down to ... triggers are
> slow.
>
> In answer to your original question, there is a C language trigger in
> contrib/tcn. But, without some proof that the use of PL/pgsql is the
> problem, I don't know how far down that road it's worth going. It
> might be worth writing a C trigger that does nothing but return the
> original tuple, or even a PL/pgsql one. This obviously wouldn't
> accomplish anything as far as partitioning goes, but it would let you
> measure the overhead of calling a no-op trigger, which could be a
> useful thing to know.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

I've translated the trigger to C and performance had not increased, just like you guys said. I've created an article
withthe trigger and the metrics in case anyone becomes interested in the future
http://www.charlesrg.com/linux/71-postgresql-partitioning-the-database-the-fastest-way