Re: Performance on Bulk Insert to Partitioned Table

From: Pavel Stehule
Subject: Re: Performance on Bulk Insert to Partitioned Table
Msg-id: CAFj8pRBuyzpgw2Knb8RzmVd-PEe04RKsR8ymyYDTQ6rX-ktkHA@mail.gmail.com
In reply to: Re: Performance on Bulk Insert to Partitioned Table (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-performance
2012/12/27 Jeff Janes <jeff.janes@gmail.com>:
> On Monday, December 24, 2012, Charles Gomes wrote:
>>
>
>
>>
>> >
>> > I think your performance bottleneck is almost certainly the dynamic
>> > SQL.  Using C to generate that dynamic SQL isn't going to help much,
>> > because it is still the SQL engine that has to parse, plan, and execute
>> > it.
>> >
>>
>> Jeff, I've changed the code from dynamic to:
>>
>> CREATE OR REPLACE FUNCTION quotes_insert_trigger()
>> RETURNS trigger AS $$
>> DECLARE
>> r_date text;
>> BEGIN
>> r_date = to_char(new.received_time, 'YYYY_MM_DD');
>> case r_date
>>     when '2012_09_10' then
>>         insert into quotes_2012_09_10 values (NEW.*) using new;
>>         return;
>> ...
>
>
>>
>> However, I got no speed improvement.
>> I need to keep two weeks' worth of partitions at a time; that's why all the
>> WHEN statements.
>
>
> The 'using new' and return without argument are syntax errors.
>
> When I do a model system with those fixed, I get about 2 fold improvement
> over the dynamic SQL performance.  Even if your performance did not go up,
> did your CPU usage go down?  Perhaps you have multiple bottlenecks all
> sitting at about the same place, and so tackling any one of them at a time
> doesn't get you anywhere.
>
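With those two errors fixed, the static CASE trigger could look like the output of the sketch below. It is a minimal sketch, assuming one child table per day named quotes_YYYY_MM_DD and the received_time column from the quoted function; the Python helper static_case_trigger is hypothetical and only generates the PL/pgSQL text for two weeks of partitions. The generated function drops USING NEW, returns NULL so the row is not also stored in the parent table, and raises an error for a date that has no partition.

```python
from datetime import date, timedelta


def static_case_trigger(start: date, days: int = 14) -> str:
    """Return PL/pgSQL source for a trigger that routes rows with a static CASE.

    Hypothetical helper: the child-table naming (quotes_YYYY_MM_DD) and the
    received_time column are assumptions taken from the quoted function.
    """
    branches = []
    for i in range(days):
        suffix = (start + timedelta(days=i)).strftime("%Y_%m_%d")
        # No USING clause: a static INSERT can reference NEW.* directly.
        branches.append(
            f"        WHEN '{suffix}' THEN\n"
            f"            INSERT INTO quotes_{suffix} VALUES (NEW.*);"
        )
    return "\n".join([
        "CREATE OR REPLACE FUNCTION quotes_insert_trigger()",
        "RETURNS trigger AS $$",
        "BEGIN",
        "    CASE to_char(NEW.received_time, 'YYYY_MM_DD')",
        *branches,
        "        ELSE",
        "            RAISE EXCEPTION 'no partition for %', NEW.received_time;",
        "    END CASE;",
        "    -- NULL from a BEFORE row trigger skips the insert into the parent",
        "    RETURN NULL;",
        "END;",
        "$$ LANGUAGE plpgsql;",
    ])


print(static_case_trigger(date(2012, 9, 10)))
```

The generated function still has to be attached to the parent table as a BEFORE INSERT ... FOR EACH ROW trigger.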
> How do both the dynamic and the CASE versions scale with the number of
> threads?  I think you said you had something like 70 sessions, but only 8
> CPUs.  That probably will do bad things with contention, and I don't see
> how using more connections than CPUs is going to help you here.  If the
> CASE starts out faster in a single thread but then flat-lines and the
> EXECUTE catches up, that suggests a different avenue of investigation than
> if they are always the same.
>
>
>>
>> I wish Postgres could automate the partitioning process natively, like other
>> SQL databases do.
>
>
> More automated would be nice (i.e. one operation to make both the check
> constraints and the trigger, so they can't get out of sync), but would not
> necessarily mean faster.  I don't know what you mean about other db.  Last
> time I looked at partitioning in mysql, it was only about breaking up the
> underlying storage into separate files (without regards to contents of the
> rows), so that is the same as what postgres does automatically.  And in
> Oracle, their partitioning seemed about the same as postgres's as far as
> administrative tedium was concerned.  I'm not familiar with how the MS
> product handles it, and maybe my experience with the other two is out of
> date.
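Until there is such a single operation, one way to keep the two in sync is to generate both from the same list of days. A minimal sketch, assuming a parent table named quotes (hypothetical; only the child names appear in the thread) and one-day ranges on received_time; partition_scheme is a hypothetical helper that returns the child-table DDL and the CASE branches for the routing trigger.

```python
from datetime import date, timedelta


def partition_scheme(start: date, days: int = 14) -> tuple[list[str], str]:
    """Derive child-table DDL and trigger CASE branches from ONE list of days.

    Hypothetical helper: the parent name "quotes" and the received_time
    column are assumptions. Because both outputs come from the same loop,
    the CHECK constraints and the routing branches cannot drift apart.
    """
    ddl, branches = [], []
    for i in range(days):
        lo = start + timedelta(days=i)
        hi = lo + timedelta(days=1)
        suffix = lo.strftime("%Y_%m_%d")
        # Half-open range [lo, hi) so adjacent days never overlap.
        ddl.append(
            f"CREATE TABLE quotes_{suffix} ("
            f"CHECK (received_time >= '{lo.isoformat()}' "
            f"AND received_time < '{hi.isoformat()}')"
            ") INHERITS (quotes);"
        )
        branches.append(
            f"WHEN '{suffix}' THEN INSERT INTO quotes_{suffix} VALUES (NEW.*);"
        )
    return ddl, "\n".join(branches)


ddl, case_body = partition_scheme(date(2012, 9, 10))
print("\n".join(ddl))
print(case_body)
```

Adding or dropping a day changes the CHECK bounds and the WHEN labels together. The CHECK constraints are also what constraint_exclusion relies on to skip irrelevant children at query time.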

I did a simple test on 9.3 (compiled without assertions); it is not very
precise, just for a first orientation:

inserting 0.5M rows into an empty target table with one trivial trigger
and one index takes about 4 sec

the same with a slightly more complex trigger (one IF statement and two
assignment statements) takes about 5 sec

simple forwarding to two tables takes 8 sec

using dynamic SQL is significantly slower, 18 sec, probably due to
plan-caching overhead (a statement run through EXECUTE is prepared anew
each time it runs)
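Putting those rough numbers side by side (one unscientific run, so only for orientation): the sketch below does nothing more than arithmetic on the four reported timings, with the trivial trigger as the baseline. The dictionary labels are illustrative names, not identifiers from the test.

```python
# Rough timings reported above for inserting 0.5M rows, in seconds.
# The keys are illustrative labels, not names used in the original test.
ROWS = 500_000
timings_s = {
    "trivial trigger": 4.0,
    "IF + two assignments": 5.0,
    "forward to two tables": 8.0,
    "dynamic SQL (EXECUTE)": 18.0,
}


def relative_overhead(timings: dict[str, float], baseline_key: str) -> dict[str, int]:
    """Percent of extra time for each case relative to the baseline case."""
    base = timings[baseline_key]
    return {k: round((v - base) / base * 100) for k, v in timings.items()}


overhead = relative_overhead(timings_s, "trivial trigger")
per_row_us = {k: v / ROWS * 1e6 for k, v in timings_s.items()}

for name in timings_s:
    print(f"{name:24s} {timings_s[name]:5.1f} s  "
          f"{per_row_us[name]:5.1f} us/row  +{overhead[name]}%")
```

That works out to about 8, 10, 16 and 36 microseconds per row, i.e. +25% for the IF trigger, +100% for static forwarding and +350% for dynamic SQL; dynamic SQL costs 2.25 times as much as static forwarding.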

The overhead depends on the number of partitions and the number of
indexes, but I expect the overhead of redistributing triggers to be about
50-100% (less on large tables, more on small tables).

A native implementation should evaluate expressions significantly more
efficiently, mainly simple expressions (this matters for a large number
of partitions), and could probably forward tuples faster than a heavy
INSERT statement does. (It is an open question whether some of this
overhead could be removed with more sophisticated syntax, by avoiding
the record expansion.)

So a native implementation could bring a significant speedup, mainly if
we can distribute tuples without evaluating expressions (with the routing
done by the executor).
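To illustrate why per-row expression evaluation matters more as the partition count grows, here is a toy Python model; it is not PostgreSQL code, and nothing in this message says how a native implementation would route tuples. A PL/pgSQL simple CASE compares its search value against each WHEN in order, so the last of n partitions costs n comparisons, whereas a router that searches a precomputed, sorted list of bounds needs only about log2(n). The helpers make_bounds, route_linear and route_bisect are hypothetical names.

```python
import bisect
from datetime import date, timedelta

DAY = timedelta(days=1)


def make_bounds(start: date, days: int) -> list[date]:
    """Lower bounds of hypothetical daily range partitions, sorted ascending."""
    return [start + timedelta(days=i) for i in range(days)]


def route_linear(bounds: list[date], d: date) -> tuple[int, int]:
    """Mimic a CASE/IF chain: test partitions one by one.

    Returns (partition index, number of partitions tested).
    """
    for i, lo in enumerate(bounds):
        if lo <= d < lo + DAY:
            return i, i + 1
    raise LookupError(f"no partition for {d}")


def route_bisect(bounds: list[date], d: date) -> int:
    """Binary search on the sorted lower bounds: O(log n) comparisons."""
    i = bisect.bisect_right(bounds, d) - 1
    if i < 0 or d >= bounds[i] + DAY:
        raise LookupError(f"no partition for {d}")
    return i


bounds = make_bounds(date(2012, 9, 10), 14)
print(route_linear(bounds, date(2012, 9, 23)), route_bisect(bounds, date(2012, 9, 23)))
```

Both routers pick the same partition; the difference is only in how much work each row costs as the number of partitions grows.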

Regards

Pavel




>
> Cheers,
>
> Jeff

