Re: Declarative partitioning - another take

Поиск

Список

Период

Сортировка

От	Amit Langote
Тема	Re: Declarative partitioning - another take
Дата	29 ноября 2016 г. 14:24:52
Msg-id	3c9f55c0-6353-052d-cd49-52f573972c55@lab.ntt.co.jp обсуждение исходный текст
Ответ на	Re: Declarative partitioning - another take (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Ответы	Re: Declarative partitioning - another take
Список	pgsql-hackers

Дерево обсуждения

On 2016/11/17 20:27, Amit Langote wrote:
> On 2016/11/16 4:21, Robert Haas wrote:
>> Have you done any performance testing on the tuple routing code?
>> Suppose we insert a million (or 10 million) tuples into an
>> unpartitioned table, a table with 10 partitions, a table with 100
>> partitions, a table with 1000 partitions, and a table that is
>> partitioned into 10 partitions each of which has 10 subpartitions.
>> Ideally, the partitioned cases would run almost as fast as the
>> unpartitioned case, but probably there will be some overhead.
>> However, it would be useful to know how much.  Also, it would be
>> useful to set up the same cases with inheritance using a PL/pgsql ON
>> INSERT trigger for tuple routing and compare.  Hopefully the tuple
>> routing code is far faster than a trigger, but we should make sure
>> that's the case and look for optimizations if not.  Also, it would be
>> useful to know how much slower the tuple-mapping-required case is than
>> the no-tuple-mapping-required case.
> 
> OK, I will share the performance results soon.

Sorry about the delay; here are some numbers with the following
partitioning schema:

# plain table
create table plain (a date, b int, c int);

# partitioned table
create table ptab (a date, b int, c int) partition by range (a, b);

Partitions (not the full commands):

ptab_00001 for values from ('2016-11-29',    1) to ('2016-11-29', 1000);
ptab_00002 for values from ('2016-11-29', 1000) to ('2016-11-29', 2000);
...
ptab_00005 for values from ('2016-11-29', 4000) to ('2016-11-29', 5000);

ptab_00006 for values from ('2016-11-30',    1) to ('2016-11-30', 1000);
...
...
ptab_NNNNN for values from ('20XX-XX-XX', 4000) to ('20XX-XX-XX', 5000);

# inheritance partitioned table
create table itab (a date, b int, c int);
create table itab_00001 (   check part_check check (a = '2016-11-29' and b >=    1 and b < 1000)
) inherits (itab);
...
create table itab_00005 (   check part_check check (a = '2016-11-29' and b >= 4000 and b < 5000)
) inherits (itab);
create table itab_0006 (   check part_check check (a = '2016-11-30' and b >=    1and b < 1000)
) inherits (itab);
...
...
create table itab_NNNNN (   check part_check check (a = '2016-11-29' and b >= 4000 and b < 5000)
) inherits (itab);


The BR trigger (on itab) procedure as follows:

CREATE OR REPLACE FUNCTION itab_ins_trig()
RETURNS TRIGGER AS $$
DECLARE partno text;
BEGIN SELECT to_char((NEW.a - '2016-11-29'::date) * 5 + NEW.b / 1000 + 1,                'fm00000') INTO partno;
EXECUTE'INSERT INTO itab_' || partno || ' SELECT $1.*' USING NEW; RETURN NULL;
 
END; $$ LANGUAGE plpgsql;

Note that the tuple-routing procedure above assumes a fixed-stride range
partitioning scheme (shown as tg-direct-map below).  In other cases, the
simplest approach involves defining a if-else ladder, which I tried too
(shown as tg-if-else below), but reporting times only for up to 200
partitions at most (I'm sure there might be ways to be smarter there
somehow, but I didn't; the point here may only be to compare the new
tuple-routing code's overhead vs. trigger overhead in the traditional method).

# All times in seconds (on my modestly-powerful development VM)
#
# nrows = 10,000,000 generated using:
#
# INSERT INTO $tab
# SELECT '$last'::date - ((s.id % $maxsecs + 1)::bigint || 's')::interval,
#       (random() * 5000)::int % 4999 + 1,
#        case s.id % 10
#          when 0 then 'a'
#          when 1 then 'b'
#          when 2 then 'c'
#          ...
#          when 9 then 'j'
#       end
# FROM generate_series(1, $nrows) s(id)
# ORDER BY random();
#
# The first item in the select list is basically a date that won't fall
# outside the defined partitions.

Time for a plain table = 98.1 sec

#part    parted    tg-direct-map    tg-if-else
=====    ======    =============    ==========
10       114.3     1483.3            742.4
50       112.5     1476.6           2016.8
100      117.1     1498.4           5386.1
500      125.3     1475.5             --
1000     129.9     1474.4             --
5000     137.5     1491.4             --
10000    154.7     1480.9             --


Then for a 2-level partitioned table with each of the above partitions
partitioned by list (c), with 10 sub-partitions each as follows:

ptab_NNNNN_a for values in ('a');
ptab_NNNNN_b for values in ('b');
...
ptab_NNNNN_k for values in ('j');

I didn't include the times for inheritance table with a routing trigger in
this case, as it seems that the results would look something like the above:

Time for a plain table = 98.1 sec

#part    (sub-)parted
=====    ============
10       127.0
50       152.3
100      156.6
500      191.8
1000     187.3


Regarding tuple-mapping-required vs no-tuple-mapping-required, all cases
currently require tuple-mapping, because the decision is based on the
result of comparing parent and partition TupleDesc using
equalTupleDescs(), which fails so quickly because TupleDesc.tdtypeid are
not the same.  Anyway, I simply commented out the tuple-mapping statement
in ExecInsert() to observe just slightly improved numbers as follows
(comparing with numbers in the table just above):

#part    (sub-)parted
=====    =================
10       113.9 (vs. 127.0)
100      135.7 (vs. 156.6)
500      182.1 (vs. 191.8)

Thanks,
Amit

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Amit Kapila
Дата: 29 ноября 2016 г., 14:20:02
Сообщение: Re: make default TABLESPACE belong to target table.

Следующее

От: Amit Kapila
Дата: 29 ноября 2016 г., 14:25:07
Сообщение: Re: Parallel execution and prepared statements

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Declarative partitioning - another take

Предыдущее

Следующее