Batch insert in CTAS/MatView code

From: Paul Guo
Subject: Batch insert in CTAS/MatView code
Date:
Msg-id: CAEET0ZHRWxbRUgwzUK_tOFDWx7VE2-P=xMBT6-N+gAa9WQ=xxA@mail.gmail.com
Replies: Re: Batch insert in CTAS/MatView code
List: pgsql-hackers
Hello, Postgres hackers,

The COPY code already uses batch insert via heap_multi_insert() to speed things up. It seems that CREATE TABLE AS and materialized view creation could leverage that code to boost performance as well. Attached is a patch implementing that, written by Taylor (cc-ed) and me.

The patch also modifies heap_multi_insert() slightly for a further code-level optimization: it uses static memory instead of a memory context and dynamic allocation. For ModifyTable->insert there seem to be more limitations on batch insert (triggers, etc.?), but perhaps we could still batch in the cases where it is safe to do so?

By the way, while looking at the code I noticed that toast_insert_or_update() has nine large local arrays, which looks like a stack overflow risk. Maybe we should make them static or global.

Here is some quick and simple performance testing on a mirrorless Postgres instance, using the SQL below. The tests cover tables with small column widths, large column widths, and toasted data.

-- tuples with small size.
drop table if exists t1;
create table t1 (a int);

insert into t1 select * from generate_series(1, 10000000);
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

-- tuples that are untoasted and data that is 1664 bytes wide
drop table if exists t1;
create table t1 (a name, b name, c name, d name, e name, f name, g name, h name, i name, j name, k name, l name, m name, n name, o name, p name, q name, r name, s name, t name, u name, v name, w name, x name, y name, z name);

insert into t1 select 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' from generate_series(1, 500000);
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

-- tuples that are toastable.
drop table if exists t1;
create table t1 (a text, b text, c text, d text, e text, f text, g text, h text, i text, j text, k text, l text, m text, n text, o text, p text, q text, r text, s text, t text, u text, v text, w text, x text, y text, z text);

insert into t1 select i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i from (select repeat('123456789', 10000) from generate_series(1,2000)) i;
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

Here are the timing results:

With the patch:

Time: 4728.142 ms (00:04.728)    -- small columns
Time: 14203.983 ms (00:14.204)   -- large columns
Time: 1008.669 ms (00:01.009)    -- toast

Baseline:

Time: 11096.146 ms (00:11.096)   -- small columns
Time: 13106.741 ms (00:13.107)   -- large columns
Time: 1100.174 ms (00:01.100)    -- toast

For the toast and large-column cases the difference is under 10%, but for the small-column case the improvement is very good. Actually, if I hardcode the batch count as 4, all test cases improve, but then the gain for the small-column case is smaller than with the current patch. The number 4 is quite case-specific, so I cannot hardcode it in the patch. Of course we could tune this further, but the current value seems to be a good trade-off?

Thanks.

Attachments
