Making "COPY partitioned_table FROM" faster
От | David Rowley |
---|---|
Тема | Making "COPY partitioned_table FROM" faster |
Дата | |
Msg-id | CAKJS1f93DeHN+9RrD9jYn0iF_o89w2B+U8-Ao5V1kd8Cf7oSGQ@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: Making "COPY partitioned_table FROM" faster
(David Rowley <david.rowley@2ndquadrant.com>)
Re: Making "COPY partitioned_table FROM" faster (Karen Huddleston <khuddleston@pivotal.io>) |
Список | pgsql-hackers |
I was looking at the COPY FROM performance gap between bulk loads with partitioned tables vs non-partitioned tables. There's quite a gap! Almost twice as slow in my test. It seems to be mostly down to lack of usage of heap_multi_insert() for the partitioned table case, which I guess is because we can only do that into a single heap. I didn't really see any reason not do when the partition for this tuple is the same as the one for the last tuple. Such cases may well be quite common, especially so in time series data stored in RANGE partitioned tables. I've implemented this in the attached. Performance is much better when the rows are located in the same partition. I've also tested the worst case; when the partition changes on each row. That's now slightly slower. Although, if we're worried about that we could probably make the insert-method adaptive, and only enable multi-inserts if the partition remains the same for X consecutive tuples then have it revert back to single inserts when the partition changed X times after X tuples, where X is some number above 1, say 10? I've not done that. I'm not sure it's worthwhile. This patch seems fairly simple, only touching copy.c. I think it's a good candidate for July's 'fest. src/backend/commands/copy.c | 225 +++++++++++++++++++++++++++++++++++--------- 1 file changed, 182 insertions(+), 43 deletions(-) Benchmarks below: Setup: -- non-partitioned (control) CREATE TABLE partbench_ (date TIMESTAMP NOT NULL, i1 INT NOT NULL, i2 INT NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL); -- 10k parts CREATE TABLE partbench (date TIMESTAMP NOT NULL, i1 INT NOT NULL, i2 INT NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL) PARTITION BY RANGE (date); \o /dev/null select 'CREATE TABLE partbench' || x::text || ' PARTITION OF partbench FOR VALUES FROM (''' || '2017-03-06'::date + (x::text || ' hours')::interval || ''') TO (''' || '2017-03-06'::date + ((x+1)::text || ' hours')::interval || ''');' from generate_Series(0,9999) x; \gexec \o Test: -- Time loading of 1GB of data. \timing on copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|'; truncate table partbench_; copy partbench from program $$perl ~/partbench.pl$$ delimiter '|'; truncate table partbench; copy partbench from program $$perl ~/partbench_alternate.pl$$ delimiter '|'; truncate table partbench; Unpatched: postgres=# copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|'; COPY 17825782 Time: 22669.017 ms (00:22.669) postgres=# truncate table partbench_; postgres=# copy partbench from program $$perl ~/partbench.pl$$ delimiter '|'; COPY 17825782 Time: 44095.884 ms (00:44.096) postgres=# truncate table partbench; postgres=# copy partbench from program $$perl ~/partbench_alternate.pl$$ delimiter '|'; COPY 17825782 Time: 45129.004 ms (00:45.129) postgres=# truncate table partbench; Patched: postgres=# copy partbench_ from program $$perl ~/partbench.pl$$ delimiter '|'; COPY 17825782 Time: 22701.290 ms (00:22.701) postgres=# truncate table partbench_; postgres=# copy partbench from program $$perl ~/partbench.pl$$ delimiter '|'; COPY 17825782 Time: 27721.054 ms (00:27.721) postgres=# truncate table partbench; postgres=# copy partbench from program $$perl ~/partbench_alternate.pl$$ delimiter '|'; COPY 17825782 Time: 46151.844 ms (00:46.152) postgres=# truncate table partbench; partbench.pl: for (my $i=0; $i < 8912891; $i++) { print "2018-04-26 15:00:00|1|2|3|4|5\n"; print "2018-04-26 15:00:00|1|2|3|4|5\n"; } partbench_alternate.pl: for (my $i=0; $i < 8912891; $i++) { print "2018-04-25 15:00:00|1|2|3|4|5\n"; print "2018-04-26 15:00:00|1|2|3|4|5\n"; } -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Arseny SherДата:
Сообщение: Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.