Inserting heap tuples in bulk in COPY

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Inserting heap tuples in bulk in COPY
Дата
Msg-id 4E457C22.9010708@enterprisedb.com
обсуждение исходный текст
Ответы Re: Inserting heap tuples in bulk in COPY  (Gurjeet Singh <singh.gurjeet@gmail.com>)
Re: Inserting heap tuples in bulk in COPY  (Florian Pflug <fgp@phlo.org>)
Re: Inserting heap tuples in bulk in COPY  (Robert Haas <robertmhaas@gmail.com>)
Re: Inserting heap tuples in bulk in COPY  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Inserting heap tuples in bulk in COPY  (Merlin Moncure <mmoncure@gmail.com>)
Re: Inserting heap tuples in bulk in COPY  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
COPY is slow. Let's make it faster. One obvious optimization is to
insert heap tuples in bigger chunks, instead of calling heap_insert()
separately for every tuple. That saves the overhead of pinning and
locking the buffer for every tuple, and you only need to write one WAL
record for all the tuples written to the same page, instead of one for
each tuple.

Attached is a WIP patch to do that. It adds a new function,
heap_multi_insert, which does the same thing as heap_insert, but works
in bulk. It takes an array of tuples as argument, and tries to cram as
many of them into the chosen targe page as it can, and only writes a
single WAL record of the operation.

This gives a significant speedup to COPY, particularly for narrow
tables, with small tuples. Grouping multiple tuples into one WAL record
reduces the WAL volume significantly, and the time spent in writing that
WAL. The reduced overhead of repeatedly locking the buffer is also most
noticeable on narrow tables. On wider tables, the effects are smaller.
See copytest-results.txt, containing test results with three tables of
different widths. The scripts used to get those numbers are also attached.

Triggers complicate this. I believe it is only safe to group tuples
together like this if the table has no triggers. A BEFORE ROW trigger
might run a SELECT on the table being copied to, and check if some of
the tuples we're about to insert exist. If we run BEFORE ROW triggers
for a bunch of tuples first, and only then insert them, none of the
trigger invocations will see the other rows as inserted yet. Similarly,
if we run AFTER ROW triggers after inserting a bunch of tuples, the
trigger for each of the insertions would see all the inserted rows. So
at least for now, the patch simply falls back to inserting one row at a
time if there are any triggers on the table.

The patch is WIP, mainly because I didn't write the WAL replay routines
yet, but please let me know if you see any issues.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: VACUUM FULL versus system catalog cache invalidation
Следующее
От: "Kevin Grittner"
Дата:
Сообщение: Re: VACUUM FULL versus system catalog cache invalidation