Re: [HACKERS] Parallel COPY FROM execution

From Pavel Stehule
Subject Re: [HACKERS] Parallel COPY FROM execution
Msg-id CAFj8pRCH6X+RwZOg_a232-d76MzVvhfWHBFg-aeuHsNsanmwFg@mail.gmail.com
In reply to [HACKERS] Parallel COPY FROM execution  (Alex K <kondratov.aleksey@gmail.com>)
Responses Re: [HACKERS] Parallel COPY FROM execution  (Alex K <kondratov.aleksey@gmail.com>)
List pgsql-hackers


2017-06-30 14:23 GMT+02:00 Alex K <kondratov.aleksey@gmail.com>:
Greetings pgsql-hackers,

I am a GSoC student this year; my initial proposal was discussed
in the following thread:
https://www.postgresql.org/message-id/flat/7179F2FD-49CE-4093-AE14-1B26C5DFB0DA%40gmail.com

The patch with COPY FROM error handling seems to be nearly finished, so
I have started thinking about parallelism in COPY FROM, which is the next
point in my proposal.

To understand whether there are any expensive calls in COPY that can be
executed in parallel, I did a little research. First, please find attached a
flame graph of the most expensive copy.c calls during 'COPY FROM file'
(copy_from.svg). It reveals that inevitably serial operations like
CopyReadLine (<15%) and heap_multi_insert (~15%) take less than 50% of the
time in total, while the remaining operations, like heap_form_tuple and
multiple checks inside NextCopyFrom, can probably be executed well in parallel.

Second, I compared the execution time of 'COPY FROM a single large
file (~300 MB, 50000000 lines)' vs. 'COPY FROM four equal parts of the
original file executed in four parallel processes'. Though it is a
very rough test, it helps to obtain an overall estimate:

Serial:
real 0m56.571s
user 0m0.005s
sys 0m0.006s

Parallel (x4):
real 0m22.542s
user 0m0.015s
sys 0m0.018s

Thus, it results in a ~60% performance boost for each doubling of the number
of parallel processes, which is consistent with the initial estimation.
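
As a rough check of the arithmetic behind that figure, the sketch below recomputes the speedup from the two "real" timings quoted above. The function names are hypothetical, and the Amdahl's-law step is an added assumption that is not part of the original message: it treats the run as a fixed serial part plus a perfectly parallel part.

```python
import math


def speedup(serial_s: float, parallel_s: float) -> float:
    """Overall speedup: serial wall-clock time divided by parallel wall-clock time."""
    return serial_s / parallel_s


def gain_per_doubling(total_speedup: float, workers: int) -> float:
    """Average relative gain per doubling of the worker count.

    Assumes the same multiplicative gain at every doubling, so with
    k = log2(workers) doublings the per-step factor is total_speedup ** (1 / k).
    """
    doublings = math.log2(workers)
    return total_speedup ** (1.0 / doublings) - 1.0


def amdahl_serial_fraction(total_speedup: float, workers: int) -> float:
    """Serial fraction s implied by Amdahl's law (an assumed model).

    From 1/S = s + (1 - s)/n it follows that s = (n/S - 1) / (n - 1).
    """
    return (workers / total_speedup - 1.0) / (workers - 1.0)


if __name__ == "__main__":
    # "real" timings from the message: 0m56.571s serial, 0m22.542s with 4 processes
    s = speedup(56.571, 22.542)
    print(f"overall speedup: {s:.2f}x")
    print(f"gain per doubling: {gain_per_doubling(s, 4):.0%}")
    print(f"implied serial fraction: {amdahl_serial_fraction(s, 4):.2f}")
```

With these timings the overall speedup is about 2.5x, or roughly 58% per doubling, which matches the rounded ~60% above. Under the assumed model the implied serial fraction is about 0.2, below the 50% upper bound estimated from the flame graph.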


The important use case is a big table with a lot of indexes. Did you test a similar case?

Regards

Pavel

