Re: Parallel copy

From: Bharath Rupireddy
Subject: Re: Parallel copy
Date:
Message-ID: CALj2ACWeQVd-xoQZHGT01_33St4xPoZQibWz46o7jW1PE3XOqQ@mail.gmail.com
In reply to: Re: Parallel copy (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
I ran performance tests on the v7 patch set[1] with a custom
postgresql.conf[2]. The results are triplets of the form (execution
time in seconds, number of workers, gain).

Use case 1: 10 million rows, 5.2GB data, 2 indexes on integer columns,
1 index on text column, binary file
(1104.898, 0, 1X), (1112.221, 1, 1X), (640.236, 2, 1.72X), (335.090,
4, 3.3X), (200.492, 8, 5.51X), (131.448, 16, 8.4X), (121.832, 20,
9.1X), (124.287, 30, 8.9X)

Use case 2: 10 million rows, 5.2GB data, 2 indexes on integer columns, 1
index on text column, copy from stdin, csv format
(1203.282, 0, 1X), (1135.517, 1, 1.06X), (655.140, 2, 1.84X),
(343.688, 4, 3.5X), (203.742, 8, 5.9X), (144.793, 16, 8.31X),
(133.339, 20, 9.02X), (136.672, 30, 8.8X)

Use case 3: 10 million rows, 5.2GB data, 2 indexes on integer columns, 1
index on text column, text file
(1165.991, 0, 1X), (1128.599, 1, 1.03X), (644.793, 2, 1.81X),
(342.813, 4, 3.4X), (204.279, 8, 5.71X), (139.986, 16, 8.33X),
(128.259, 20, 9.1X), (132.764, 30, 8.78X)

The above results are similar to those from earlier versions of the patch set.
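For anyone who wants to recompute the gain column, here is a minimal sketch. It assumes gain = (execution time with 0 workers) / (execution time with N workers), which is consistent with the triplets above but is my reading of them rather than something stated explicitly. The timings are the use case 1 (binary file) numbers; the function name `gain` is only illustrative.

```python
# Use case 1 timings from the results above: workers -> execution time in seconds.
timings = {0: 1104.898, 1: 1112.221, 2: 640.236, 4: 335.090,
           8: 200.492, 16: 131.448, 20: 121.832, 30: 124.287}


def gain(timings, workers):
    """Speedup of the run with `workers` workers relative to the 0-worker run.

    Assumption: gain is the ratio of the 0-worker time to the N-worker time.
    """
    return timings[0] / timings[workers]


for workers, secs in timings.items():
    print(f"{workers:>2} workers: {secs:9.3f} s, gain {gain(timings, workers):.2f}X")

# Worker count with the highest gain in this data set.
best = max(timings, key=lambda w: gain(timings, w))
print("best worker count:", best)
```

Recomputed values can differ from the reported ones in the last digit, depending on whether the figure was rounded or truncated.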

On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Sure, you need to change the code such that when force_parallel_mode =
> 'regress' is specified then it always uses one worker. This is
> primarily for testing purposes and will help during the development of
> this patch as it will make all existing Copy tests to use quite a good
> portion of the parallel infrastructure.
>

I tested with force_parallel_mode = regress and found 2 issues; the
fixes for both are included in the v7 patch set[1].

>
> > Overall, we have below test cases to cover the code and for performance measurements. We plan to run these tests
> > whenever a new set of patches is posted.
> >
> > 1. csv
> > 2. binary
>
> Don't we need the tests for plain text files as well?
>

I added a text use case; its perf results on the v7 patch set[1] are given above (use case 3).

>
> > 3. force parallel mode = regress
> > 4. toast data csv and binary
> > 5. foreign key check, before row, after row, before statement, after statement, instead of triggers
> > 6. partition case
> > 7. foreign partitions and partitions having trigger cases
> > 8. where clause having parallel unsafe and safe expression, default parallel unsafe and safe expression
> > 9. temp, global, local, unlogged, inherited tables cases, foreign tables
> >
>
> Sounds like good coverage. So, are you doing all this testing
> manually? How are you maintaining these tests?
>

All test cases listed above, except for the cases that are meant to
measure perf gain with huge data, are present in the v7-0005 patch of
the v7 patch set[1].

[1] https://www.postgresql.org/message-id/CALDaNm1n1xW43neXSGs%3Dc7zt-mj%2BJHHbubWBVDYT9NfCoF8TuQ%40mail.gmail.com

[2]
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


