Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Поиск
Список
Период
Сортировка
От Rushabh Lathia
Тема Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Дата
Msg-id CAGPqQf3a79OOVX=N3eqY37aqVVw3kQdKs+RQxv8D1ga8=XM2mg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers


On Wed, Mar 22, 2017 at 3:19 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Wed, Mar 22, 2017 at 10:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 21, 2017 at 3:50 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> I disagree with that. It is a
>> trade-off, I suppose. I have now run out of time to work through it
>> with you or Thomas, though.
>
> Bummer.

I'm going to experiment with refactoring the v10 parallel CREATE INDEX
patch to use the SharedBufFileSet interface from
hj-shared-buf-file-v8.patch today and see what problems I run into.


As per the earlier discussion in the thread, I did experiment using
BufFileSet interface from parallel-hash-v18.patchset.  I took the reference
of parallel-hash other patches to understand the BufFileSet APIs, and
incorporate the changes to parallel create index.

In order to achieve the same:

- Applied 0007-Remove-BufFile-s-isTemp-flag.patch and
0008-Add-BufFileSet-for-sharing-temporary-files-between-b.patch from the
parallel-hash-v18.patchset.
- Removed the buffile.c/logtap.c/fd.c changes from the parallel CREATE
INDEX v10 patch.
- incorporate the BufFileSet API to the parallel tuple sort for CREATE INDEX.
- Changes into few existing functions as well as added few to support the
BufFileSet changes.

To check the performance, I used the similar test which Peter posted in
his earlier thread. which is:

Machine: power2 machine with 512GB of RAM

Setup:

CREATE TABLE parallel_sort_test AS
    SELECT hashint8(i) randint,
    md5(i::text) collate "C" padding1,
    md5(i::text || '2') collate "C" padding2
    FROM generate_series(0, 1e9::bigint) i;

vacuum ANALYZE parallel_sort_test;

postgres=# show max_parallel_workers_per_gather;
 max_parallel_workers_per_gather
---------------------------------
 8
(1 row)

postgres=# show maintenance_work_mem;
 maintenance_work_mem
----------------------
 8GB
(1 row)

postgres=# show max_wal_size ;
 max_wal_size
--------------
 4GB
(1 row)

CREATE INDEX serial_idx ON parallel_sort_test (randint);

Without patch:

Time: 3430054.220 ms (57:10.054)

With patch (max_parallel_workers_maintenance  = 8):

Time: 1163445.271 ms (19:23.445)

Thanks to my colleague Thomas Munro for his help and off-line discussion
for the patch.

Attaching v11 patch and trace_sort output for the test.
 
Thanks,
Rushabh Lathia
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: [HACKERS] path toward faster partition pruning