Re: [HACKERS] Parallel Hash take II

From: Rushabh Lathia
Subject: Re: [HACKERS] Parallel Hash take II
Date:
Msg-id: CAGPqQf3GyCnDbDFRs4Le4e=dt4drNbh=SDXTkZsMe0nDXAAg2A@mail.gmail.com
In reply to: Re: [HACKERS] Parallel Hash take II  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: [HACKERS] Parallel Hash take II  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
While rebasing the parallel B-tree index build patch on top of the v22 patch
set, I found a couple of cosmetic issues:

1) BufFileSetEstimate has been removed, but its declaration is still present in buffile.h:

+extern size_t BufFileSetEstimate(int stripes);


2) BufFileSetCreate has been renamed to BufFileSetInit, but the old name still
appears in this comment:

* Attach to a set of named BufFiles that was created with BufFileSetCreate.
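
Presumably both fixes are mechanical: drop the stale declaration and update the
comment to the new function name, roughly like this (illustrative hunks, not
tested against the patch set):

-extern size_t BufFileSetEstimate(int stripes);

- * Attach to a set of named BufFiles that was created with BufFileSetCreate.
+ * Attach to a set of named BufFiles that was created with BufFileSetInit.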

Thanks,

On Wed, Oct 25, 2017 at 11:33 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Oct 24, 2017 at 10:10 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here is an updated patch set that does that ^.

It's a bit hard to understand what's going on with the v21 patch set I
posted yesterday because EXPLAIN ANALYZE doesn't tell you anything
interesting.  Also, if you apply the multiplex_gather patch[1] I
posted recently and set multiplex_gather to off then it doesn't tell
you anything at all, because the leader has no hash table (I suppose
that could happen with unpatched master given sufficiently bad
timing).  Here's a new version with an extra patch that adds some
basic information about load balancing to EXPLAIN ANALYZE, inspired by
what commit bf11e7ee did for Sort.
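
For context, the plans below come from parallel hash joins; a session along
these lines produces output of this shape (the query, table and column names
here are purely illustrative, not the exact ones I ran):

  SET enable_parallel_hash = on;   -- GUC added by this patch set
  SET multiplex_gather = on;       -- GUC from the multiplex_gather patch [1]
  EXPLAIN (ANALYZE, COSTS OFF)
  SELECT COUNT(*) FROM simple r JOIN simple s USING (id);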

Example output:

enable_parallel_hash = on, multiplex_gather = on:

 ->  Parallel Hash (actual rows=1000000 loops=3)
       Buckets: 131072  Batches: 16
       Leader:    Shared Memory Usage: 3552kB  Hashed: 396120  Batches Probed: 7
       Worker 0:  Shared Memory Usage: 3552kB  Hashed: 276640  Batches Probed: 6
       Worker 1:  Shared Memory Usage: 3552kB  Hashed: 327240  Batches Probed: 6
       ->  Parallel Seq Scan on simple s (actual rows=333333 loops=3)

 ->  Parallel Hash (actual rows=10000000 loops=8)
       Buckets: 131072  Batches: 256
       Leader:    Shared Memory Usage: 2688kB  Hashed: 1347720  Batches Probed: 36
       Worker 0:  Shared Memory Usage: 2688kB  Hashed: 1131360  Batches Probed: 33
       Worker 1:  Shared Memory Usage: 2688kB  Hashed: 1123560  Batches Probed: 38
       Worker 2:  Shared Memory Usage: 2688kB  Hashed: 1231920  Batches Probed: 38
       Worker 3:  Shared Memory Usage: 2688kB  Hashed: 1272720  Batches Probed: 34
       Worker 4:  Shared Memory Usage: 2688kB  Hashed: 1234800  Batches Probed: 33
       Worker 5:  Shared Memory Usage: 2688kB  Hashed: 1294680  Batches Probed: 37
       Worker 6:  Shared Memory Usage: 2688kB  Hashed: 1363240  Batches Probed: 35
       ->  Parallel Seq Scan on big s2 (actual rows=1250000 loops=8)

enable_parallel_hash = on, multiplex_gather = off (ie no leader participation):

 ->  Parallel Hash (actual rows=1000000 loops=2)
       Buckets: 131072  Batches: 16
       Worker 0:  Shared Memory Usage: 3520kB  Hashed: 475920  Batches Probed: 9
       Worker 1:  Shared Memory Usage: 3520kB  Hashed: 524080  Batches Probed: 8
       ->  Parallel Seq Scan on simple s (actual rows=500000 loops=2)

enable_parallel_hash = off, multiplex_gather = on:

 ->  Hash (actual rows=1000000 loops=3)
       Buckets: 131072  Batches: 16
       Leader:    Memory Usage: 3227kB
       Worker 0:  Memory Usage: 3227kB
       Worker 1:  Memory Usage: 3227kB
       ->  Seq Scan on simple s (actual rows=1000000 loops=3)

enable_parallel_hash = off, multiplex_gather = off:

 ->  Hash (actual rows=1000000 loops=2)
       Buckets: 131072  Batches: 16
       Worker 0:  Memory Usage: 3227kB
       Worker 1:  Memory Usage: 3227kB
       ->  Seq Scan on simple s (actual rows=1000000 loops=2)

parallelism disabled (traditional single-line output, unchanged):

 ->  Hash (actual rows=1000000 loops=1)
       Buckets: 131072  Batches: 16  Memory Usage: 3227kB
       ->  Seq Scan on simple s (actual rows=1000000 loops=1)

(It actually says "Tuples Hashed", not "Hashed" but I edited the above
to fit on a standard punchcard.)  Thoughts?

[1] https://www.postgresql.org/message-id/CAEepm%3D2U%2B%2BLp3bNTv2Bv_kkr5NE2pOyHhxU%3DG0YTa4ZhSYhHiw%40mail.gmail.com



--
Rushabh Lathia
