Re: pgsql: Add parallel-aware hash joins.

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: pgsql: Add parallel-aware hash joins.
Дата
Msg-id CAEepm=1c7SpPfsFtFS0gJaPYXwLu=BpCi8WX3Ws7DsuVh1KzkQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pgsql: Add parallel-aware hash joins.  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: pgsql: Add parallel-aware hash joins.
Список pgsql-hackers
On Wed, Jan 24, 2018 at 12:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There is a very clear secular trend up in the longer data series,
> which indicates that we're testing more stuff,

+1

> which doesn't bother
> me in itself as long as the time is well spent.  However, the trend
> over the last two months is very bad, and I do not think that we can
> point to any large improvement in test coverage that someone committed
> since November.

I'm not sure if coverage.postgresql.org has a way to view historical
reports so we can see the actual percentage change, but as I recall
it, commit fa330f9ad "Add some regression tests that exercise hash
join code." pushed nodeHash.c and possibly nodeHashJoin.c into green
territory on here:

https://coverage.postgresql.org/src/backend/executor/index.html

> Looking more closely at the shorter series, there are four pretty obvious
> step changes since 2016-09.  The PNG's x-axis doesn't have enough
> resolution to match these up to commits, but looking at the underlying
> data, they clearly correspond to:
>
> Branch: master Release: REL_10_BR [b801e1200] 2016-10-18 15:57:58 -0400
>     Improve regression test coverage for hash indexes.
>
> Branch: master Release: REL_10_BR [4a8bc39b0] 2017-04-12 16:17:53 -0400
>     Speed up hash_index regression test.
>
> Branch: master [fa330f9ad] 2017-11-29 16:06:50 -0800
>     Add some regression tests that exercise hash join code.

Joining check runtimes with the commit log (see attached), I see:

 2017-11-30 | fa330f9a | Add some regression tests that exercise  | 00:08:30
 2017-11-29 | 84940644 | New C function: bms_add_range            | 00:08:18

That's +2.4%.

> Branch: master [180428404] 2017-12-21 00:43:41 -0800
>     Add parallel-aware hash joins.

 2017-12-21 | cce1ecfc | Adjust assertion in GetCurrentCommandId. | 00:09:03
 2017-12-21 | 6719b238 | Rearrange execution of PARAM_EXTERN Para |
 2017-12-21 | 8a0596cb | Get rid of copy_partition_key            |
 2017-12-21 | 9ef6aba1 | Fix typo                                 |
 2017-12-21 | c98c35cd | Avoid putting build-location-dependent s |
 2017-12-21 | 59d1e2b9 | Cancel CV sleep during subtransaction ab |
 2017-12-21 | 18042840 | Add parallel-aware hash joins.           |
 2017-12-20 | f94eec49 | When passing query strings to workers, p | 00:08:45

That's +3.4%.  That's a bit more than I expected.  I saw 2.5% on my
development box and hoped that'd be OK for a complex feature with a
lot of paths to test.

But hang on a minute -- how did we get to 08:45 from 08:30 between
those commits?  Of course this is all noisy data and individual
samples are all over the place, but I think I see some signal here:

 2017-12-20 | f94eec49 | When passing query strings to workers, p | 00:08:45
 2017-12-19 | 7d3583ad | Test instrumentation of Hash nodes with  | 00:08:43
 2017-12-19 | 8526bcb2 | Try again to fix accumulation of paralle |
 2017-12-19 | 38fc5470 | Re-fix wrong costing of Sort under Gathe | 00:08:31
 2017-12-19 | 09a65f5a | Mark a few parallelism-related variables | 00:08:27

Both 8526bcb2 and 7d3583ad add Gather rescans.  Doesn't 8526bcb2 add a
query that forks 40 times?  I wonder how long it takes to start a
background worker on a Mac Cube.  From that commit:

+         ->  Gather (actual rows=9800 loops=10)
+               Workers Planned: 4
+               Workers Launched: 4
+               ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)

> I thought that the hash index test case was excessively expensive for
> what it covered, and I'm now thinking the same about hash joins.

Does join-test-shrink.patch (from my earlier message) help much, on
prairiedog?  It cut "check" time by ~1.5% on my low end machine.

-- 
Thomas Munro
http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Remove utils/dsa.h from autovacuum.c
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: Failed to request an autovacuum work-item in silence