Re: parallel joins, and better parallel explain

From: Robert Haas
Subject: Re: parallel joins, and better parallel explain
Msg-id: CA+TgmobK=kcrmjJD3uvxXNSHcb48kmNpkT9Zm=v3qN5OjswQzg@mail.gmail.com
In reply to: Re: parallel joins, and better parallel explain  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: parallel joins, and better parallel explain  (Dilip Kumar <dilipbalaut@gmail.com>)
List: pgsql-hackers
On Mon, Jan 4, 2016 at 4:50 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> I tried to create an inner table such that the inner table data doesn't
> fit in RAM (I created a VM with 1GB of RAM).
> The purpose of this is to make the disk scan dominant, and since the
> parallel join repeats the disk scan and hash table build of the inner
> table, there will be a lot of parallel I/O and it has to pay that
> penalty.
>
> I think that even though the inner table scan and hash table build are
> parallel, there will be a lot of parallel I/O, which will become a
> bottleneck.

Hmm.  Because only 1/1024th of the hash table fits in work_mem, the
executor is going to have to write out all of the tuples that don't
belong to the first batch to a temporary file and then read them back
in.  So each backend is going to write essentially the entirety of t2
out to disk and then read it all back in again. The non-parallel case
will also write most of the table contents and then read them back in,
but at least it will only be doing that once rather than 7 times, so
it's not as bad.  Also, with fewer backends running, the non-parallel
case will have a bit more memory free for caching purposes.
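
To make the batching mechanics concrete: the batch number is just a
slice of the tuple's hash value, so with 1024 batches only ~1/1024th
of the inner tuples stay in memory and everything else gets spooled
to a per-batch temp file.  Here's a simplified standalone sketch of
the idea, modeled loosely on ExecHashGetBucketAndBatch in nodeHash.c;
the constants and helper names below are illustrative assumptions,
not the actual source:

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified sketch: bucket and batch numbers are taken from
     * disjoint bit ranges of the tuple's hash value. */
    #define NBUCKETS        1024    /* assumed bucket count (power of 2) */
    #define LOG2_NBUCKETS   10
    #define NBATCH          1024    /* the 1/1024th case discussed above */

    static void
    bucket_and_batch(uint32_t hashvalue, int *bucketno, int *batchno)
    {
        *bucketno = (int) (hashvalue & (NBUCKETS - 1));
        *batchno = (int) ((hashvalue >> LOG2_NBUCKETS) & (NBATCH - 1));
    }

    int
    main(void)
    {
        int bucketno, batchno;

        bucket_and_batch(0xdeadbeefU, &bucketno, &batchno);
        printf("bucket %d, batch %d\n", bucketno, batchno);

        /* During the initial scan, batchno != 0 means "write this
         * tuple to its batch's temp file and read it back later" --
         * that's the extra I/O pass each backend pays. */
        return 0;
    }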

> Do we need to consider the cost of parallel I/O also?  I am not sure
> whether we can really do that... ?

It seems to me that the problem here is that you've set
max_parallel_degree to an unrealistically high value.  The query
planner is entitled to assume that the user has set
max_parallel_degree to a value which is small enough that the workers
won't be fighting too viciously with each other over resources.  It
doesn't really matter whether those resources are CPU resources or I/O
resources.  I'm wondering if your 1GB VM really even has as many as 7
vCPUs, because that would seem to be something of an unusual
configuration - and if it doesn't, then setting max_parallel_degree to
a value that high is certainly user error. Even if it does, it's still
not right to set the value as high as six unless the system also has
enough I/O bandwidth to accommodate the amount of I/O that you expect
your queries to generate, and here it seems like it probably doesn't.
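
For concreteness, on a box like that I'd expect settings in this
neighborhood to be closer to sane -- these numbers are illustrative
guesses for that particular VM, not general recommendations:

    # hypothetical postgresql.conf for a 1GB VM with one or two vCPUs
    max_parallel_degree = 1        # match real CPU and I/O headroom
    work_mem = 4MB                 # each backend gets its own allowance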

To put that another way, you can always make parallel query perform
badly by telling it to use too many workers relative to the size of
the machine you have. This is no different than getting bad query
plans by configuring work_mem or effective_cache_size or any other
query planner GUC to a value that doesn't reflect the actual execution
environment.  I would only consider this to be a problem with the
parallel join patch if the chosen plan is slower even on a machine
that's big enough to justify setting max_parallel_degree=6 in the
first place.

While studying this example, I thought about whether we could fix
this case by generating a partial hash join path only if we expect a
single batch, which would then cause the query planner to plan this
query some other way.  But after some thought I don't think that's the
right approach.  Multi-batch hash joins are in general quite a lot
slower than single-batch hash joins - and initial_cost_hashjoin knows
that - but if the system has adequate I/O bandwidth, that problem
shouldn't be any worse for a parallel hash join than it is for a
non-parallel hash join.  I think the reason you're losing here is
because the system either doesn't have as many vCPUs as the number of
worker processes you are giving it, or it has a very limited amount of
I/O bandwidth that can't handle multiple processes doing sequential
I/O at the same time - e.g. a single spinning disk, or a single SSD
plus a bunch of virtualization overhead.  But that need not be the
case.  On a system where temporary files are written to a filesystem
backed by an array of disks, you might well get some I/O parallelism.
Of course if we experiment and find that doesn't work out well for
some reason, then we've got a problem, but it doesn't seem implausible
that it might be just fine.
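
For reference, the existing multi-batch penalty amounts to roughly an
extra write-plus-read pass over both inputs, charged at sequential
page rates.  A back-of-the-envelope sketch of that idea -- the
function and the numbers here are made up, not the actual costsize.c
code:

    #include <stdio.h>

    /* Rough sketch: with more than one batch, most inner and outer
     * tuples are written to temp files once and read back once, so
     * charge sequential page I/O for one extra pass over each input. */
    static double
    extra_batching_cost(double innerpages, double outerpages,
                        int numbatches, double seq_page_cost)
    {
        if (numbatches <= 1)
            return 0.0;         /* single batch: no temp-file traffic */
        /* one write + one read of each input's pages */
        return seq_page_cost * (2.0 * innerpages + 2.0 * outerpages);
    }

    int
    main(void)
    {
        /* hypothetical inputs: 100k inner pages, 50k outer pages */
        printf("extra cost: %.0f\n",
               extra_batching_cost(100000.0, 50000.0, 1024, 1.0));
        return 0;
    }

Note that this charge is the same per backend whether or not the join
is parallel, which is why adequate I/O bandwidth is the real variable
here.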

Another interesting question about this particular query is whether a
merge join would have been faster, especially given all Peter
Geoghegan's work to improve sort performance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


