Re: Parallel Seq Scan

From Jim Nasby
Subject Re: Parallel Seq Scan
Date
Msg-id 54C9486A.6050101@BlueTreble.com
In response to Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Parallel Seq Scan  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On 1/28/15 9:56 AM, Stephen Frost wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Wed, Jan 28, 2015 at 10:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I thought the proposal to chunk on the basis of "each worker processes
>>> one 1GB-sized segment" should work all right.  The kernel should see that
>>> as sequential reads of different files, issued by different processes;
>>> and if it can't figure out how to process that efficiently then it's a
>>> very sad excuse for a kernel.
>
> Agreed.
>
>> I agree.  But there's only value in doing something like that if we
>> have evidence that it improves anything.  Such evidence is presently a
>> bit thin on the ground.
>
> You need an i/o subsystem that's fast enough to keep a single CPU busy,
> otherwise (as you mentioned elsewhere), you're just going to be i/o
> bound and having more processes isn't going to help (and could hurt).
>
> Such i/o systems do exist, but a single RAID5 group over spinning rust
> with a simple filter isn't going to cut it with a modern CPU- we're just
> too darn efficient to end up i/o bound in that case.  A more complex
> filter might be able to change it over to being more CPU bound than i/o
> bound and produce the performance improvements you're looking for.

Except we're nowhere near being IO efficient. The vast difference between Postgres IO rates and dd shows this. I
suspect that's because we're not giving the OS a list of IO to perform while we're doing our thing, but that's just a
guess.

> The caveat to this is if you have multiple i/o *channels* (which it
> looks like you don't in this case) where you can parallelize across
> those channels by having multiple processes involved.

Keep in mind that multiple processes are in no way a requirement for that. Async IO would do that, or even just
requesting stuff from the OS before we need it.
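
As an illustration of "requesting stuff from the OS before we need it", here is a minimal sketch in Python, assuming a Unix system that exposes posix_fadvise. It is not PostgreSQL code: the function name scan_with_prefetch and the read-ahead distance of 32 blocks are hypothetical, and the only detail taken from Postgres is its default 8 kB page size. A single process reads the file in order while hinting the kernel about a block further ahead.

```python
import os

BLOCK_SIZE = 8192       # PostgreSQL's default page size
PREFETCH_BLOCKS = 32    # hypothetical read-ahead distance, in blocks


def scan_with_prefetch(path, block_size=BLOCK_SIZE, distance=PREFETCH_BLOCKS):
    """Read a file sequentially, hinting the kernel about upcoming blocks.

    Returns the total number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            # Advise the kernel that we will want the block `distance`
            # blocks ahead. The call is advisory and does not wait for IO.
            ahead = offset + distance * block_size
            if ahead < size and hasattr(os, "posix_fadvise"):
                os.posix_fadvise(fd, ahead, block_size,
                                 os.POSIX_FADV_WILLNEED)
            data = os.pread(fd, block_size, offset)
            if not data:
                break
            total += len(data)   # stand-in for the per-block filter work
            offset += len(data)
    finally:
        os.close(fd)
    return total
```

Whether a hint like this closes any of the gap to dd depends on the kernel and the storage underneath; the sketch only shows that the request can be issued without a second process.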
 

>  We only support
> multiple i/o channels today with tablespaces and we can't span tables
> across tablespaces.  That's a problem when working with large data sets,
> but I'm hopeful that this work will eventually lead to a parallelized
> Append node that operates against a partitioned/inherited table to work
> across multiple tablespaces.

Until we can get a single seqscan close to dd performance, I fear worrying about tablespaces and IO channels is
entirely premature.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


