Re: Parallel Seq Scan

From Stephen Frost
Subject Re: Parallel Seq Scan
Msg-id 20150129012721.GB3854@tamriel.snowman.net
In reply to Re: Parallel Seq Scan  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Re: Parallel Seq Scan  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers

Jim,

* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:
> On 1/28/15 9:56 AM, Stephen Frost wrote:
> >Such i/o systems do exist, but a single RAID5 group over spinning rust
> >with a simple filter isn't going to cut it with a modern CPU- we're just
> >too darn efficient to end up i/o bound in that case.  A more complex
> >filter might be able to change it over to being more CPU bound than i/o
> >bound and produce the performance improvements you're looking for.
>
> Except we're nowhere near being IO efficient. The vast difference
> between Postgres IO rates and dd shows this. I suspect that's because
> we're not giving the OS a list of IO to perform while we're doing our
> thing, but that's just a guess.

Uh, huh?  The dd was ~321000 and the slowest uncached PG run from
Robert's latest tests was 337312.554, based on my inbox history at
least.  I don't consider ~4-5% difference to be vast.
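For reference, the arithmetic behind that percentage can be sketched as below. The two figures are the ones quoted above; the variable names are mine, and I'm assuming both numbers are elapsed times in the same unit, which the thread excerpt here doesn't state.

```python
# Figures quoted in this message; the unit is assumed to be the same for both.
dd_time = 321000.0       # approximate dd result
pg_time = 337312.554     # slowest uncached PG run from Robert's tests

gap = pg_time - dd_time

# Relative difference, measured against each baseline in turn.
rel_to_dd = gap / dd_time * 100.0
rel_to_pg = gap / pg_time * 100.0

print(f"gap: {gap:.3f}")
print(f"relative to dd: {rel_to_dd:.2f}%")
print(f"relative to PG: {rel_to_pg:.2f}%")
```

Whichever baseline you pick, the gap comes out at roughly 5%.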

> >The caveat to this is if you have multiple i/o *channels* (which it
> >looks like you don't in this case) where you can parallelize across
> >those channels by having multiple processes involved.
>
> Keep in mind that multiple processes is in no way a requirement for
> that. Async IO would do that, or even just requesting stuff from the
> OS before we need it.

While I agree with this in principle, experience has shown that it
doesn't tend to work out as well as we'd like with a single process.
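
To make "requesting stuff from the OS before we need it" concrete, here is a minimal single-process sketch using posix_fadvise() with POSIX_FADV_WILLNEED. It is an illustration only, not how PostgreSQL's seqscan is implemented: the function name, the 8192-byte block size and the prefetch distance of 32 blocks are all assumptions made for the example, and it needs a Unix-like system for os.pread.

```python
import os

BLOCK_SIZE = 8192  # assumed block size for this sketch


def scan_with_prefetch(path, prefetch_blocks=32):
    """Read a file sequentially in one process, hinting upcoming blocks.

    Hypothetical helper for illustration. Returns the number of bytes read.
    """
    total = 0
    # posix_fadvise is not available on every platform; skip hints if absent.
    can_hint = hasattr(os, "posix_fadvise")
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            if can_hint:
                # Ask the kernel to start fetching a block we will need soon.
                ahead = offset + prefetch_blocks * BLOCK_SIZE
                if ahead < size:
                    os.posix_fadvise(fd, ahead, BLOCK_SIZE,
                                     os.POSIX_FADV_WILLNEED)
            data = os.pread(fd, BLOCK_SIZE, offset)
            if not data:
                break
            total += len(data)      # stand-in for applying the filter
            offset += len(data)
    finally:
        os.close(fd)
    return total
```

The hint is advisory, so the kernel is free to ignore it, and whether it helps at all depends on the storage underneath.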

> > We only support
> >multiple i/o channels today with tablespaces and we can't span tables
> >across tablespaces.  That's a problem when working with large data sets,
> >but I'm hopeful that this work will eventually lead to a parallelized
> >Append node that operates against a partitioned/inherited table to work
> >across multiple tablespaces.
>
> Until we can get a single seqscan close to dd performance, I fear
> worrying about tablespaces and IO channels is entirely premature.

I feel like one of us is misunderstanding the numbers, which is probably
in part because they're a bit piecemeal over email, but the seqscan
speed in this case looks pretty close to dd performance for this
particular test, when things are uncached.  Cached numbers are
different, but that's not what we're discussing here, I don't think.

Don't get me wrong- I've definitely seen cases where we're CPU bound
because of complex filters, etc, but that doesn't seem to be the case
here.

Thanks!
    Stephen
