Re: Benchmark Data requested

From Greg Smith
Subject Re: Benchmark Data requested
Date
Msg-id Pine.GSO.4.64.0802052211020.6216@westnet.com
In response to Re: Benchmark Data requested  (Simon Riggs <simon@2ndquadrant.com>)
Responses Re: Benchmark Data requested  (NikhilS <nikkhils@gmail.com>)
Re: Benchmark Data requested  (Dimitri Fontaine <dfontaine@hi-media.com>)
List pgsql-performance
On Tue, 5 Feb 2008, Simon Riggs wrote:

> On Tue, 2008-02-05 at 15:50 -0500, Jignesh K. Shah wrote:
>>
>> Even if it is a single core, the mere fact that the loading process will
>> eventually wait for a read from the input file which cannot be
>> non-blocking, the OS can timeslice it well for the second process to use
>> those wait times for the index population work.
>
> If Dimitri is working on parallel load, why bother?

pgloader is a great tool for a lot of things, particularly if there's any
chance that some of your rows will get rejected.  But the way things pass
through the Python/psycopg layer made it uncompetitive (more than 50%
slowdown) against the straight COPY path from a rows/second perspective
the last time (V2.1.0?) I did what I thought was a fair test of it (usual
caveat of "with the type of data I was loading").  Maybe there's been some
gigantic improvement since then, but it's hard to beat COPY when you've
got an API layer or two in the middle.

I suspect what will end up happening is that a parallel loading pgloader
will scale something like this:

1 CPU:  Considerably slower than COPY
2-3 CPUs: Close to even with COPY
4+ CPUs:  Faster than COPY

Maybe I'm wrong, but I wouldn't abandon looking into another approach
until that territory is mapped out a bit better.

Given the very large number of dual-core systems out there now relative to
those with more, any optimization of the straight COPY path that lets even
one more core take on things like index building is well worth doing.
Heikki's idea sounded good to me regardless, and if that can be separated
out enough to get a second core into the index building at the same time,
so much the better.
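
To make the overlap idea concrete, here is a toy sketch in Python. It is not
PostgreSQL code and says nothing about how COPY or index builds are actually
implemented; the function names, the tab-separated input, and the dict used
as an "index" are all assumptions made for illustration. One thread parses
input and appends rows, while a second thread builds the index from a queue,
so the indexing work can proceed whenever the loader is waiting on input.

```python
import io
import queue
import threading

_DONE = object()  # sentinel marking end of input (hypothetical helper)


def load_rows(source, table, work_queue):
    """Loader: append each parsed row to the table, then hand it to the indexer."""
    for line in source:
        row = line.rstrip("\n").split("\t")
        table.append(row)
        work_queue.put((len(table) - 1, row))
    work_queue.put(_DONE)


def build_index(work_queue, index, key_column):
    """Indexer: map each key value to the positions of the rows holding it."""
    while True:
        item = work_queue.get()
        if item is _DONE:
            break
        position, row = item
        index.setdefault(row[key_column], []).append(position)


def load_with_concurrent_index(source, key_column=0):
    """Run the loader in the calling thread and the indexer in a second thread."""
    table, index = [], {}
    work_queue = queue.Queue(maxsize=1024)  # bounded, so the loader cannot run far ahead
    indexer = threading.Thread(
        target=build_index, args=(work_queue, index, key_column)
    )
    indexer.start()
    load_rows(source, table, work_queue)
    indexer.join()
    return table, index


if __name__ == "__main__":
    rows, idx = load_with_concurrent_index(io.StringIO("a\t1\nb\t2\na\t3\n"))
    print(rows)
    print(idx)
```

Python threads only overlap usefully here while the loader is blocked on
I/O; putting the indexer on a genuinely separate core would need separate
processes, which is closer to what is being proposed for the COPY path.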

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
