Re: Benchmark Data requested --- pgloader CE design ideas

From: Greg Smith
Subject: Re: Benchmark Data requested --- pgloader CE design ideas
Msg-id: Pine.GSO.4.64.0802061041230.15780@westnet.com
In reply to: Re: Benchmark Data requested --- pgloader CE design ideas (Simon Riggs <simon@2ndquadrant.com>)
Responses: Re: Benchmark Data requested --- pgloader CE design ideas (Luke Lonergan <llonergan@greenplum.com>)
Re: Benchmark Data requested --- pgloader CE design ideas ("Jignesh K. Shah" <J.K.Shah@Sun.COM>)
Re: Benchmark Data requested --- pgloader CE design ideas (Dimitri Fontaine <dfontaine@hi-media.com>)
List: pgsql-performance
On Wed, 6 Feb 2008, Simon Riggs wrote:

> For me, it would be good to see a --parallel=n parameter that would
> allow pg_loader to distribute rows in "round-robin" manner to "n"
> different concurrent COPY statements. i.e. a non-routing version.

Let me expand on this.  In many of these giant COPY situations the
bottleneck is plain old sequential I/O to a single process.  You can
almost predict how fast the rows will load by timing a plain dd read of
the file.  Having a process that pulls rows in and distributes them
round-robin is good, but it won't crack that bottleneck, because every row
still has to pass through that one reader.  The useful approaches I've seen for other
databases all presume that the data files involved are large enough that
on big hardware, you can start multiple processes running at different
points in the file and beat anything possible with a single reader.
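For reference, here is a minimal sketch of the round-robin distribution Simon proposes, assuming in-memory lists stand in for the n concurrent COPY streams (a real loader would feed each bucket to its own database connection). The name `round_robin` is hypothetical, not pgloader's API. Note that every row still flows through a single reader loop, which is the single-process bottleneck described above.

```python
from itertools import cycle
from typing import Iterable, List


def round_robin(rows: Iterable[str], n: int) -> List[List[str]]:
    """Deal rows to n buckets in turn: row 0 -> bucket 0, row 1 -> bucket 1, ...

    Each bucket stands in for one concurrent COPY stream (hypothetical).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    buckets: List[List[str]] = [[] for _ in range(n)]
    # cycle() repeats the buckets forever; zip() stops when the rows run out.
    for bucket, row in zip(cycle(buckets), rows):
        bucket.append(row)
    return buckets
```

For example, `round_robin(["a", "b", "c", "d", "e"], 2)` returns `[["a", "c", "e"], ["b", "d"]]`.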

If I'm loading a TB file, odds are good I can split it into 4 or more
pieces by row range (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4
loaders at once, and get far more than one disk's worth of read
throughput.  You have to experiment with the exact number, because if you
split too far you introduce seek overhead instead of gains, but that's the
basic design I'd like to see one day.  For the cases I'm thinking about,
parallel loading won't be useful until something like this comes along.
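A minimal sketch of that split-file design, assuming a newline-delimited file with no newlines embedded inside quoted CSV fields (naive byte splitting would cut such a field in half). The file is divided into n equal byte ranges. Each reader seeks to its own offset, discards the partial line it lands in (that line belongs to the previous range), and stops once it passes its end offset, so every row is read exactly once. The function names are hypothetical and not part of pgloader. Threads keep the example self-contained, since Python releases the GIL during file reads. A real loader would more likely run one process per range, each streaming its rows into its own COPY.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def byte_ranges(path: str, n: int) -> List[Tuple[int, int]]:
    """Cut the file into n roughly equal [start, end) byte ranges."""
    if n < 1:
        raise ValueError("n must be >= 1")
    size = os.path.getsize(path)
    bounds = [size * i // n for i in range(n + 1)]
    return list(zip(bounds, bounds[1:]))


def read_range(path: str, start: int, end: int) -> List[bytes]:
    """Return every line whose first byte lies in [start, end)."""
    lines: List[bytes] = []
    with open(path, "rb") as f:
        if start > 0:
            # Land one byte early and discard through the next newline:
            # a line that began before `start` belongs to the previous range.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:  # end of file
                break
            # A real loader would stream this row into its own COPY here.
            lines.append(line)
    return lines


def parallel_read(path: str, n: int) -> List[List[bytes]]:
    """Read the file with n concurrent readers, each starting at its own offset."""
    ranges = byte_ranges(path, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(read_range, path, s, e) for s, e in ranges]
        return [fut.result() for fut in futures]
```

With n=4 this mirrors the 1-25%, 25-50%, 50-75%, 75-100% split. As noted above, n has to be tuned per system: too many ranges on the same disks turn sequential reads into seeks.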

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
