Re: Concurrent COPY commands

From	Phillip Sitbon
Subject	Re: Concurrent COPY commands
Date	9 July 2008 16:35:23
Msg-id	536685ea0807090935i2e191146y7f3acf5bba4ca0ea@mail.gmail.com
In reply to	Re: Concurrent COPY commands (Alan Hodgson <ahodgson@simkin.ca>)
Replies	Re: Concurrent COPY commands
List	pgsql-novice

Sorry about the late reply.

I only have two fast SATA drives in software RAID, but that really isn't the issue: while the COPY commands are running, disk activity is relatively low. By "relatively" I mean that I have seen it much higher under other circumstances, and I know for sure the disks aren't holding anything back. I know it's a bad comparison, but the process generating this huge amount of data can write directly to disk very fast and still be CPU-bound, yet it ends up waiting on postgres when I pipe the data into the database. I expected some overhead, which is why I tried the parallel setup in the first place.
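
"Piping it into the database" here means the producer writes rows to a FIFO that a COPY ... FROM command reads. As a minimal sketch only, assuming PostgreSQL's default text COPY format (tab-separated fields, `\N` for NULL, backslash escapes for backslash, tab, newline and carriage return), a producer might format rows like this; `copy_escape` and `copy_line` are hypothetical helper names, not part of any library.

```python
# Sketch: encode rows in PostgreSQL's default text COPY format so a
# producer can write them to a FIFO (or stdin) that COPY ... FROM reads.

def copy_escape(value):
    """Escape one field for text COPY format; None becomes \\N (NULL)."""
    if value is None:
        return "\\N"
    s = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    return (s.replace("\\", "\\\\")
             .replace("\t", "\\t")
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def copy_line(row):
    """Join escaped fields with tabs and terminate the row with a newline."""
    return "\t".join(copy_escape(v) for v in row) + "\n"


rows = [(1, "alpha", None), (2, "tab\there", "back\\slash")]
payload = "".join(copy_line(r) for r in rows)
print(payload, end="")
```

Formatting is cheap per row, but at terabyte scale it is still producer-side CPU work, which fits the observation that the generator itself is CPU-bound.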

What I see is that after some buffering (I'm not sure it is buffering, but it happens once it has received some data), one postgres process ramps up to 100% CPU on one core for a while, which blocks its input FIFO. That is when the hard drive activity goes up a bit, but whatever it is doing is definitely CPU-bound on that core.

No more than one worker process does this at a time. And no matter what kind of FIFO buffers and select() calls I use, the calling process eventually gets blocked, because the postgres processes don't appear to be working in parallel as well as they could; as a result, postgres doesn't take in any more data for a while. I'm really curious why running six in parallel is so much slower than a single process when the disks aren't being pushed anywhere near their capabilities.
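
A minimal sketch of the select()-based fan-out described above, assuming one producer and several pipes, with `os.pipe()` standing in for the FIFOs that each COPY reads and a reader thread standing in for each backend. The names `dispatch`, `drain` and `run_demo` are hypothetical. It relies on Python's `select.PIPE_BUF`, the number of bytes that can be written without blocking once select() reports a pipe writable. It only keeps the producer from stalling on one slow stream; if the backends themselves do not make progress in parallel, select() will still end up waiting.

```python
import os
import select
import threading


def dispatch(chunks, write_fds):
    """Write each chunk to a pipe that select() reports as writable,
    rotating the starting point so one stream doesn't get everything.
    Returns a dict mapping each fd to the number of chunks it received."""
    order = list(write_fds)
    counts = {fd: 0 for fd in order}
    start = 0
    for chunk in chunks:
        # Chunks up to PIPE_BUF bytes can't block once the pipe is writable.
        if len(chunk) > select.PIPE_BUF:
            raise ValueError("chunk larger than select.PIPE_BUF")
        # Wait until at least one consumer has room in its pipe buffer.
        _, writable, _ = select.select([], order, [])
        ready = set(writable)
        for k in range(len(order)):
            fd = order[(start + k) % len(order)]
            if fd in ready:
                start = (start + k + 1) % len(order)
                break
        os.write(fd, chunk)
        counts[fd] += 1
    return counts


def drain(fd, sink):
    """Stand-in for one COPY backend: read the pipe until EOF."""
    while True:
        data = os.read(fd, 65536)
        if not data:
            break
        sink.append(data)
    os.close(fd)


def run_demo(n_streams=3, n_chunks=1000):
    pipes = [os.pipe() for _ in range(n_streams)]
    sinks = [[] for _ in pipes]
    readers = [threading.Thread(target=drain, args=(r, sink))
               for (r, _), sink in zip(pipes, sinks)]
    for t in readers:
        t.start()
    chunks = [b"row %d\n" % n for n in range(n_chunks)]
    counts = dispatch(chunks, [w for _, w in pipes])
    for _, w in pipes:
        os.close(w)  # EOF tells each reader its stream is finished
    for t in readers:
        t.join()
    return counts, sinks
```

Calling `run_demo(6, 10000)` mirrors the six-stream setup; comparing `counts` across streams shows which consumers were keeping up.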

I suspect something is wrong with my config, but I can't be sure. Is 1-2 GB for work_mem OK, or would that hurt?

On a positive note, I let the single-process version run to completion and I now have a solid TB of data that I can access and use at lightning speed :)

Cheers,

Phillip

On Wed, Jul 2, 2008 at 10:02 AM, Alan Hodgson <ahodgson@simkin.ca> wrote:

On Wednesday 02 July 2008, Phillip Sitbon <phillip@sitbon.net> wrote:
> Hello,
>
> I am running some queries that use multiple connections to issue COPY
> commands which bring data into the same table via different files (FIFOs
> to be precise). This is being done on an SMP machine and I am noticing
> that none of the postgres worker processes operate in parallel, even
> though there is data available to all of them. The performance is nearly
> exactly the same as it is for issuing a single COPY command.
> Is this
> normal behavior, even with all of the separate transactions still in
> progress? Would I be better off doing multithreaded bulk inserts from my
> C program rather than sending the data to FIFOs?

Sounds like you're I/O bound - I doubt any other concurrency mechanism will
change that much.

>
> The machine I am using has 16GB of memory and 8 cores, so I've tried to
> optimize the configuration accordingly but I am a little lost in some
> places.

Ah, but what does your RAID controller and drives look like?

--
Alan

--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice