Обсуждение: Using multiple cores for index creation?

Поиск
Список
Период
Сортировка

Using multiple cores for index creation?

От
henk de wit
Дата:
Hi,

When I try to restore a database dump on PostgreSQL 8.3 that's approximately 130GB in size and takes about 1 hour, I noticed index creation makes up the bulk of that time. I'm using a very fast I/O subsystem (16 Mtron Pro 7535 SSDs using a dual 1.2Ghz IOP/4GB cache RAID controller), fast CPUs (2 quad core C2Q's at 2.6Ghz) and 32GB RAM. From monitoring the restore process, I learned that only 10 minutes is spend doing IO, while the rest of the time is spend on creating the indexes. Index creation seems to be completely CPU bound.

The problem is that only 1 CPU core is used. My other 7 cores are just sitting there doing nothing. It seems to me that creating each index, especially for different tables, is something that can be done independently.

Is there some way I can let PostgreSQL use multiple cores for creating the indexes?

Thanks in advance


Express yourself instantly with MSN Messenger! MSN Messenger

Re: Using multiple cores for index creation?

От
Robert Haas
Дата:
On Thu, Jan 29, 2009 at 3:21 PM, henk de wit <henk53602@hotmail.com> wrote:
> Hi,
> When I try to restore a database dump on PostgreSQL 8.3
> that's approximately 130GB in size and takes about 1 hour, I noticed index
> creation makes up the bulk of that time. I'm using a very fast I/O subsystem
> (16 Mtron Pro 7535 SSDs using a dual 1.2Ghz IOP/4GB cache RAID controller),
> fast CPUs (2 quad core C2Q's at 2.6Ghz) and 32GB RAM. From monitoring the
> restore process, I learned that only 10 minutes is spend doing IO, while the
> rest of the time is spend on creating the indexes. Index creation seems to
> be completely CPU bound.
> The problem is that only 1 CPU core is used. My other 7 cores are just
> sitting there doing nothing. It seems to me that creating each index,
> especially for different tables, is something that can be done
> independently.
> Is there some way I can let PostgreSQL use multiple cores for creating the
> indexes?

Andrew Dunstan has been working on this problem.  His latest parallel
restore patch can be found here:

http://archives.postgresql.org/message-id/4977E070.6070604@dunslane.net

...Robert

Re: Using multiple cores for index creation?

От
"Joshua D. Drake"
Дата:
On Thu, 2009-01-29 at 18:09 -0500, Robert Haas wrote:
> On Thu, Jan 29, 2009 at 3:21 PM, henk de wit <henk53602@hotmail.com> wrote:
> > Hi,
> > When I try to restore a database dump on PostgreSQL 8.3
> > that's approximately 130GB in size and takes about 1 hour, I noticed index
> > creation makes up the bulk of that time. I'm using a very fast I/O subsystem
> > (16 Mtron Pro 7535 SSDs using a dual 1.2Ghz IOP/4GB cache RAID controller),
> > fast CPUs (2 quad core C2Q's at 2.6Ghz) and 32GB RAM. From monitoring the
> > restore process, I learned that only 10 minutes is spend doing IO, while the
> > rest of the time is spend on creating the indexes. Index creation seems to
> > be completely CPU bound.
> > The problem is that only 1 CPU core is used. My other 7 cores are just
> > sitting there doing nothing. It seems to me that creating each index,
> > especially for different tables, is something that can be done
> > independently.
> > Is there some way I can let PostgreSQL use multiple cores for creating the
> > indexes?
>
> Andrew Dunstan has been working on this problem.  His latest parallel
> restore patch can be found here:
>
> http://archives.postgresql.org/message-id/4977E070.6070604@dunslane.net


Yeah but that isn't useful for 8.3. What can be done in this specific
situation is to make sure you dump with the -Fc option. You can then
pull a TOC out with pg_restore and break that appart. Reading the TOC is
pretty self evident. Once you get down to index creation you can create
multiple files each with a group of indexes to create. Then call
pg_restore multiple times in a script against the individual TOC and you
will use all cores.

Joshua D. Drake

P.S. Increase maintenance_work_mem can help too


>
> ...Robert
>
--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


Re: Using multiple cores for index creation?

От
henk de wit
Дата:
Hi,

> You can then
> pull a TOC out with pg_restore and break that appart. Reading the TOC is
> pretty self evident. Once you get down to index creation you can create
> multiple files each with a group of indexes to create. Then call
> pg_restore multiple times in a script against the individual TOC and you
> will use all cores.

I figured something like this would be possible. Thanks for the explanation. Ultimately I'm looking for something more automatic though. Not that I personally mind doing the above, but when an emergency restore is needed for some production server manually editing a dump is probably not the safest course of action ;)

It sounds like something though that a tool could do automatically. The mentioned patch sounds interesting too, is there anything known about whether this patch will make it into the main stream Postgres source? I guess it's too late for inclusion in PostgreSQL 8.4, but 8.5 perhaps?

Kind regards


What can you do with the new Windows Live? Find out

Re: Using multiple cores for index creation?

От
Tom Lane
Дата:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> On Thu, 2009-01-29 at 18:09 -0500, Robert Haas wrote:
>> Andrew Dunstan has been working on this problem.  His latest parallel
>> restore patch can be found here:
>>
>> http://archives.postgresql.org/message-id/4977E070.6070604@dunslane.net

> Yeah but that isn't useful for 8.3.

Sure it is.  Andrew has made a point of making sure that the improved
version of pg_restore can work against older servers (not sure how far
back, but it's definitely supposed to work with 8.3).

> What can be done in this specific
> situation is to make sure you dump with the -Fc option...

You're essentially proposing a manual reimplementation of Andrew's
patch ...

            regards, tom lane

Re: Using multiple cores for index creation?

От
Scott Carey
Дата:
It may not be that possible with your schema, but when I had to restore a 2.5TB database (with plenty fast I/O, it was never I/O bound) I used 3 or 4 copies of pg_restore operating on different tables.

With the –Fc option, like a plain dump you can have it restore just certain schemas or tables per command.  A bit of manual work but you don’t have to modify the dump file(s), just make various differing commands that operate on different sections of the database.  How hard that is depend on the schema.  In my case, we have most of the data in partitioned tables and can use a regex to peel off chunks of them by date to restore in different processes once the parent tables and schema are in place.

It still took all day though, and it wasn’t I/O bound.

On 1/29/09 12:21 PM, "henk de wit" <henk53602@hotmail.com> wrote:

Hi,

When I try to restore a database dump on PostgreSQL 8.3 that's approximately 130GB in size and takes about 1 hour, I noticed index creation makes up the bulk of that time. I'm using a very fast I/O subsystem (16 Mtron Pro 7535 SSDs using a dual 1.2Ghz IOP/4GB cache RAID controller), fast CPUs (2 quad core C2Q's at 2.6Ghz) and 32GB RAM. From monitoring the restore process, I learned that only 10 minutes is spend doing IO, while the rest of the time is spend on creating the indexes. Index creation seems to be completely CPU bound.

The problem is that only 1 CPU core is used. My other 7 cores are just sitting there doing nothing. It seems to me that creating each index, especially for different tables, is something that can be done independently.

Is there some way I can let PostgreSQL use multiple cores for creating the indexes?

Thanks in advance


Express yourself instantly with MSN Messenger! MSN Messenger <http://clk.atdmt.com/AVE/go/onm00200471ave/direct/01/>