Обсуждение: Creating multiple indexes in one table scan.

Поиск
Список
Период
Сортировка

Creating multiple indexes in one table scan.

От
Gurjeet Singh
Дата:
I was going though the minutes of PGCon 2012 DEveloper Meeting [1] and came across this:
    Bruce points out the even simpler case is to build several indexes in parallel over the same scan.

I thought I had posted a patch to that effect long back, but upon searching my emails apparently I forgot about the patch.

Attached is the patch that I developed in Nov. 2010, so expect a lot of bit rot. I had tried to make it elegant, but I have to admit its a hack. This patch does not imply that it is using any kind of parallelism, the context in which that above statement was made. It just helps to avoid scanning the same relation multiple times. I performed some tests on it and AFAICR, this did not produce a net win. But those tests may have been performed in a virtual machine and not on a bare metal, I forget.

AFAICR, this is how it works:

.) It introduces a new type of SQL statement:
    CREATE INDEX ....., CREATE INDEX ..., CREATE INDEX.

    That is, it makes it legal to join multiple CREATE INDEX statements, separated by comma.

.) During the execution phase (IndexBuildHeapScan()), it gathers all the statements in a stack-like data structure before processing any tuples from the heap.

.) When done gathering all the CREATE INDEX statements, it proceeds to process the heap.

.) For each heap tuple it calls each index's build method as usual.

Best regards,

PS: Please expect intermittent replies over the next few days.

[1] http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Вложения

Re: Creating multiple indexes in one table scan.

От
Stephen Frost
Дата:
* Gurjeet Singh (singh.gurjeet@gmail.com) wrote:
>     Bruce points out the even simpler case is to build several indexes in
> parallel over the same scan.
>
> I thought I had posted a patch to that effect long back, but upon searching
> my emails apparently I forgot about the patch.
>
> Attached is the patch that I developed in Nov. 2010, so expect a lot of bit
> rot. I had tried to make it elegant, but I have to admit its a hack. This
> patch does not imply that it is using any kind of parallelism, the context
> in which that above statement was made. It just helps to avoid scanning the
> same relation multiple times. I performed some tests on it and AFAICR, this
> did not produce a net win. But those tests may have been performed in a
> virtual machine and not on a bare metal, I forget.

I'm not too surprised that it didn't help all that much since you're
doing everything in one backend.  My guess at what Bruce was talking
about is being able to actually have multiple CREATE INDEX's going in
parallel in different backends.  If they didn't all grab an
AccessExclusive lock, wouldn't they be able to use the same ring buffer
to read through the table anyway?  And with CREATE INDEX CONCURRENTLY,
isn't this supported already?

I haven't tried this yet, but it sure seems like something we could
probably already claim to support..
Thanks,
    Stephen

Re: Creating multiple indexes in one table scan.

От
Gurjeet Singh
Дата:
On Thu, May 24, 2012 at 11:22 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Gurjeet Singh (singh.gurjeet@gmail.com) wrote:
>     Bruce points out the even simpler case is to build several indexes in
> parallel over the same scan.
>
> I thought I had posted a patch to that effect long back, but upon searching
> my emails apparently I forgot about the patch.
>
> Attached is the patch that I developed in Nov. 2010, so expect a lot of bit
> rot. I had tried to make it elegant, but I have to admit its a hack. This
> patch does not imply that it is using any kind of parallelism, the context
> in which that above statement was made. It just helps to avoid scanning the
> same relation multiple times. I performed some tests on it and AFAICR, this
> did not produce a net win. But those tests may have been performed in a
> virtual machine and not on a bare metal, I forget.

I'm not too surprised that it didn't help all that much since you're
doing everything in one backend.  My guess at what Bruce was talking
about is being able to actually have multiple CREATE INDEX's going in
parallel in different backends.  If they didn't all grab an
AccessExclusive lock, wouldn't they be able to use the same ring buffer
to read through the table anyway?  And with CREATE INDEX CONCURRENTLY,
isn't this supported already?

I haven't tried this yet, but it sure seems like something we could
probably already claim to support..

It'd be great if one of standard utilities like pg_restore supported this, by spawning every concurrent index build in separate backends. Just a thought.


--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Re: Creating multiple indexes in one table scan.

От
Robert Haas
Дата:
On Thu, May 24, 2012 at 11:25 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:
> It'd be great if one of standard utilities like pg_restore supported this,
> by spawning every concurrent index build in separate backends. Just a
> thought.

If parallel restore doesn't already take this into account when doing
job scheduling, that would be a worthwhile improvement to consider.

Personally, I think the big win in this area is likely to be parallel
sort.  There may well be some more we can squeeze out of our existing
sort implementation first, and I'm all in favor of that, but
ultimately if you've got 60GB of data to sort and it's all in cache,
you want to be able to use more than one CPU for that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Creating multiple indexes in one table scan.

От
Andrew Dunstan
Дата:

On 05/24/2012 11:44 AM, Robert Haas wrote:
> On Thu, May 24, 2012 at 11:25 AM, Gurjeet Singh<singh.gurjeet@gmail.com>  wrote:
>> It'd be great if one of standard utilities like pg_restore supported this,
>> by spawning every concurrent index build in separate backends. Just a
>> thought.
> If parallel restore doesn't already take this into account when doing
> job scheduling, that would be a worthwhile improvement to consider.
>
>


Parallel restore allows it in some cases, but not in others. You can't 
create an index-backed constraint concurrently with any other index 
because ALTER TABLE takes an ACCESS EXCLUSIVE lock.

We could get around that changing pg_dump to create the index first and 
then add the constraint using the existing index.

cheers

andrew