Обсуждение: Creating multiple indexes in one table scan.
I was going though the minutes of PGCon 2012 DEveloper Meeting [1] and came across this:
Bruce points out the even simpler case is to build several indexes in parallel over the same scan.
I thought I had posted a patch to that effect long back, but upon searching my emails apparently I forgot about the patch.
Attached is the patch that I developed in Nov. 2010, so expect a lot of bit rot. I had tried to make it elegant, but I have to admit its a hack. This patch does not imply that it is using any kind of parallelism, the context in which that above statement was made. It just helps to avoid scanning the same relation multiple times. I performed some tests on it and AFAICR, this did not produce a net win. But those tests may have been performed in a virtual machine and not on a bare metal, I forget.
AFAICR, this is how it works:
.) It introduces a new type of SQL statement:
CREATE INDEX ....., CREATE INDEX ..., CREATE INDEX.
That is, it makes it legal to join multiple CREATE INDEX statements, separated by comma.
.) During the execution phase (IndexBuildHeapScan()), it gathers all the statements in a stack-like data structure before processing any tuples from the heap.
.) When done gathering all the CREATE INDEX statements, it proceeds to process the heap.
.) For each heap tuple it calls each index's build method as usual.
Best regards,
PS: Please expect intermittent replies over the next few days.
[1] http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting
--
Bruce points out the even simpler case is to build several indexes in parallel over the same scan.
I thought I had posted a patch to that effect long back, but upon searching my emails apparently I forgot about the patch.
Attached is the patch that I developed in Nov. 2010, so expect a lot of bit rot. I had tried to make it elegant, but I have to admit its a hack. This patch does not imply that it is using any kind of parallelism, the context in which that above statement was made. It just helps to avoid scanning the same relation multiple times. I performed some tests on it and AFAICR, this did not produce a net win. But those tests may have been performed in a virtual machine and not on a bare metal, I forget.
AFAICR, this is how it works:
.) It introduces a new type of SQL statement:
CREATE INDEX ....., CREATE INDEX ..., CREATE INDEX.
That is, it makes it legal to join multiple CREATE INDEX statements, separated by comma.
.) During the execution phase (IndexBuildHeapScan()), it gathers all the statements in a stack-like data structure before processing any tuples from the heap.
.) When done gathering all the CREATE INDEX statements, it proceeds to process the heap.
.) For each heap tuple it calls each index's build method as usual.
Best regards,
PS: Please expect intermittent replies over the next few days.
[1] http://wiki.postgresql.org/wiki/PgCon_2012_Developer_Meeting
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Вложения
* Gurjeet Singh (singh.gurjeet@gmail.com) wrote: > Bruce points out the even simpler case is to build several indexes in > parallel over the same scan. > > I thought I had posted a patch to that effect long back, but upon searching > my emails apparently I forgot about the patch. > > Attached is the patch that I developed in Nov. 2010, so expect a lot of bit > rot. I had tried to make it elegant, but I have to admit its a hack. This > patch does not imply that it is using any kind of parallelism, the context > in which that above statement was made. It just helps to avoid scanning the > same relation multiple times. I performed some tests on it and AFAICR, this > did not produce a net win. But those tests may have been performed in a > virtual machine and not on a bare metal, I forget. I'm not too surprised that it didn't help all that much since you're doing everything in one backend. My guess at what Bruce was talking about is being able to actually have multiple CREATE INDEX's going in parallel in different backends. If they didn't all grab an AccessExclusive lock, wouldn't they be able to use the same ring buffer to read through the table anyway? And with CREATE INDEX CONCURRENTLY, isn't this supported already? I haven't tried this yet, but it sure seems like something we could probably already claim to support.. Thanks, Stephen
On Thu, May 24, 2012 at 11:22 AM, Stephen Frost <sfrost@snowman.net> wrote:
It'd be great if one of standard utilities like pg_restore supported this, by spawning every concurrent index build in separate backends. Just a thought.
* Gurjeet Singh (singh.gurjeet@gmail.com) wrote:I'm not too surprised that it didn't help all that much since you're
> Bruce points out the even simpler case is to build several indexes in
> parallel over the same scan.
>
> I thought I had posted a patch to that effect long back, but upon searching
> my emails apparently I forgot about the patch.
>
> Attached is the patch that I developed in Nov. 2010, so expect a lot of bit
> rot. I had tried to make it elegant, but I have to admit its a hack. This
> patch does not imply that it is using any kind of parallelism, the context
> in which that above statement was made. It just helps to avoid scanning the
> same relation multiple times. I performed some tests on it and AFAICR, this
> did not produce a net win. But those tests may have been performed in a
> virtual machine and not on a bare metal, I forget.
doing everything in one backend. My guess at what Bruce was talking
about is being able to actually have multiple CREATE INDEX's going in
parallel in different backends. If they didn't all grab an
AccessExclusive lock, wouldn't they be able to use the same ring buffer
to read through the table anyway? And with CREATE INDEX CONCURRENTLY,
isn't this supported already?
I haven't tried this yet, but it sure seems like something we could
probably already claim to support..
It'd be great if one of standard utilities like pg_restore supported this, by spawning every concurrent index build in separate backends. Just a thought.
--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Thu, May 24, 2012 at 11:25 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote: > It'd be great if one of standard utilities like pg_restore supported this, > by spawning every concurrent index build in separate backends. Just a > thought. If parallel restore doesn't already take this into account when doing job scheduling, that would be a worthwhile improvement to consider. Personally, I think the big win in this area is likely to be parallel sort. There may well be some more we can squeeze out of our existing sort implementation first, and I'm all in favor of that, but ultimately if you've got 60GB of data to sort and it's all in cache, you want to be able to use more than one CPU for that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 05/24/2012 11:44 AM, Robert Haas wrote: > On Thu, May 24, 2012 at 11:25 AM, Gurjeet Singh<singh.gurjeet@gmail.com> wrote: >> It'd be great if one of standard utilities like pg_restore supported this, >> by spawning every concurrent index build in separate backends. Just a >> thought. > If parallel restore doesn't already take this into account when doing > job scheduling, that would be a worthwhile improvement to consider. > > Parallel restore allows it in some cases, but not in others. You can't create an index-backed constraint concurrently with any other index because ALTER TABLE takes an ACCESS EXCLUSIVE lock. We could get around that changing pg_dump to create the index first and then add the constraint using the existing index. cheers andrew