Thread: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

[HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From
Alvaro Herrera
Date:
Here's another small patch, this time from Simon Riggs.  Maybe he already
posted it for this commitfest, but I didn't find it in a quick look so
here it is.

This patch reduces the amount of bloat you get from running CREATE INDEX
CONCURRENTLY by destroying the snapshot taken in the first phase, before
entering the second phase.  This allows the global xmin to advance,
letting concurrent vacuum keep bloat in other tables in check.
Currently this implements the change for btree indexes only, but doing
it for other indexes should be a one-liner.

-- 
Álvaro Herrera

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments

Re: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> This patch reduces the amount of bloat you get from running CREATE INDEX
> CONCURRENTLY by destroying the snapshot taken in the first phase, before
> entering the second phase.  This allows the global xmin to advance,

Um ... isn't there a transaction boundary there anyway?
        regards, tom lane



Re: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From
Simon Riggs
Date:
On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> This patch reduces the amount of bloat you get from running CREATE INDEX
>> CONCURRENTLY by destroying the snapshot taken in the first phase, before
>> entering the second phase.  This allows the global xmin to advance,
>
> Um ... isn't there a transaction boundary there anyway?

Yes, the patch releases the snapshot early, so it does not hold it
once the build scan has completed. This allows the sort and build
phases to occur without holding back the xmin.
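
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL code: `ToyCluster`, `build_index`, and the xid counts are hypothetical, and the global xmin is simplified to the minimum xid pinned by any held snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyCluster:
    """Hypothetical model: records the xid that each backend's snapshot pins."""
    next_xid: int = 100
    snapshot_xmin: Dict[str, int] = field(default_factory=dict)

    def advance(self, n: int) -> None:
        # Other sessions commit n transactions.
        self.next_xid += n

    def take_snapshot(self, backend: str) -> None:
        # Simplification: the snapshot pins the current xid counter.
        self.snapshot_xmin[backend] = self.next_xid

    def release_snapshot(self, backend: str) -> None:
        self.snapshot_xmin.pop(backend, None)

    def global_xmin(self) -> int:
        # Vacuum can only clean up row versions older than this horizon.
        return min(self.snapshot_xmin.values(), default=self.next_xid)


def build_index(cluster: ToyCluster, release_after_scan: bool) -> int:
    """Return how far the vacuum horizon lags at the end of the sort/build phase."""
    cluster.take_snapshot("cic")
    cluster.advance(10)  # heap scan; other sessions keep committing
    if release_after_scan:
        cluster.release_snapshot("cic")
    cluster.advance(50)  # sort and build; no heap visibility checks in this model
    lag = cluster.next_xid - cluster.global_xmin()
    cluster.release_snapshot("cic")
    return lag


lag_held = build_index(ToyCluster(), release_after_scan=False)
lag_released = build_index(ToyCluster(), release_after_scan=True)
print(lag_held, lag_released)
```

In this model the horizon lags by the whole scan plus build duration when the snapshot is held, and not at all once it is released after the scan, which is the window in which concurrent vacuum could clean up other tables.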

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Um ... isn't there a transaction boundary there anyway?

> Yes, the patch releases the snapshot early, so it does not hold it
> once the build scan has completed. This allows the sort and build
> phases to occur without holding back the xmin.

Oh ... so Alvaro explained it badly.  The reason this is specific to
btree is that it's the only AM with any significant post-scan building
time.

However, now that I read the patch: this is a horribly ugly hack.
I really don't like the API (if it even deserves the dignity of that
name) that you've added to snapmgr.  I suppose the zero documentation
for it fits in nicely with the fact that it's a badly-thought-out kluge.

I think it would be better to just move the responsibility for snapshot
popping in this sequence to the index AMs, full stop.
        regards, tom lane



Re: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From
Simon Riggs
Date:
On 28 February 2017 at 13:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Um ... isn't there a transaction boundary there anyway?
>
>> Yes, the patch releases the snapshot early, so it does not hold it
>> once the build scan has completed. This allows the sort and build
>> phases to occur without holding back the xmin.
>
> Oh ... so Alvaro explained it badly.  The reason this is specific to
> btree is that it's the only AM with any significant post-scan building
> time.
>
> However, now that I read the patch: this is a horribly ugly hack.
> I really don't like the API (if it even deserves the dignity of that
> name) that you've added to snapmgr.  I suppose the zero documentation
> for it fits in nicely with the fact that it's a badly-thought-out kluge.

WTF. Frankly, knowing it would generate such a ridiculously negative
response was the reason it wasn't me that submitted it and why it's not
fully documented. Documentation in this case would be a short
paragraph in the index AM, explaining for the user what is already in
code comments.

You're right to point out that there is significant post-scan build
time and the reduction in bloat during that time is well worth the
trouble. I'm pleased to have thought of it and to have contributed it
to the community.

> I think it would be better to just move the responsibility for snapshot
> popping in this sequence to the index AMs, full stop.

There were two choices: a) leave the responsibility to the index AM,
giving a clean API, or b) don't trust that all index AMs would know or
implement this correctly. If the index AM doesn't implement this
correctly it becomes a crash bug, which seemed unacceptable in an
extensible server.

After implementing (a), I chose (b) and took extra time to implement
the ugly API in preference to the possibility of a crash bug. I am
open to following consensus on that and to resubmit other patches as
required.
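
The trade-off between (a) and (b) can be sketched in the abstract. Everything below is hypothetical: `SnapshotStack`, the two toy AM callbacks, and both caller functions are illustrative stand-ins, not the snapmgr API or the code in the patch. The thread does not spell out the exact failure mode in the real server, so the mismatch modelled here is simply a snapshot left on the stack.

```python
class SnapshotStack:
    """Hypothetical stand-in for an active-snapshot stack (not snapmgr)."""

    def __init__(self) -> None:
        self._stack = []

    def push(self, name: str) -> None:
        self._stack.append(name)

    def pop(self) -> str:
        if not self._stack:
            raise RuntimeError("pop on empty snapshot stack")
        return self._stack.pop()

    def depth(self) -> int:
        return len(self._stack)


def am_releases_early(stack: SnapshotStack) -> None:
    # A well-behaved AM: scan, then pop the snapshot before sort/build.
    stack.pop()


def am_forgets(stack: SnapshotStack) -> None:
    # An AM that never learned about the new responsibility.
    pass


def build_am_owns_pop(stack: SnapshotStack, am_build) -> None:
    """Option (a): the caller pushes and trusts the AM to pop."""
    stack.push("build")
    am_build(stack)


def build_defensive(stack: SnapshotStack, am_build) -> None:
    """Option (b): the caller checks afterwards and pops only if the AM did not."""
    depth_before = stack.depth()
    stack.push("build")
    am_build(stack)
    if stack.depth() > depth_before:
        stack.pop()


for caller in (build_am_owns_pop, build_defensive):
    for am in (am_releases_early, am_forgets):
        s = SnapshotStack()
        caller(s, am)
        print(caller.__name__, am.__name__, "left on stack:", s.depth())
```

Under (a) the calling code stays simple but its correctness depends on every AM; under (b) the caller carries extra bookkeeping so that an AM that does nothing still leaves the stack balanced.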

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services