Thread: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

[HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From

Alvaro Herrera

Date:

28 February 2017, 08:23:02

Here's another small patch, this time from Simon Riggs.  Maybe he already
posted it for this commitfest, but I didn't find it in a quick look so
here it is.

This patch reduces the amount of bloat you get from running CREATE INDEX
CONCURRENTLY by destroying the snapshot taken in the first phase, before
entering the second phase.  This allows the global xmin to advance,
letting concurrent vacuum keep bloat in other tables in check.
Currently this implements the change for btree indexes only, but doing
it for other indexes should be a one-liner.

-- 
Álvaro Herrera

Attachments

avoid_bloat_from_cic.v1.patch

Re: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From

Tom Lane

Date:

28 February 2017, 16:05:16

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> This patch reduces the amount of bloat you get from running CREATE INDEX
> CONCURRENTLY by destroying the snapshot taken in the first phase, before
> entering the second phase.  This allows the global xmin to advance,

Um ... isn't there a transaction boundary there anyway?
        regards, tom lane

Re: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From

Simon Riggs

Date:

28 February 2017, 16:23:05

On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> This patch reduces the amount of bloat you get from running CREATE INDEX
>> CONCURRENTLY by destroying the snapshot taken in the first phase, before
>> entering the second phase.  This allows the global xmin to advance,
>
> Um ... isn't there a transaction boundary there anyway?

Yes, the patch releases the snapshot early, so it does not hold it
once the build scan has completed. This allows the sort and build
phases to occur without holding back the xmin.
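
To make the effect concrete, here is a toy Python model of the idea. It is not PostgreSQL code: the class `ToyCluster`, the function `build_index`, and the transaction counts are all hypothetical, and the model simplifies the global xmin to "the oldest xmin among snapshots still held". It only shows why dropping the snapshot after the heap scan lets the horizon move forward during the sort and build work.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyCluster:
    """Toy model: global xmin = oldest xmin among snapshots still held."""
    next_xid: int = 100
    snapshots: Dict[str, int] = field(default_factory=dict)  # holder -> xmin

    def advance(self, n: int) -> None:
        """Simulate n other transactions running and finishing."""
        self.next_xid += n

    def take_snapshot(self, holder: str) -> None:
        self.snapshots[holder] = self.next_xid

    def release_snapshot(self, holder: str) -> None:
        self.snapshots.pop(holder, None)

    def global_xmin(self) -> int:
        # With no snapshot held, nothing older than next_xid must be kept.
        return min(self.snapshots.values(), default=self.next_xid)


def build_index(cluster: ToyCluster, release_early: bool) -> int:
    """Return the global xmin seen at the end of the sort/build step."""
    cluster.take_snapshot("cic")
    cluster.advance(50)            # heap scan, concurrent activity continues
    if release_early:
        cluster.release_snapshot("cic")
    cluster.advance(500)           # sort and build: the long post-scan work
    horizon = cluster.global_xmin()
    cluster.release_snapshot("cic")  # no-op if already released
    return horizon


held = build_index(ToyCluster(), release_early=False)
early = build_index(ToyCluster(), release_early=True)
print(held, early)
```

In this sketch the horizon stays pinned at the snapshot's xmin for the whole build when the snapshot is held, and follows `next_xid` once it is released, which is the window in which vacuum on other tables could make progress.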

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From

Tom Lane

Date:

28 February 2017, 16:30:43

Simon Riggs <simon@2ndquadrant.com> writes:
> On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Um ... isn't there a transaction boundary there anyway?

> Yes, the patch releases the snapshot early, so it does not hold it
> once the build scan has completed. This allows the sort and build
> phases to occur without holding back the xmin.

Oh ... so Alvaro explained it badly.  The reason this is specific to
btree is that it's the only AM with any significant post-scan building
time.

However, now that I read the patch: this is a horribly ugly hack.
I really don't like the API (if it even deserves the dignity of that
name) that you've added to snapmgr.  I suppose the zero documentation
for it fits in nicely with the fact that it's a badly-thought-out kluge.

I think it would be better to just move the responsibility for snapshot
popping in this sequence to the index AMs, full stop.
        regards, tom lane

Re: [HACKERS] avoid bloat from CREATE INDEX CONCURRENTLY

From

Simon Riggs

Date:

28 February 2017, 20:54:58

On 28 February 2017 at 13:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 28 February 2017 at 13:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Um ... isn't there a transaction boundary there anyway?
>
>> Yes, the patch releases the snapshot early, so it does not hold it
>> once the build scan has completed. This allows the sort and build
>> phases to occur without holding back the xmin.
>
> Oh ... so Alvaro explained it badly.  The reason this is specific to
> btree is that it's the only AM with any significant post-scan building
> time.
>
> However, now that I read the patch: this is a horribly ugly hack.
> I really don't like the API (if it even deserves the dignity of that
> name) that you've added to snapmgr.  I suppose the zero documentation
> for it fits in nicely with the fact that it's a badly-thought-out kluge.

WTF. Frankly, knowing it would generate such a ridiculously negative
response was the reason it wasn't me that submitted it and why it's not
fully documented. Documentation in this case would be a short
paragraph in the index AM, explaining for the user what is already in
code comments.

You're right to point out that there is significant post-scan build
time and the reduction in bloat during that time is well worth the
trouble. I'm pleased to have thought of it and to have contributed it
to the community.

> I think it would be better to just move the responsibility for snapshot
> popping in this sequence to the index AMs, full stop.

There were two choices: a) leave the responsibility to the index AM,
giving a clean API, or b) don't trust that all index AMs would know or
implement this correctly. If the index AM doesn't implement this
correctly it becomes a crash bug, which seemed unacceptable in an
extensible server.

After implementing (a), I chose (b) and took extra time to implement
the ugly API in preference to the possibility of a crash bug. I am
open to following consensus on that and to resubmit other patches as
required.
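
The trade-off between (a) and (b) can be sketched with a toy Python model. The patch itself is not reproduced in this thread, so everything here is hypothetical: `SnapshotStack`, the two `am_*` callbacks, and the `build_option_*` functions are invented for illustration, and the failure mode shown (a snapshot left on the stack by an AM that does not know the rule) is an assumption about what "doesn't implement this correctly" could mean.

```python
class SnapshotStack:
    """Hypothetical stand-in for a stack of active snapshots."""

    def __init__(self) -> None:
        self._items = []

    def push(self, snap: str) -> None:
        self._items.append(snap)

    def pop(self) -> str:
        if not self._items:
            # Popping an empty stack is treated as a hard error in this toy.
            raise RuntimeError("pop with no active snapshot")
        return self._items.pop()

    def depth(self) -> int:
        return len(self._items)


def am_early_release(stack: SnapshotStack) -> None:
    """An AM that drops the snapshot once its scan is done."""
    stack.pop()


def am_unaware(stack: SnapshotStack) -> None:
    """An AM written without knowledge of the new rule; it never pops."""


def build_option_a(stack: SnapshotStack, am_build) -> int:
    """Option (a): popping is solely the AM's job. Returns final depth."""
    stack.push("build")
    am_build(stack)
    return stack.depth()


def build_option_b(stack: SnapshotStack, am_build) -> int:
    """Option (b): the caller checks and cleans up after the AM."""
    before = stack.depth()
    stack.push("build")
    am_build(stack)
    if stack.depth() > before:   # the AM did not release the snapshot
        stack.pop()
    return stack.depth()


for am in (am_early_release, am_unaware):
    print(am.__name__,
          build_option_a(SnapshotStack(), am),
          build_option_b(SnapshotStack(), am))
```

Under these assumptions, option (a) keeps the caller simple but leaves the stack in an unexpected state when an AM ignores the rule, while option (b) pays for an extra check in the caller so that both kinds of AM end in the same state.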

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
