Re: GUC for cleanup indexes threshold.

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: GUC for cleanup indexes threshold.
Дата
Msg-id CAD21AoCDtRvsu750KKfB9ExhUgrSsxwAeY-UpEQ1-3Ad=jV7EA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: GUC for cleanup indexes threshold.  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: GUC for cleanup indexes threshold.
Список pgsql-hackers
On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 29, 2017 at 2:23 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> I was thinking that the status of this patch is still "Needs review"
>> because I sent latest version patch[1].
>
> I think you're right.
>
> I took a look at this today.  I think there is some problem with the
> design of this patch.  I originally proposed a threshold based on the
> percentage of not-all-visible pages on the theory that we'd just skip
> looking at the indexes altogether in that case.  But that's not what
> the patch does: it only avoids the index *cleanup*, not the index
> *vacuum*.

Hmm, I guess there is misunderstanding on this thread (my English
might have confused you, sorry).

I've been proposing the patch that allows lazy vacuum skip only the
index cleanup vacuum. Because the problem reported by xu jian[1] is
that second vacuum freeze takes long time even if the table is not
modified since first vaucuum. The reason is that we cannot skip
lazy_cleanup_index even if the table is not changed at all since
previous vacuum. If the table has a garbage the lazy vacuum does
lazy_vacuum_index, on the other hand, if the table doesn't have
garbage, which means that only insertion was execued or the table was
not modified, the lazy vacuum does lazy_cleanup_index instead. Since
I'd been knew this restriction I proposed it. That's why proposing new
GUC parameter name has been "vacuum_cleanup_index_scale_factor".

> And the comments in btvacuumcleanup say this:
>
>     /*
>      * If btbulkdelete was called, we need not do anything, just return the
>      * stats from the latest btbulkdelete call.  If it wasn't called, we must
>      * still do a pass over the index, to recycle any newly-recyclable pages
>      * and to obtain index statistics.
>      *
>      * Since we aren't going to actually delete any leaf items, there's no
>      * need to go through all the vacuum-cycle-ID pushups.
>      */

In current design, we can skip lazy_cleanup_index in case where the
number of pages need to be vacuumed is less than the threshold. For
example, after frozen whole table by aggressive vacuum the vacuum can
skip scanning on the index if only a few insertion is execute on that
table. But if a transaction made a garbage on the table, this patch
cannot prevent from vacuuming on the index as you mentioned.

By skipping index cleanup we could leave recyclable pages that are not
marked as a recyclable. But it can be reclaimed by both next index
vacuum and next index cleanup becuase btvacuumpage calls
RecordFreeIndexPage for recyclable page. So I think it doesn't become
a problem.

And a next concern pointed by Peter is that we stash an XID when a
btree page is deleted, which is used to determine when it's finally
safe to recycle the page. I prevent from this situation by not
allowing lazy vacuum to skip the index cleanup when aggressive vacuum.
But since this makes the advantage of this patch weak I'm considering
better idea.

> So, if I'm reading this correctly, the only time this patch saves
> substantial work - at least in the case of a btree index - is in the
> case where there are no dead tuples at all.  But if we only want to
> avoid the work in that case, then a threshold based on the percentage
> of all-visible pages seems like the wrong thing, because the other
> stuff btvacuumcleanup() is doing doesn't have anything to do with the
> number of all-visible pages.

I thought that if the lazy vacuum skipped almost table according to
visibility map it means that the state of table is changed just a
little and we can skip updating the index statistics. Am I missing
something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Anastasia Lubennikova
Дата:
Сообщение: Re: WIP: Covering + unique indexes.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Allow to specify #columns in heap/index_form_tuple